WO2018087545A1 - Object location technique - Google Patents

Object location technique

Info

Publication number
WO2018087545A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
coordinate system
video frame
geographic
map
Prior art date
Application number
PCT/GB2017/053366
Other languages
French (fr)
Inventor
Mahdu KIRAN
Mohamed SEDKY
Original Assignee
Staffordshire University
Priority date
Filing date
Publication date
Application filed by Staffordshire University filed Critical Staffordshire University
Publication of WO2018087545A1 publication Critical patent/WO2018087545A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C11/00: Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
    • G01C11/02: Picture taking arrangements specially adapted for photogrammetry or photographic surveying, e.g. controlling overlapping of pictures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10024: Color image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30232: Surveillance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30236: Traffic on road, railway or crossing
    • G: PHYSICS
    • G08: SIGNALLING
    • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00: Burglar, theft or intruder alarms
    • G08B13/18: Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B13/189: Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B13/194: Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B13/196: Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G08B13/19602: Image analysis to detect motion of the intruder, e.g. by frame subtraction
    • G08B13/19608: Tracking movement of a target, e.g. by detecting an object predefined as a target, using target direction and or velocity to predict its new position

Definitions

  • the present disclosure relates to methods, systems and computer readable media for processing video data captured by a camera, in particular to automatically locate the position of a target object.
  • a particular application of the object location techniques of the present disclosure is in video surveillance systems.
  • Video surveillance systems can use one or more cameras to capture video footage of a surveillance site.
  • Object location techniques can be used to automatically determine the location or position of a target object detected in the captured video data. The position of the object can be displayed as an indicator or icon on a dedicated map of the surveillance site.
  • the present disclosure seeks to extend the functionality of known object location techniques.
  • a method comprising: receiving video data from a camera, the video data comprising a plurality of video frames captured by the camera; tracking a target object in a sequence of video frames of the video data; calculating a position of the target object in the video frame coordinate system; and converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system.
  • the method can convert the position of a target object in captured video data into geographic coordinates in a geographic coordinate system, for example a world geographic coordinate system, to determine the position or location of the target object in the geographic coordinate system.
  • the method may be an automated, computer-implemented method.
  • the method therefore can automatically convert the position of a target object in video data to geographic coordinates in a geographic coordinate system.
  • the method may further comprise using the position of the geographic coordinates to display on a map image an indicator to show the position of the target object on the map image.
  • the map image may be displayed in a map view window.
  • the map image may be a satellite image. The position of the target object in a satellite image can therefore be easily identified by a viewer.
  • Converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system may comprise converting the position of the target object in the video frame coordinate system to a position of the target object in a map coordinate system; and converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
  • Converting the position of the target object in the video frame coordinate system to a position of the target object in the map coordinate system may comprise applying a homographic transformation to the position of the target object in the video frame coordinate system.
  • the homographic transformation may be determined by a mapping of each of the positions of at least four non-collinear points in the video frame coordinate system to a corresponding position of each of the points in the map coordinate system.
  • Converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system may comprise: selecting at least three reference points in the map coordinate system; identifying, for each of the at least three reference points, corresponding geographic coordinates in the geographic coordinate system; determining a mapping for converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system using the reference points and identified geographic coordinates of the reference points; and using the mapping to convert the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
  • the method may comprise displaying the video data in a video data viewing window.
  • the method may comprise displaying a visual identifier for the target object in the video data viewing window.
  • the visual identifier is used to clearly distinguish the target object from the background in the video frame and to allow a viewer of the video data viewing window to quickly and clearly locate the target object within the video frame.
  • the method may comprise displaying the geographic coordinates for the target object in a video data viewing window.
  • the method may comprise calculating the distance between the target object and a fixed point in the video frame.
  • the method may comprise displaying the distance in a video data viewing window.
  • the method may comprise tracking a plurality of target objects.
  • the method may comprise calculating the distance between two target objects.
  • the method may comprise displaying the distance in a video data viewing window.
  • the method may comprise calculating the velocity of a target object.
  • the distance and speed may be calculated from the geographic coordinates.
  • the method may comprise displaying the velocity in a video data viewing window.
  • the method may comprise converting the position of the moving object from the video frame coordinate system to tracking coordinates in a tracking device coordinate system for a tracking device; adjusting the tracking device to direct the tracking device towards the moving object.
  • the tracking device may be a light or a loudspeaker or a camera.
  • the tracking device may be a tracking camera and adjusting the tracking device to direct the tracking device towards the moving object may comprise adjusting the camera to monitor the moving object.
  • the method may further comprise displaying the view captured by the tracking camera.
  • the view captured by the tracking camera may be displayed in a target object viewing window.
  • the tracking coordinates can be calculated from the geographic coordinates.
  • a system comprising a memory, a processor and at least one camera, the system being configured to perform a method according to an aspect of the present disclosure.
  • a computer readable medium having stored therein computer readable instructions which, when executed by a processor, cause the processor to perform a method according to an aspect of the present disclosure.
  • Figure 1 is an illustration of a screen shot
  • Figure 2 is an illustration of a later screen shot
  • Figure 3 is an illustration of an even later screen shot
  • Figure 4 is a flowchart of a method for determining a geolocation of a target object from video data
  • Figure 5 is an illustration of another screen shot
  • Figure 6 is a schematic block diagram of a computer system
  • Figure 7 is a flowchart of a method for object segmentation
  • Figure 8 is a diagram showing how illumination and the image picked up by a camera are related.
  • Figure 9 is a flowchart of a method for determining geographic coordinates of a target object from a video frame.
  • FIGs 1-3 depict an embodiment of the present disclosure in which a target object is tracked by a surveillance system.
  • the coordinates of the target object are determined in the coordinate system of a video frame received from a fixed camera and are used to calculate the geolocation (latitude and longitude) for the target object.
  • the geolocation of the target object is used to focus a pan-tilt-zoom (PTZ) camera on the target object, the PTZ camera being used to follow the object as it moves.
  • FIG. 1 depicts a screen shot of a screen 10 which is displayed to a user.
  • FIG. 1 depicts the screen 10 having a video data viewing window 20, a target object viewing window 30, and a map view window 40.
  • Video data is received from a fixed camera and displayed in the video data viewing window 20.
  • the fixed camera captures video data of a site, which in this example is a car park.
  • a target object 22, which in this example is a car, is identified in a video frame of the video frame data. Identification of the target in this embodiment is performed by object segmentation, in which the motion of the car identifies the car as a target object 22.
  • the position of the target object 22 in a coordinate system of the video frame is determined.
  • the coordinate system corresponds to a horizontal distance and a vertical distance from a corner of the frame displayed in the video data viewing window 20.
  • the position of the target object 22 in the video frame coordinate system is mapped to latitude and longitude coordinates for the target object 22.
  • one reference point may be the corner of the building shown in FIG. 1 for which the latitude and longitude coordinates are known.
  • Another possible reference point may be the tip of a lamppost such as that shown in the image for which the latitude and longitude coordinates are known.
  • the calculated latitude and longitude coordinates of the target object 22 are used to adjust a pan-tilt-zoom (PTZ) camera so that the PTZ camera focuses on the target object 22.
  • the PTZ camera has centred the target object 22 in the target object viewing window 30 and has zoomed to an appropriate level.
  • a map of the car park is depicted, on which the position 42 of the target object 22 is shown.
  • the position 46 of the fixed camera from which the video data is received is also depicted.
  • the position 44 of the PTZ camera from which the images of the target object viewing window 30 are received is also depicted.
  • the video data comprises one or more sequences of frames.
  • the car moves in the car park as the sequence of frames progresses, and the PTZ camera tracks the car as it moves.
  • FIG. 2 depicts a screen shot when a frame later in the sequence of frames than that shown in FIG. 1 is displayed in the video data viewing window 20. That is, the frames in the sequence of frames between the frame depicted in FIG. 1 and the frame depicted in FIG. 2 are not illustrated in the figures for reasons of brevity, but on the actual screen a continuous sequence of frames is shown and the PTZ camera tracks the car as it moves.
  • the target object 22 and associated visual identifier 24 have now moved to a different location in the car park.
  • the new position of the target object 22 in the video frame coordinate system is determined and used to calculate the latitude and longitude coordinates for the target object 22.
  • the latitude and longitude coordinates of the target object 22 are used to direct the PTZ camera to focus on the new location of the target object 22 as shown in the target object viewing window 30.
  • the new location 42 of the target object 22 is also illustrated in the map view window 40.
  • a similar process is followed for all or some of the frames in the sequence between the one shown in FIG. 1 and the one shown in FIG. 2.
  • FIG. 3 depicts a screen shot when a frame later in the sequence of video frames than that shown in FIG. 2 is shown in the video data viewing window 20.
  • a new position of the target object 22 is displayed in the video data viewing window 20 along with a visual identifier for the target object 22.
  • the position of the target object 22 in the video frame coordinate system is determined and the latitude and longitude coordinates for the target object 22 are calculated.
  • the latitude and longitude coordinates for the target object 22 are used to direct the PTZ camera to focus on the newly calculated location of the target object 22 as shown in the target object viewing window 30.
  • the new location 42 of the target object 22 is illustrated in the map view window 40.
  • a continuous sequence of frames is shown between the frames shown in FIGs. 2 and 3 and a similar process is followed for all or some of the frames in the sequence between the one shown in FIG. 2 and the one shown in FIG. 3.
  • FIG. 4 is a flowchart depicting a method for determining a geolocation of a target object from video data. The method is an automated, computer-implemented method.
  • step 405 the process starts as video data is received from a fixed camera.
  • the video data is displayed in a video data viewing window at step 410.
  • a target object is identified in a video frame of the received video data.
  • a target object is typically a moving object within the field of view of the fixed camera.
  • a position of the target object in the video frame coordinate system is determined.
  • a visual identifier for the target object is displayed in the video data viewing window.
  • the displayed visual identifier is used to clearly distinguish the target object from the background in the video frame and to allow a viewer of the video data viewing window to quickly and clearly locate the target object within the video frame.
  • the visual identifier is a rectangle around the target object.
  • step 430 the latitude and longitude coordinates for the target object are calculated. In order to do this, stationary reference points within the video frame are used.
  • These reference points may, for example, include a location of a lamppost or the corner of a building etc.
  • for each of these reference points, coordinates in the video frame coordinate system are known.
  • additionally, for each of these reference points, latitude and longitude coordinates are known. In this way a mapping exists between the coordinates of each reference point in the video frame coordinate system and their respective latitude and longitude coordinates. Accordingly, by comparing the position of the target object in the video frame coordinate system with the position of each reference point in the video frame coordinate system, it is possible to calculate the latitude and longitude coordinates for the target object. Further details on how this can be done are explained below.
  • the calculated latitude and longitude coordinates for the target object are displayed in the video data viewing window. The latitude and longitude coordinates can be displayed in the vicinity of the target object or anywhere within the video data viewing frame.
  • a distance is calculated between the target object and the fixed camera from which the video data is received.
  • Latitude and longitude coordinates of the fixed video camera are known and so the ground distance between the target object and the fixed camera can be calculated by comparing the coordinates of the target object with the coordinates of the fixed camera.
  • the distance data is displayed in the video data viewing window.
  • the pan-tilt-zoom (PTZ) camera is adjusted to monitor the target object.
  • Information concerning the calculated latitude and longitude coordinates for the target object is used to direct the PTZ camera to the target object.
  • the PTZ camera is configured to zoom to an appropriate level for viewing the target object based on the calculated distance data. For example if the ground distance between the target object and the fixed camera is known and the geolocation of the PTZ camera is known then a distance from the PTZ camera to the target object can be calculated and used to adjust the zoom of the PTZ camera.
  • Frames captured by the PTZ camera are displayed in a target object viewing window.
  • an indicator is used to indicate the position of the target object on a map image in a map view window.
  • the method loops back to step 420 for analysis of the next frame. This can occur if, for example, the target object moves and so has a different location in subsequent frames.
  • step 465 if a next video frame is not to be processed then the process ends at step 465. This can occur if, for example, the target object moves out of the field of view of the fixed camera or if no video frames remain to be processed. In this way a target object is monitored as it moves within the field of view of the fixed camera.
  • FIG. 5 depicts a screen shot 10 according to another embodiment.
  • In this embodiment a target object viewing window is not present.
  • Video data is received and shown in the video data viewing window 20 and a moving object 22 is identified within a received video frame of the video data.
  • a visual identifier 24 for the target object 22 is displayed in the video data viewing window 20.
  • Latitude and longitude coordinates are calculated for the target object 22.
  • in a map view window 40 the location 42 of the moving object 22 is shown.
  • the computer apparatus comprises a communications adaptor 605, a processor 610 and a memory 615.
  • the computer apparatus also comprises an input device adaptor 620 for communicating with an input device 625.
  • the computer further comprises a display adaptor 630 for operation with a display 635.
  • the processor 610 is configured to receive data including video data, to access memory 615, and to act upon instructions received either from said memory 615 or said communications adaptor 605.
  • communications adaptor 605 is configured to receive data and to send out data.
  • Data received by the processor 610 includes video data captured by a fixed camera 640.
  • Processor 610 is configured to process the video data from the fixed camera 640.
  • Processor 610 is further configured to identify a target object in a video frame of the video data, calculate the latitude and longitude coordinates of the target object and cause this information to be displayed on the display 635.
  • the processor is further configured to adjust a PTZ camera 650 to track said target object based on the calculated latitude and longitude coordinates for the target object. Video data from the PTZ camera can then be displayed on display 635.
  • the computer apparatus may be a distributed system which is distributed across a network or through dedicated local connections.
  • the methods described herein may be implemented by a computer program.
  • the computer program may include computer executable code or instructions arranged to instruct a computer to perform the functions of one or more of the methods described above.
  • the computer program and/or the code or instructions for performing such methods may be provided to an apparatus, such as a computer, on a computer readable medium or computer program product.
  • the computer readable medium could be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or a propagation medium for data transmission, for example for downloading the code over the Internet.
  • the computer readable medium could take the form of a physical computer readable medium such as semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disc, and an optical disk, such as a CD-ROM, CD-R/W or DVD.
  • a physical computer readable medium such as semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disc, and an optical disk, such as a CD-ROM, CD-R/W or DVD.
  • Detection of a moving target object can be performed by any suitable method, such as optical flow, temporal differencing, or background modelling (commonly known as background subtraction), or by a hybrid approach which combines a number of these approaches.
  • a target object is detected by object segmentation methods, such as that set out below.
  • the first step of a typical object segmentation algorithm is to receive a video frame at step 710.
  • the video frame is converted to normalised RGB data at step 720.
  • a pixel of the video frame is compared with a corresponding pixel of a background model.
  • a determination is made as to whether the pixel of the video frame differs from the corresponding pixel of the background model by more than a threshold value. If a determination is made that the difference is greater than the threshold value then the pixel of the video frame is categorised as belonging to a target object at step 750.
  • Alternatively, if the difference does not exceed the threshold value, the pixel of the video frame is categorised as belonging to the background at step 760, i.e. not belonging to a potential target object. If further pixels remain to be processed (step 770) then the process loops to step 730. If no further pixels remain to be processed, then post-processing of the pixels categorised as belonging to the target object occurs at step 780. The process terminates at step 790.
  • an assessment of the reflectivity properties of an object covered by the pixel is used as set out below.
  • the reflectivity of an object is a measure of the amount of light reflected by the object, or radiance, relative to the amount of incident light shone on the object, or irradiance, and is indicative of the reflectance or intrinsic brightness of the object.
  • the reflectivity of an object can be used as a signature of the object. Hence it can be used to segment the object with respect to the remainder of an image.
  • the reflectivity of an object is composed of diffuse (Lambertian, body) and specular (surface) components.
  • the output of a camera depends on three factors: the illuminant, the spectral reflectivity of the surfaces in the scene, and the camera sensor response.
  • the response to light at a given pixel is defined by the triplet of responses given by R, G and B outputs.
  • the R, G and B outputs are related to the illuminant, the camera response and the spectral reflectivity by Equation (1):
  • E(λ) is the spectral power distribution of the illuminant, which can be estimated by the available knowledge of the environment or by reference to a background model
  • S(λ) is a spectral reflectivity function characterising the proportion of light on an object that the object reflects.
  • Q_G(λ) characterises the green camera sensor spectral response characteristics
  • Q_B(λ) characterises the blue camera sensor spectral response characteristics
  • w_d is a parameter for the diffuse reflection 820 component
  • the geometrical parameter for the specular reflection 810 component is given by w_s.
  • the spectral reflectivity function S(λ) is represented as a weighted sum of spectral basis functions by:
  • n = 3 and the spectral basis functions are Parkkinen spectral basis functions (J. Parkkinen et al., Characteristic spectra of Munsell colors, J. Opt. Soc. Am. A, 6(2):318-322, 1989).
  • the aim now is to calculate the weights of the spectral basis functions, to obtain the spectral reflectivity of the object represented by the pixels of the current image.
  • the calculated weights can then be used in the comparison of the video frame with the background model.
  • The model of Equation (1) is rewritten as:
  • the first basis function of Parkkinen is constant, and so the first term can be merged with the specular component term to give:
  • the RGB image of the video frame is represented by basis function weights as an expression of the reflectivity of the surfaces represented in the current image. That is, the R, G and B values of each pixel in the image of the video frame are used to calculate the basis function weights characterising the spectral reflectivity of any surface that pixel covers.
  • the spectral reflectivity of the surfaces of both objects can be compared by cross-correlation.
  • One way to compare the video frame with the background model is to use Equation (12).
  • the background model is formed by averaging over weights (i.e. taking the mean value of the weights) calculated for a number of input images which represent relatively static background frames.
  • the average weights, along with the weights calculated for the video frame, can be substituted into Equation (12) to define
  • BG represents a pixel of the background model and VF represents a pixel of the received video frame.
  • By comparing the calculated value of Equation (15) with threshold values C_max and C_min, each pixel can then be categorised as part of the background or as part of the foreground (steps 750 and 760). In this way, a foreground mask, FGMask, is defined.
  • a post-processing of the foreground mask can begin (step 780).
  • a dilation step is applied, which performs a dilation operation to expand or thicken the initial foreground mask.
  • an erosion operation is applied to the resultant mask, which reduces the size of an object by eliminating area around the object edges.
  • the erosion operation removes noise and eliminates foreground mask details smaller than a predetermined structure element size (e.g. 2 pixels).
  • a geographic coordinate system is a coordinate system that uses geographic coordinates which enable, in principle, every location on the Earth to be specified by a set of coordinates.
  • a geographic coordinate system can be considered as a universal geolocation coordinate system. Embodiments using latitude and longitude coordinates will be described, although any suitable geographic coordinate system can be used.
  • a location of a target object can be derived from video data by any suitable method.
  • FIG. 9 shows a flowchart of a method for deriving geographic coordinates for a target object identified in video data from a fixed camera according to an embodiment.
  • the geographic coordinates of the target object are calculated by first calculating the coordinates of the target object in a site map, such as may be viewed in a map view window, that covers the field of view of the fixed camera.
  • step 910 four points (pixels) are selected within a video frame of the received video data and their respective positions in a video frame coordinate system are identified.
  • the site map is oriented such that north is directed to the positive y-axis (ordinate) of the site map and east is directed towards the positive x-axis (abscissa) of the site map.
  • a perspective transformation matrix is calculated.
  • a homography transformation matrix is calculated using the perspective transformation matrix. The homography transformation matrix maps the positions of the selected four points within the video frame to the corresponding four points within the site map.
  • the position of an object within a map coordinate system of the site map is therefore related to the position of the object within the video frame coordinate system by the transformation:
  • CamPixelx is the position of the pixel covering the object in the x (horizontal) direction in the video frame
  • CamPixely is the position of the pixel covering the object in the y (vertical) direction
  • MapPixelx is the position of the object on the site map in the east-west (horizontal) direction
  • MapPixely is the position of the object on the site map in the north-south (vertical) direction
  • H represents the homography transformation matrix.
  • Steps 910, 920, 930, 940 and 950 lead to the calculation of a homography transformation matrix. Steps 910, 920, 930, 940 and 950 need only be performed once to provide the homography transformation matrix for a given viewpoint of the fixed camera.
  • the homography transformation matrix, H is applied to a position of a target object identified in a video frame, in the video frame coordinate system.
  • the position of the target object can be determined by the position of a pixel at the centre-of-mass of the target object in the video frame.
  • step 970 three corners of the site map are selected and their latitude and longitude coordinates identified.
  • step 980 the height of the map is calculated. That is, the distance, MapImageHeight, between two vertically aligned corners of the site map is calculated.
  • step 980 the width of the map is calculated. That is, the distance, MapImageWidth, between two horizontally aligned corners of the site map is calculated.
  • Steps 970 and 980 are used to calculate appropriate parameters for converting a position of an object in the map coordinate system to a position of the object in latitude and longitude coordinates. Steps 970 and 980 need only be performed once to provide the appropriate parameters.
  • the known latitude and longitude coordinates of the four corners of the site map and the height of the map and the width of the map are used to calculate the latitude and longitude coordinates of the target object.
  • the latitude LAT_TO and longitude LONG_TO coordinates of the target object are calculated as follows (a small numeric sketch of this conversion is given at the end of this list):
  • S_x is the difference between the latitudes (in decimal) of two linear horizontal reference corners
  • S_y is the difference between the longitudes (in decimal) of two linear vertical reference corners
  • MapPixelx is the x location (horizontal) of the target object on the map image
  • MapPixely is the y location (vertical) of the target object on the map image
  • MapOriginx is the horizontal position of the origin in the map image
  • MapOriginy is the vertical position of the origin in the map image.
  • the geographic coordinates can be latitude and longitude coordinates. Coordinates can be calculated in a Universal Transverse Mercator coordinate system.
  • Coordinates can be calculated in a Universal Polar Stereographic coordinate system.
  • a video data viewing window may not be shown on a screen.
  • a target object viewing window may not be shown on a screen.
  • a map view window may not be shown on a screen. Any combination of a target object viewing window, a video data viewing window and a map view window can be shown on one or more screens.
  • a target object can be assigned an index number or other marker. This index number or marker can then be stored and used to identify the same target object in later video frames or in separate video data streams. In this way, once a target object has been identified in a first series of video frames, it is possible to identify the same target object in a second series of video frames.
  • An assigned index number or marker for a target object can be displayed on a screen, for example, in the vicinity of the target object to which that index number or marker is assigned in video frames shown in a video data viewing window.
  • a visual identifier for a target object can take any suitable form for identifying the target object in a video data viewing window. For example, a visual identifier can comprise a box overlaying the image of the target object in the frame shown in the video data viewing window.
  • a visual identifier can comprise an arrow pointing to the target object in the video data viewing window. Any suitable colours or shapes can be used for a visual identifier. Alternatively, in some embodiments, no visual identifiers are shown in the video data viewing window.
  • the latitude and longitude coordinates of a target object can be displayed on the screen.
  • the latitude and longitude coordinates of the target object can be displayed in the vicinity of the target object shown in a video frame displayed in a video data viewing window.
  • a distance from the target object to the fixed camera can be calculated and shown on the screen.
  • a distance from the target object to the PTZ camera can be calculated and displayed on the screen.
  • a distance from the target object to a reference point or landmark can be determined and shown on the screen.
  • An estimated velocity of a target object can be shown on the screen.
  • the velocity of the target object can be calculated by comparing the coordinates of the target object across a series of video frames.
  • the video data shown in a video data viewing window may not originate from a fixed camera. Multiple cameras may be used to track a target object and to better triangulate a position of the target object. Video data from one or more of the multiple cameras can be displayed in a video data viewing window.
  • the location of a target object can be represented in any way in a map view window.
  • the location of a camera can be represented in the map view window.
  • the locations of known landmarks and/or reference points can be represented in the map view window.
  • a target object may be selected by a user.
  • a target object can be detected by motion.
  • One or more target objects may be tracked simultaneously.
  • a distance between two target objects can be calculated from the determined positions of the two target objects.
  • the ground distance between two target objects can be shown on the screen.
  • the ground distance between a first target object and a second target object may be calculated by comparing the latitude and longitude coordinates of the first target object with the latitude and longitude coordinates of the second target object.
  • the ground distance GD between a first target object and a second target object can be calculated as: where R ≈ 6371 km is the approximate radius of the Earth. Parameter s is the central angle between the position of the first target object and the position of the second target object in latitude and longitude coordinates, and is given by:
  • atan2(..., ...) is the arctangent function with two arguments, capable of interpreting the signs of the two arguments to return the appropriate quadrant of the angle, and a is given by:
  • in Equation (22), Δlong and Δlat are given by:
  • (Lat_1, Long_1) are the calculated geographic coordinates of the first target object and (Lat_2, Long_2) are the geographic coordinates of the second target object.
  • the ground distance between a target object and the fixed camera may also be calculated in this way.
  • the ground distance between a first target object and a second target object may be calculated from the determined position of the first target object and the determined position of the second target object in the video frame coordinate system.
  • Identification of a target object can be performed by any suitable means.
  • a target object can be identified by a user.
  • a target object can be identified by motion sensors.
  • object segmentation was used to identify a target object.
  • any suitable method for identifying a target object can be used.
  • a received video frame can be converted into any suitable image representation.
  • any suitable colour space can be chosen, e.g. RGB, normalised RGB, HSV, YIQ, YCbCr, YPbPr, CIELAB and RGB colour ratio.
  • normalised RGB and HSV are the most common colour spaces used for object segmentation as these colour spaces have been shown to be more tolerant of minor variations in the illuminant.
  • a background model can be created using any suitable method.
  • a background model is created by analysing a series of received images which represent relatively static background frames. These frames can be represented by, for example, spectral reflectivity weights as explained above. The weights of each of the background frames can then be averaged to obtain a mean set of background frame weights that are then used to form the background model.
  • the mean background frame can be called BG_mean.
  • any suitable method can be used. Processes are envisaged that do not rely on calculating the spectral reflectivity of objects in a scene.
  • Parkkinen spectral basis functions were used.
  • the skilled person would appreciate that other representations can be used.
  • embodiments can use other spectral basis functions, an appropriate set of eigenvectors of the statistical distribution of the spectral reflectances of Munsell colour chips, or of natural surfaces, or an appropriately selected set of Fourier basis functions.
  • the autocorrelation of the background frame is calculated, without normalisation of its magnitude, as
  • the ratio C_BG between the autocorrelation of the reflectivity of each frame and the autocorrelation of the mean background reflectivity of background frames can be calculated:
  • the threshold values C_min and C_max can then be calculated by
  • the autocorrelation of the video frame, VF_Corr, can then be calculated as
  • a pixel belongs to the foreground object or to the background.
  • a position of the target object in the video frame coordinate system can be determined by any suitable method, and any suitable coordinate system can be used.
  • the position of the target object in the video frame is determined by calculating a horizontal distance and a vertical distance from a predetermined point, such as the bottom left corner of the video frame.
  • Reference points used to map a position of a target object in a video frame coordinate system to latitude and longitude may be chosen by a user.
  • the latitude and longitude coordinates of reference points can be established by manual readings. These manual readings can then be calibrated with a
  • a calculated ground distance from a target object to a PTZ camera can be used to adjust a zoom of the PTZ camera.
  • the zoom of the PTZ camera can alternatively or additionally be adjusted based on a calculated velocity of the target object.
  • the panning and tilting of the PTZ camera can be adjusted according to any suitable criteria.
  • the latitude and longitude coordinates of the target object may be sent to another device capable of tracking or following the movement of the target object.
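The map-pixel to latitude/longitude conversion described above (steps 970-990) can be illustrated with a short numeric sketch. This is a minimal illustration only: the origin position, per-pixel degree spans and example coordinates below are assumed values, not data from the disclosure.

```python
# A minimal numeric sketch of converting a site-map pixel position to
# latitude/longitude. All numeric values are illustrative assumptions.

def map_pixel_to_geo(map_x, map_y,
                     origin_lat, origin_lon,        # geographic coordinates of the map origin pixel
                     map_width_px, map_height_px,
                     lat_span_deg, lon_span_deg,    # corner-to-corner latitude/longitude differences
                     origin_x=0, origin_y=0):
    """Linearly interpolate latitude/longitude from a position on a north-up site map."""
    lat = origin_lat + (map_y - origin_y) / map_height_px * lat_span_deg
    lon = origin_lon + (map_x - origin_x) / map_width_px * lon_span_deg
    return lat, lon

# Example: a 1000 x 800 pixel map whose top-left pixel is taken as the origin.
lat, lon = map_pixel_to_geo(420, 310,
                            origin_lat=52.8105, origin_lon=-2.0840,
                            map_width_px=1000, map_height_px=800,
                            lat_span_deg=-0.0015, lon_span_deg=0.0035)
print(f"target at {lat:.6f}, {lon:.6f}")
```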

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

A method comprising receiving video data from a camera, the video data comprising a plurality of video frames captured by the camera; tracking a target object in a sequence of video frames of the video data; calculating a position of the target object in the video frame coordinate system; and converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system.

Description

Object Location Technique
Field
The present disclosure relates to methods, systems and computer readable media for processing video data captured by a camera, in particular to automatically locate the position of a target object. A particular application of the object location techniques of the present disclosure is in video surveillance systems.
Background
Video surveillance systems can use one or more cameras to capture video footage of a surveillance site. Object location techniques can be used to automatically determine the location or position of a target object detected in the captured video data. The position of the object can be displayed as an indicator or icon on a dedicated map of the surveillance site.
The present disclosure seeks to extend the functionality of known object location techniques.
Summary
According to one aspect of the present disclosure, there is provided a method comprising: receiving video data from a camera, the video data comprising a plurality of video frames captured by the camera; tracking a target object in a sequence of video frames of the video data; calculating a position of the target object in the video frame coordinate system; and converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system. The method can convert the position of a target object in captured video data into geographic coordinates in a geographic coordinate system, for example a world geographic coordinate system, to determine the position or location of the target object in the geographic coordinate system.
The method may be an automated, computer-implemented method. The method therefore can automatically convert the position of a target object in video data to geographic coordinates in a geographic coordinate system.
The method may further comprise using the position of the geographic coordinates to display on a map image an indicator to show the position of the target object on the map image. The map image may be displayed in a map view window. The map image may be a satellite image. The position of the target object in a satellite image can therefore be easily identified by a viewer.
Converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system may comprise converting the position of the target object in the video frame coordinate system to a position of the target object in a map coordinate system; and converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
Converting the position of the target object in the video frame coordinate system to a position of the target object in the map coordinate system may comprise applying a homographic transformation to the position of the target object in the video frame coordinate system.
The homographic transformation may be determined by a mapping of each of the positions of at least four non-collinear points in the video frame coordinate system to a corresponding position of each of the points in the map coordinate system.
Converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system may comprise: selecting at least three reference points in the map coordinate system; identifying, for each of the at least three reference points, corresponding geographic coordinates in the geographic coordinate system; determining a mapping for converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system using the reference points and identified geographic coordinates of the reference points; and using the mapping to convert the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
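As an illustration of this two-stage conversion, the sketch below builds the frame-to-map homography from four point correspondences with OpenCV and then applies a simple linear mapping from map pixels to latitude/longitude. All point correspondences, corner geolocations and the map size are assumed example values, and using two opposite corners of a north-up map is a simplification of the three-reference-point mapping described above.

```python
# A minimal sketch of the two-stage conversion, assuming OpenCV and NumPy are
# available. All numeric values are illustrative assumptions.
import cv2
import numpy as np

# Stage 1: homography from four non-collinear video-frame points to their
# known positions on the site map.
frame_pts = np.float32([[120, 450], [600, 430], [640, 90], [80, 110]])
map_pts = np.float32([[200, 700], [820, 690], [830, 120], [180, 140]])
H = cv2.getPerspectiveTransform(frame_pts, map_pts)

def frame_to_map(x, y):
    """Map a video-frame pixel position to a site-map pixel position."""
    mx, my = cv2.perspectiveTransform(np.float32([[[x, y]]]), H)[0, 0]
    return float(mx), float(my)

# Stage 2: linear mapping from map pixels to latitude/longitude, here using
# the known geographic coordinates of two opposite corners of a north-up map.
TOP_LEFT = (52.8105, -2.0840)        # (lat, lon) of map pixel (0, 0)
BOTTOM_RIGHT = (52.8090, -2.0805)    # (lat, lon) of map pixel (MAP_W, MAP_H)
MAP_W, MAP_H = 1000, 800

def map_to_geo(mx, my):
    lat = TOP_LEFT[0] + (my / MAP_H) * (BOTTOM_RIGHT[0] - TOP_LEFT[0])
    lon = TOP_LEFT[1] + (mx / MAP_W) * (BOTTOM_RIGHT[1] - TOP_LEFT[1])
    return lat, lon

target_lat, target_lon = map_to_geo(*frame_to_map(350, 300))
print(target_lat, target_lon)
```

For a fixed camera viewpoint the homography only needs to be computed once and can then be reused for every tracked position.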
The method may comprise displaying the video data in a video data viewing window. The method may comprise displaying a visual identifier for the target object in the video data viewing window. The visual identifier is used to clearly distinguish the target object from the background in the video frame and to allow a viewer of the video data viewing window to quickly and clearly locate the target object within the video frame.
The method may comprise displaying the geographic coordinates for the target object in a video data viewing window.
The method may comprise calculating the distance between the target object and a fixed point in the video frame. The method may comprise displaying the distance in a video data viewing window.
The method may comprise tracking a plurality of target objects. The method may comprise calculating the distance between two target objects. The method may comprise displaying the distance in a video data viewing window.
The method may comprise calculating the velocity of a target object. To calculate the velocity, the distance and speed may be calculated from the geographic coordinates. The method may comprise displaying the velocity in a video data viewing window.
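For instance, a speed estimate can be obtained from two consecutive geolocated positions and the time between the corresponding frames. The sketch below is a minimal example using a local flat-Earth (equirectangular) approximation; the coordinates and timestamps are assumed values.

```python
# A minimal sketch: approximate ground speed from two geolocated fixes.
# A local equirectangular approximation is adequate over the short distances
# involved in a surveillance site. Example values are assumed.
import math

def speed_mps(lat1, lon1, t1, lat2, lon2, t2, earth_radius_m=6_371_000):
    """Approximate ground speed in metres per second between two (lat, lon, time) fixes."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat) * earth_radius_m
    dy = math.radians(lat2 - lat1) * earth_radius_m
    return math.hypot(dx, dy) / (t2 - t1)

# Two fixes 0.4 s apart (illustrative values).
print(speed_mps(52.80950, -2.08310, 0.0, 52.80953, -2.08302, 0.4))
```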
The method may comprise converting the position of the moving object from the video frame coordinate system to tracking coordinates in a tracking device coordinate system for a tracking device; adjusting the tracking device to direct the tracking device towards the moving object.
The tracking device may be a light or a loudspeaker or a camera. The tracking device may be a tracking camera and adjusting the tracking device to direct the tracking device towards the moving object may comprise adjusting the camera to monitor the moving object. The method may further comprise displaying the view captured by the tracking camera.
The view captured by the tracking camera may be displayed in a target object viewing window. The tracking coordinates can be calculated from the geographic coordinates.
According to another aspect of the present disclosure, there is provided a system comprising a memory, a processor and at least one camera, the system being configured to perform a method according to an aspect of the present disclosure. According to another aspect of the present disclosure, there is provided a computer readable medium having stored therein computer readable instructions which, when executed by a processor, cause the processor to perform a method according to an aspect of the present disclosure.
The features, functions, and advantages can be achieved independently in various embodiments of the present disclosure or may be combined in yet other
embodiments in which further details can be seen with reference to the following description and drawings.
Brief Description of the Drawings
Illustrative embodiments of the present disclosure will now be described, by way of example only, with reference to the drawings. In the drawings:
Figure 1 is an illustration of a screen shot;
Figure 2 is an illustration of a later screen shot;
Figure 3 is an illustration of an even later screen shot;
Figure 4 is a flowchart of a method for determining a geolocation of a target object from video data;
Figure 5 is an illustration of another screen shot;
Figure 6 is a schematic block diagram of a computer system;
Figure 7 is a flowchart of a method for object segmentation;
Figure 8 is a diagram showing how illumination and the image picked up by a camera are related; and
Figure 9 is a flowchart of a method for determining geographic coordinates of a target object from a video frame.
Throughout the description and the drawings, like reference numerals refer to like parts.
Detailed Description
FIGs 1-3 depict an embodiment of the present disclosure in which a target object is tracked by a surveillance system. In particular, the coordinates of the target object are determined in the coordinate system of a video frame received from a fixed camera and are used to calculate the geolocation (latitude and longitude) for the target object. The geolocation of the target object is used to focus a pan-tilt-zoom (PTZ) camera on the target object, the PTZ camera being used to follow the object as it moves. As such, a motion of the target object is tracked across a sequence of video frames.
FIG. 1 depicts a screen shot of a screen 10 which is displayed to a user. In particular, FIG. 1 depicts the screen 10 having a video data viewing window 20, a target object viewing window 30, and a map view window 40.
Video data is received from a fixed camera and displayed in the video data viewing window 20. The fixed camera captures video data of a site, which in this example is a car park. A target object 22, which in this example is a car, is identified in a video frame of the video frame data. Identification of the target in this embodiment is performed by object segmentation, in which the motion of the car identifies the car as a target object 22. A visual identifier 24, which in this example is a box around the target object 22, is displayed for the target object 22 in the video data viewing window 20.
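The object segmentation in this embodiment uses the spectral-reflectivity comparison described later; the sketch below is a simpler, generic stand-in that follows the same overall flow (compare each frame against a background model, threshold the difference, then clean up the mask), assuming OpenCV and NumPy. The function name and threshold are illustrative.

```python
# A minimal sketch of motion-based segmentation against a background model.
# This uses a plain per-pixel colour difference rather than the
# spectral-reflectivity comparison of the described embodiment.
import cv2
import numpy as np

def segment_moving_objects(frame_bgr, background_bgr, threshold=30):
    """Return a binary foreground mask and bounding boxes of moving regions."""
    diff = cv2.absdiff(frame_bgr, background_bgr)
    dist = diff.astype(np.float32).sum(axis=2)           # per-pixel colour difference
    mask = (dist > threshold).astype(np.uint8) * 255

    # Clean-up in the spirit of the dilation/erosion post-processing described later.
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.dilate(mask, kernel, iterations=1)
    mask = cv2.erode(mask, kernel, iterations=1)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]
    return mask, boxes
```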
The position of the target object 22 in a coordinate system of the video frame is determined. The coordinate system corresponds to a horizontal distance and a vertical distance from a corner of the frame displayed in the video data viewing window 20.
Using a number of known reference points within the frame, the position of the target object 22 in the video frame coordinate system is mapped to latitude and longitude coordinates for the target object 22. For example, one reference point may be the corner of the building shown in FIG. 1 for which the latitude and longitude
coordinates are known. Another possible reference point may be the tip of a lamppost such as that shown in the image for which the latitude and longitude coordinates are known. By using reference points within the frame shown in the video data viewing window 20 and knowledge of the latitude and longitude coordinates of those reference points, the latitude and longitude coordinates for the target object 22 are calculated.
The calculated latitude and longitude coordinates of the target object 22 are used to adjust a pan-tilt-zoom (PTZ) camera so that the PTZ camera focuses on the target object 22. In the example shown in FIG. 1 , the PTZ camera has centred the target object 22 in the target object viewing window 30 and has zoomed to an appropriate level. Within the map view window 40, a map of the car park is depicted, on which the position 42 of the target object 22 is shown. The position 46 of the fixed camera from which the video data is received is also depicted. The position 44 of the PTZ camera from which the images of the target object viewing window 30 are received is also depicted.
The video data comprises one or more sequences of frames. In the sequence being described the car moves in the car park as the sequence of frames progresses, and the PTZ camera tracks the car as it moves. FIG. 2 depicts a screen shot when a frame later in the sequence of frames than that shown in FIG. 1 is displayed in the video data viewing window 20. That is, the frames in the sequence of frames between the frame depicted in FIG. 1 and the frame depicted in FIG. 2 are not illustrated in the figures for reasons of brevity, but on the actual screen a continuous sequence of frames is shown and the PTZ camera tracks the car as it moves. In the depicted frame of the video data the target object 22 and associated visual identifier 24 has now moved to a different location in the car park. The new position of the target object 22 in the video frame coordinate system is determined and used to calculate the latitude and longitude coordinates for the target object 22. The latitude and longitude coordinates of the target object 22 are used to direct the PTZ camera to focus on the new location of the target object 22 as shown in the target object viewing window 30. The new location 42 of the target object 22 is also illustrated in the map view window 40. A similar process is followed for all or some of the frames in the sequence between the one shown in FIG. 1 and the one shown in FIG. 2.
FIG. 3 depicts a screen shot when a frame later in the sequence of video frames than that shown in FIG. 2 is shown in the video data viewing window 20. A new position of the target object 22 is displayed in the video data viewing window 20 along with a visual identifier for the target object 22. Once again the position of the target object 22 in the video frame coordinate system is determined and the latitude and longitude coordinates for the target object 22 are calculated. The latitude and longitude coordinates for the target object 22 are used to direct the PTZ camera to focus on the newly calculated location of the target object 22 as shown in the target object viewing window 30. The new location 42 of the target object 22 is illustrated in the map view window 40. Again, as with the transition between the frames shown in FIGs. 1 and 2, on the actual screen a continuous sequence of frames is shown between the frames shown in FIGs. 2 and 3 and a similar process is followed for all or some of the frames in the sequence between the one shown in FIG. 2 and the one shown in FIG. 3.
FIG. 4 is a flowchart depicting a method for determining a geolocation of a target object from video data. The method is an automated, computer-implemented method.
Referring to FIG. 4, at step 405 the process starts as video data is received from a fixed camera. The video data is displayed in a video data viewing window at step 410. At step 415 a target object is identified in a video frame of the received video data. A target object is typically a moving object within the field of view of the fixed camera.
At step 420, a position of the target object in the video frame coordinate system is determined.
At step 425, a visual identifier for the target object is displayed in the video data viewing window. The displayed visual identifier is used to clearly distinguish the target object from the background in the video frame and to allow a viewer of the video data viewing window to quickly and clearly locate the target object within the video frame. In the depicted embodiments the visual identifier is a rectangle around the target object.
At step 430 the latitude and longitude coordinates for the target object are calculated. In order to do this, stationary reference points within the video frame are used.
These reference points may, for example, include the location of a lamppost or the corner of a building, etc. For each of these reference points, coordinates in the video frame coordinate system are known. Additionally, for each of these reference points, latitude and longitude coordinates are known. In this way a mapping exists between the coordinates of each reference point in the video frame coordinate system and their respective latitude and longitude coordinates. Accordingly, by comparing the position of the target object in the video frame coordinate system with the position of each reference point in the video frame coordinate system, it is possible to calculate the latitude and longitude coordinates for the target object. Further details on how this can be done are explained below. At step 435 the calculated latitude and longitude coordinates for the target object are displayed in the video data viewing window. The latitude and longitude coordinates can be displayed in the vicinity of the target object or anywhere within the video data viewing window.
At step 440 a distance is calculated between the target object and the fixed camera from which the video data is received. Latitude and longitude coordinates of the fixed video camera are known and so the ground distance between the target object and the fixed camera can be calculated by comparing the coordinates of the target object with the coordinates of the fixed camera.
At step 445, the distance data is displayed in the video data viewing window.
At step 450, the pan-tilt-zoom (PTZ) camera is adjusted to monitor the target object. Information concerning the calculated latitude and longitude coordinates for the target object is used to direct the PTZ camera to the target object. The PTZ camera is configured to zoom to an appropriate level for viewing the target object based on the calculated distance data. For example if the ground distance between the target object and the fixed camera is known and the geolocation of the PTZ camera is known then a distance from the PTZ camera to the target object can be calculated and used to adjust the zoom of the PTZ camera. Frames captured by the PTZ camera are displayed in a target object viewing window.
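By way of illustration only, the sketch below shows one way the calculated distance could drive the zoom setting of the PTZ camera. The function name, the assumed target width, field of view and zoom limit are illustrative assumptions and do not form part of the described embodiment.

```python
import math

def ptz_zoom_for_distance(distance_m, target_width_m=2.0,
                          fov_deg_at_1x=60.0, max_zoom=30.0):
    """Illustrative zoom factor so a target of assumed width fills ~1/3 of the frame.

    distance_m      -- ground distance from the PTZ camera to the target (metres)
    target_width_m  -- assumed physical width of the target (e.g. a car)
    fov_deg_at_1x   -- assumed horizontal field of view of the camera at 1x zoom
    max_zoom        -- assumed optical zoom limit of the PTZ camera
    """
    # Width of the scene visible at 1x zoom at the target's distance.
    scene_width_at_1x = 2.0 * distance_m * math.tan(math.radians(fov_deg_at_1x / 2.0))
    # Zoom so that the target occupies roughly one third of the frame width.
    desired = scene_width_at_1x / (3.0 * target_width_m)
    return max(1.0, min(desired, max_zoom))
```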
At step 455, an indicator is used to indicate the position of the target object on a map image in a map view window.
At step 460, a decision is made as to whether or not a next video frame should be processed.
If subsequent frames are to be processed, the method loops back to step 420 for analysis of the next frame. This can occur if, for example, the target object moves and so has a different location in subsequent frames.
Alternatively, if a next video frame is not to be processed then the process ends at step 465. This can occur if, for example, the target object moves out of the field of view of the fixed camera or if no video frames remain to be processed. In this way a target object is monitored as it moves within the field of view of the fixed camera.
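A minimal sketch of the per-frame loop of FIG. 4 is given below. The helper callables (detect_target, frame_to_latlong, ground_distance, point_ptz_at) and the display and map-view objects are assumed interfaces standing in for the steps described above; they are not defined by the disclosure.

```python
def track_target(frames, camera_latlong, detect_target, frame_to_latlong,
                 ground_distance, point_ptz_at, display, map_view):
    """Illustrative per-frame loop following steps 410-465 of FIG. 4.

    All arguments other than `frames` and `camera_latlong` are assumed
    callables/objects supplied by the caller.
    """
    for frame in frames:                                    # steps 405-410
        display.show_frame(frame)
        target = detect_target(frame)                       # step 415
        if target is None:
            break                                           # step 465: stop processing
        x, y = target["position"]                           # step 420: frame coordinates
        display.draw_box(target["bbox"])                    # step 425: visual identifier
        lat, lon = frame_to_latlong(x, y)                   # step 430: geographic coords
        display.draw_text(f"{lat:.6f}, {lon:.6f}")          # step 435
        dist = ground_distance((lat, lon), camera_latlong)  # step 440
        display.draw_text(f"{dist:.1f} m")                  # step 445
        point_ptz_at(lat, lon, dist)                        # step 450: adjust PTZ camera
        map_view.mark(lat, lon)                             # step 455: indicator on map
```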
FIG. 5 depicts a screen shot 10 according to another embodiment. In this
embodiment a target object viewing window is not present. Video data is received and shown in the video data viewing window 20 and a moving object 22 is identified within a received video frame of the video data. A visual identifier 24 for the target object 22 is displayed in the video data viewing window 20. Latitude and longitude coordinates are calculated for the target object 22.
In a map view window 40 the location 42 of the moving object 22 is shown.
The methods described herein may be implemented on a computer apparatus such as that illustrated in FIG. 6. Referring to FIG. 6, the computer apparatus comprises a communications adaptor 605, a processor 610 and a memory 615. The computer apparatus also comprises an input device adaptor 620 for communicating with an input device 625. The computer further comprises a display adaptor 630 for operation with a display 635. The processor 610 is configured to receive data including video data, to access the memory 615, and to act upon instructions received either from said memory 615 or said communications adaptor 605. The
communications adaptor 605 is configured to receive data and to send out data.
Data received by the processor 610 includes video data captured by a fixed camera 640. Processor 610 is configured to process the video data from the fixed camera 640. Processor 610 is further configured to identify a target object in a video frame of the video data, calculate the latitude and longitude coordinates of the target object and cause this information to be displayed on the display 635. The processor is further configured to adjust a PTZ camera 650 to track said target object based on the calculated latitude and longitude coordinates for the target object. Video data from the PTZ camera can then be displayed on display 635.
Other architectures to that shown in FIG. 6 may be used as will be appreciated by the skilled person. For example, the computer apparatus may be a distributed system which is distributed across a network or through dedicated local connections.
The methods described herein may be implemented by a computer program. The computer program may include computer executable code or instructions arranged to instruct a computer to perform the functions of one or more of the methods described above. The computer program and/or the code or instructions for performing such methods may be provided to an apparatus, such as a computer, on a computer readable medium or computer program product. The computer readable medium could be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or a propagation medium for data transmission, for example for downloading the code over the Internet. Alternatively, the computer readable medium could take the form of a physical computer readable medium such as semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disc, and an optical disk, such as a CD-ROM, CD-R/W or DVD.
Detection
Detection of a moving target object can be performed by any suitable method, such as optical flow, temporal differencing, or background modelling (commonly known as background subtraction), or by a hybrid approach which combines several of these. In embodiments, a target object is detected by object segmentation methods, such as that set out below.
Referring to FIG. 7, the first step of a typical object segmentation algorithm is to receive a video frame at step 710. The video frame is converted to normalised RGB data at step 720. At step 730, a pixel of the video frame is compared with a corresponding pixel of a background model. At step 740, a determination is made as to whether the pixel of the video frame differs from the corresponding pixel of the background model by more than a threshold value. If a determination is made that the difference is greater than a threshold value then the pixel of the video frame is categorised as belonging to a target object at step 750. Alternatively, if a
determination is made that the difference is less than the threshold value then the pixel of the video frame is categorised as belonging to the background at step 760, i.e. not belonging to a potential target object. If further pixels remain to be processed (step 770) then the process loops to step 730. If no further pixels remain to be processed, then post-processing of the pixels categorised as belonging to the target object occurs at step 780. The process terminates at step 790. In order to compare a pixel of a received video frame with a corresponding pixel of a background model at step 730, in some embodiments an assessment of the reflectivity properties of an object covered by the pixel is used, as set out below.
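For illustration, the following is a minimal sketch of steps 720-760 using a plain per-channel difference threshold in normalised RGB; the reflectivity-based comparison developed below can be substituted for the simple test used here. The function name and the threshold value are assumptions.

```python
import numpy as np

def segment_foreground(frame_rgb, background_rgb, threshold=0.08):
    """Steps 720-760 of FIG. 7, sketched with a simple per-pixel difference test.

    Both images are converted to normalised RGB (each channel divided by the
    channel sum), and a pixel is labelled foreground (1) when it differs from
    the background model by more than `threshold` in any channel.
    """
    def normalise(img):
        img = img.astype(np.float32)
        s = img.sum(axis=2, keepdims=True)
        return img / np.maximum(s, 1e-6)               # step 720: normalised RGB

    diff = np.abs(normalise(frame_rgb) - normalise(background_rgb))   # step 730
    mask = (diff.max(axis=2) > threshold).astype(np.uint8)            # steps 740-760
    return mask
```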
The reflectivity of an object is a measure of the amount of light reflected by the object, or radiance, relative to the amount of incident light shone on the object, or irradiance, and is indicative of the reflectance or intrinsic brightness of the object. The reflectivity of an object can be used as a signature of the object. Hence it can be used to segment the object with respect to the remainder of an image. The reflectivity of an object is composed of diffuse (Lambertian, body) and specular (surface) components.
Referring to FIG. 8, the output of a camera depends on three factors:
1. Illuminant (light source) (E(λ)) 800
2. Spectral response characteristics (Q) of camera (sensors) 840
3. Reflectivity (S(λ)) of the object 830
For an RGB camera, the response to light at a given pixel is defined by the triplet of responses given by the R, G and B outputs. The R, G and B outputs are related to the illuminant, the camera response and the spectral reflectivity by Equation (1):

$$
\begin{aligned}
R &= w_d \int E(\lambda)\,S(\lambda)\,Q_R(\lambda)\,d\lambda + w_s \int E(\lambda)\,Q_R(\lambda)\,d\lambda \\
G &= w_d \int E(\lambda)\,S(\lambda)\,Q_G(\lambda)\,d\lambda + w_s \int E(\lambda)\,Q_G(\lambda)\,d\lambda \\
B &= w_d \int E(\lambda)\,S(\lambda)\,Q_B(\lambda)\,d\lambda + w_s \int E(\lambda)\,Q_B(\lambda)\,d\lambda
\end{aligned}
\tag{1}
$$

where λ is the wavelength (the visible range is approximately from 400 nm to 700 nm), E(λ) is the spectral power distribution of the illuminant, which can be estimated from the available knowledge of the environment or by reference to a background model, and S(λ) is a spectral reflectivity function characterising the proportion of light incident on an object that the object reflects.
Function QR(λ) characterises the red camera sensor spectral response characteristics, which determine the proportion of the red colour signal the sensor absorbs on a per-wavelength basis and can, for example, be obtained from a manufacturer's data sheet. QG(λ) characterises the green camera sensor spectral response characteristics; QB(λ) characterises the blue camera sensor spectral response characteristics; wd is a parameter for the diffuse reflection 820 component; and the geometrical parameter for the specular reflection 810 component is given by ws.
In order to build a computational physical model, the spectral reflectivity function S(λ) is represented as a weighted sum of spectral basis functions by:

$$
S(\lambda) = \sum_{i=1}^{n} w_i\,\varphi_i(\lambda)
$$

where φi(λ) is a basis function and wi is a corresponding weight. In the example that follows, n = 3 and the spectral basis functions are Parkkinen spectral basis functions (J. Parkkinen et al., Characteristic spectra of Munsell colors, J. Opt. Soc. Am. A, 6(2):318-322, 1989).
The aim now is to calculate the weights of the spectral basis functions, to obtain the spectral reflectivity of the object represented by the pixels of the current image. The calculated weights can then be used in the comparison of the video frame with the background model.
The model of Equation (1) is rewritten as:
Figure imgf000014_0002
where
Figure imgf000015_0006
The first basis function of Parkkinen is constant, and so
Figure imgf000015_0007
Figure imgf000015_0005
The first term can be merged with the specular component term to give:
Figure imgf000015_0001
Next Xi Yi and Zi are calculated by:
Figure imgf000015_0002
These integrations are calculated to obtain the transformation matrix
Figure imgf000015_0003
Now the weights of the basis functions can be obtained from RGB values by:
Figure imgf000015_0004
Figure imgf000016_0001
As a special case, for diffuse-only reflection ws = 0:
Figure imgf000016_0002
where
Figure imgf000016_0003
By using the transformation of Equation (12), the RGB image of the video frame is represented by basis function weights as an expression of the reflectivity of the surfaces represented in the current image. That is, the R, G and B values of each pixel in the image of the video frame are used to calculate the basis function weights characterising the spectral reflectivity of any surface that pixel covers.
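A short sketch of this per-pixel conversion is given below. It assumes that the 3×3 transformation matrix relating basis-function weights to RGB responses (here called M) has already been computed from the illuminant, the sensor responses and the basis functions, so that applying its inverse to each RGB triplet recovers the weights; the function name and argument layout are assumptions.

```python
import numpy as np

def rgb_to_reflectivity_weights(frame_rgb, M):
    """Convert each pixel's (R, G, B) triplet into three basis-function weights.

    `M` is assumed to be a precomputed 3x3 matrix mapping basis-function
    weights to RGB responses; its inverse therefore maps RGB back to weights.
    """
    M_inv = np.linalg.inv(M)
    pixels = frame_rgb.reshape(-1, 3).astype(np.float64)
    weights = pixels @ M_inv.T          # equivalent to M_inv @ rgb for each pixel
    return weights.reshape(frame_rgb.shape[:2] + (3,))
```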
In order to compare two objects (whether they have the same surface material and colour or not), the spectral reflectivity of the surfaces of both objects can be compared by cross-correlation.
This can be achieved by finding the correlation between the Parkkinen basis functions:
Figure imgf000016_0004
If the first surface has weights given by w1, w2 and w3, and the second surface has weights given by w'1, w'2 and w'3, then the correlation, indicating the degree of similarity between the two surfaces, becomes:
Figure imgf000017_0001
where I1 and I2 represent the first and second surfaces.
One way to compare the video frame with the background model is to use Equation (12). The background model is formed by averaging over weights (i.e. taking the mean value of the weights) calculated for a number of input images which represent relatively static background frames. The average weights, along with the weights calculated for the video frame, can be substituted into Equation (12) to define
Figure imgf000017_0002
where BG represents a pixel of the background model and VF represents a pixel of the received video frame.
By comparing the calculated value of Equation (15) with threshold values Cmax and Cmin, each pixel can then be categorised as part of the background or as part of the foreground (steps 750 and 760). In this way, a foreground mask, FGMask, is defined:
$$
FGMask =
\begin{cases}
1 & \text{if } C > C_{max} \text{ or } C < C_{min} \\
0 & \text{otherwise}
\end{cases}
$$
Once all pixels have been processed, post-processing of the foreground mask can begin (step 780). After the initial foreground mask has been created from the video frame, a dilation operation is applied to expand or thicken the initial foreground mask. An erosion operation is then applied to the resultant mask, which reduces the size of an object by eliminating area around the object edges. The erosion operation removes noise and eliminates foreground mask details smaller than a predetermined structuring element size (e.g. 2 pixels).
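The following sketch shows this post-processing step using OpenCV morphological operations; the 2-pixel structuring element follows the example size mentioned above, and everything else is an assumption.

```python
import cv2

def postprocess_mask(fg_mask, struct_size=2):
    """Step 780: dilate the initial foreground mask, then erode the result.

    `fg_mask` is a single-channel uint8 mask (1 = foreground). Details smaller
    than the structuring element are removed as noise.
    """
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (struct_size, struct_size))
    dilated = cv2.dilate(fg_mask, kernel)   # thicken the mask, close small gaps
    cleaned = cv2.erode(dilated, kernel)    # shrink back, removing small details
    return cleaned
```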
In the way described above, a target object is identified.
Other methods of identifying a target object in a video frame are possible, as would be appreciated by the person skilled in the art.

Derivation of geolocation information
A geographic coordinate system is a coordinate system that uses geographic coordinates which enable, in principle, every location on the Earth to be specified by a set of coordinates. A geographic coordinate system can be considered as a universal geolocation coordinate system. Embodiments using latitude and longitude coordinates will be described, although any suitable geographic coordinate system can be used.
A location of a target object can be derived from video data by any suitable method. FIG. 9 shows a flowchart of a method for deriving geographic coordinates for a target object identified in video data from a fixed camera according to an embodiment. In this embodiment, the geographic coordinates of the target object are calculated by first calculating the coordinates of the target object in a site map, such as may be viewed in a map view window, that covers the field of view of the fixed camera.
At step 910, four points (pixels) are selected within a video frame of the received video data and their respective positions in a video frame coordinate system are identified.
At step 920, the site map is oriented such that north is directed to the positive y-axis (ordinate) of the site map and east is directed towards the positive x-axis (abscissa) of the site map.
At step 930, four points are selected within the site map, the four points
corresponding to the respective positions of each of the objects covered by the four points selected within the video frame.
At step 940, a perspective transformation matrix is calculated. At step 950, a homography transformation matrix is calculated using the perspective transformation matrix. The homography transformation matrix maps the positions of the selected four points within the video frame to the corresponding four points within the site map.
The position of an object within a map coordinate system of the site map is therefore related to the position of the object within the video frame coordinate system by the transformation:
$$
\begin{bmatrix} MapPixel_x \\ MapPixel_y \\ 1 \end{bmatrix}
= H
\begin{bmatrix} CamPixel_x \\ CamPixel_y \\ 1 \end{bmatrix}
$$

where CamPixelx is the position of the pixel covering the object in the x (horizontal) direction in the video frame, CamPixely is the position of the pixel covering the object in the y (vertical) direction, MapPixelx is the position of the object on the site map in the east-west (horizontal) direction, MapPixely is the position of the object on the site map in the north-south (vertical) direction, and H represents the homography transformation matrix (the equality holding up to the usual homogeneous scale factor).
Steps 910, 920, 930, 940 and 950 lead to the calculation of a homography transformation matrix. Steps 910, 920, 930, 940 and 950 need only be performed once to provide the homography transformation matrix for a given viewpoint of the fixed camera.
At step 960, the homography transformation matrix, H , is applied to a position of a target object identified in a video frame, in the video frame coordinate system. The position of the target object can be determined by the position of a pixel at the centre- of-mass of the target object in the video frame. By applying the homography transformation matrix to the position of the target object within the video frame, a position of the target object within a map coordinate system of the site map is identified.
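The sketch below illustrates steps 910-960 using OpenCV, which is one way of realising the perspective/homography calculation; the four corresponding point pairs are illustrative placeholder values, not taken from the embodiment.

```python
import cv2
import numpy as np

# Steps 910-950: four corresponding points (pixels), picked once for a given
# camera viewpoint.  The values below are illustrative placeholders only.
camera_pts = np.float32([[102, 540], [873, 512], [640, 188], [215, 166]])
map_pts    = np.float32([[ 60, 410], [505, 402], [430,  95], [110,  80]])

H = cv2.getPerspectiveTransform(camera_pts, map_pts)   # homography matrix

def camera_to_map(cam_x, cam_y, H):
    """Step 960: map a target position in the video frame to site-map pixels."""
    pt = np.float32([[[cam_x, cam_y]]])
    map_x, map_y = cv2.perspectiveTransform(pt, H)[0, 0]
    return float(map_x), float(map_y)
```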
At step 970, three corners of the site map are selected and their latitude and longitude coordinates identified. At step 980, the height of the map is calculated. That is, the distance, MapImageHeight, between two vertically aligned corners of the site map is calculated. Additionally, at step 980 the width of the map is calculated. That is, the distance, MapImageWidth, between two horizontally aligned corners of the site map is calculated.
Steps 970 and 980 are used to calculate appropriate parameters for converting a position of an object in the map coordinate system to a position of the object in latitude and longitude coordinates. Steps 970 and 980 need only be performed once to provide the appropriate parameters. At step 990, the known latitude and longitude coordinates of the four corners of the site map and the height of the map and the width of the map are used to calculate the latitude and longitude coordinates of the target object. The latitude LAT_TO and longitude LONG_TO coordinates of the target object are calculated by:
Figure imgf000020_0001
where δx is the difference between the latitudes (in decimal) of two linear horizontal reference corners, δy is the difference between the longitudes (in decimal) of two linear vertical reference corners, MapPixelx is the x location (horizontal) of the target object on the map image, MapPixely is the y location (vertical) of the target object on the map image, MapOriginx is the horizontal position of the origin in the map image, and MapOriginy is the vertical position of the origin in the map image.
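Because the original expression is not reproduced in this text, the following is only a plausible linear-interpolation sketch consistent with the parameters described above (known corner coordinates and the map height and width). It assumes a north-up map image with its pixel origin at the top-left corner; the function and argument names are assumptions.

```python
def map_pixel_to_latlong(map_x, map_y,
                         corner_nw_latlong, corner_se_latlong,
                         map_width_px, map_height_px):
    """Plausible linear interpolation from site-map pixels to latitude/longitude.

    `corner_nw_latlong` is the (lat, long) of the top-left (north-west) corner
    and `corner_se_latlong` that of the bottom-right (south-east) corner.
    Pixel (0, 0) is taken to be the top-left of the map image.
    """
    nw_lat, nw_long = corner_nw_latlong
    se_lat, se_long = corner_se_latlong
    lat = nw_lat + (se_lat - nw_lat) * (map_y / map_height_px)
    lon = nw_long + (se_long - nw_long) * (map_x / map_width_px)
    return lat, lon
```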
Variations of the described embodiments are envisaged, and the features of the disclosed embodiments can be combined in any way.
The geographic coordinates can be latitude and longitude coordinates. Coordinates can be calculated in a Universal Transverse Mercator coordinate system.
Coordinates can be calculated in a Universal Polar Stereographic coordinate system.
A video data viewing window may not be shown on a screen. A target object viewing window may not be shown on a screen. A map view window may not be shown on a screen. Any combination of a target object viewing window, a video data viewing window and a map view window can be shown on one or more screens.
A target object can be assigned an index number or other marker. This index number or marker can then be stored and used to identify the same target object in later video frames or in separate video data streams. In this way, once a target object has been identified in a first series of video frames, it is possible to identify the same target object in a second series of video frames. An assigned index number or marker for a target object can be displayed on a screen, for example, in the vicinity of the target object to which that index number or marker is assigned in video frames shown in a video data viewing window. A visual identifier for a target object can take any suitable form for identifying the target object in a video data viewing window. For example, a visual identifier can comprise a box overlaying the image of the target object in the frame shown in the video data viewing window. For example, a visual identifier can comprise an arrow pointing to the target object in the video data viewing window. Any suitable colours or shapes can be used for a visual identifier. Alternatively, in some embodiments, no visual identifiers are shown in the video data viewing window.
Other information can be displayed on the screen. The latitude and longitude coordinates of a target object can be displayed on the screen. In particular, the latitude and longitude coordinates of the target object can be displayed in the vicinity of the target object shown in a video frame displayed in a video data viewing window. A distance from the target object to the fixed camera can be calculated and shown on the screen. A distance from the target object to the PTZ camera can be calculated and displayed on the screen. A distance from the target object to a reference point or landmark can be determined and shown on the screen.
An estimated velocity of a target object can be shown on the screen. The velocity of the target object can be calculated by comparing the coordinates of the target object across a series of video frames.
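A minimal sketch of such a speed estimate from two consecutive geographic fixes is given below; the ground_distance argument is assumed to be a callable such as the haversine sketch given later in this section.

```python
def estimate_speed(latlong_prev, latlong_curr, frame_interval_s, ground_distance):
    """Estimate the target's ground speed (m/s) from two consecutive fixes.

    `ground_distance` is a callable returning the distance in metres between
    two (lat, long) pairs; `frame_interval_s` is the time between the fixes.
    """
    return ground_distance(latlong_prev, latlong_curr) / frame_interval_s
```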
Other variations of the described embodiments are also contemplated. For example, the video data shown in a video data viewing window may not originate from a fixed camera. Multiple cameras may be used to track a target object and to better triangulate a position of the target object. Video data from one or more of the multiple cameras can be displayed in a video data viewing window.
The location of a target object can be represented in any way in a map view window. For example, the location of a camera can be represented in the map view window. The locations of known landmarks and/or reference points can be represented in the map view window.
A target object may be selected by a user. A target object can be detected by motion.
One or more target objects may be tracked simultaneously. A distance between two target objects can be calculated from the determined positions of the two target objects. The ground distance between two target objects can be shown on the screen. The ground distance between a first target object and a second target object may be calculated by comparing the latitude and longitude coordinates of the first target object with the latitude and longitude coordinates of the second target object.
The ground distance GD between a first target object and a second target object can be calculated as:
$$
GD = R \cdot s
$$

where R ≈ 6371 km is the approximate radius of the Earth. Parameter s is the central angle between the position of the first target object and the position of the second target object in latitude and longitude coordinates, and is given by:
$$
s = 2\,\operatorname{atan2}\!\left(\sqrt{a},\;\sqrt{1-a}\right)
$$

where atan2(·, ·) is the arctangent function with two arguments, capable of interpreting the signs of the two arguments to return the appropriate quadrant of the angle, and a is given by:

$$
a = \sin^{2}\!\left(\frac{\Delta lat}{2}\right) + \cos(Lat_1)\,\cos(Lat_2)\,\sin^{2}\!\left(\frac{\Delta long}{2}\right)
\tag{22}
$$
In Equation (22), Δlong and Δlat are given by:
$$
\Delta lat = Lat_2 - Lat_1, \qquad \Delta long = Long_2 - Long_1
$$
where (Lat1, Long1) are the calculated geographic coordinates of the first target object and (Lat2, Long2) are the geographic coordinates of the second target object. The ground distance between a target object and the fixed camera may also be calculated in this way. The ground distance between a first target object and a second target object may be calculated from the determined position of the first target object and the determined position of the second target object in the video frame coordinate system.
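For illustration, the relations above can be written directly as a short function; the function and argument names are assumptions.

```python
import math

def ground_distance(p1, p2, earth_radius_m=6371000.0):
    """Ground distance in metres between two (latitude, longitude) fixes,
    following the haversine relations set out above."""
    lat1, long1 = map(math.radians, p1)
    lat2, long2 = map(math.radians, p2)
    d_lat, d_long = lat2 - lat1, long2 - long1
    a = math.sin(d_lat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(d_long / 2) ** 2
    s = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))   # central angle
    return earth_radius_m * s
```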
Identification of a target object can be performed by any suitable means. A target object can be identified by a user. A target object can be identified by motion sensors.
In the embodiment described in relation to FIGs 7 and 8, object segmentation was used to identify a target object. However, any suitable method for identifying a target object can be used.
Variations on the described method of object segmentation are envisaged such as those set out below.
A received video frame can be converted into any suitable image representation. In this respect, any suitable colour space can be chosen, e.g. RGB, normalised RGB, HSV, YIQ, YCbCr, YPbPr, CIELAB and RGB colour ratio. However, normalised RGB and HSV are the most common colour spaces used for object segmentation as these colour spaces have been shown to be more tolerant of minor variations in the illuminant.
A background model can be created using any suitable method. In certain embodiments, a background model is created by analysing a series of received images which represent relatively static background frames. These frames can be represented by, for example, spectral reflectivity weights as explained above. The weights of each of the background frames can then be averaged to obtain a mean set of background frame weights that are then used to form the background model. The mean background frame can be called BGmean.
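A one-line sketch of this averaging, assuming each training frame has already been converted to per-pixel reflectivity weights (for example by the rgb_to_reflectivity_weights sketch above):

```python
import numpy as np

def build_background_model(weight_frames):
    """Average the per-pixel basis-function weights of static training frames.

    `weight_frames` is an iterable of HxWx3 arrays of reflectivity weights;
    the per-pixel mean, BGmean, is used as the background model.
    """
    return np.mean(np.stack(list(weight_frames), axis=0), axis=0)
```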
When comparing the video frame to the background model, any suitable method can be used. Processes are envisaged that do not rely on calculating the spectral reflectivity of objects in a scene.
In the embodiment described in relation to FIGs 7 and 8, Parkkinen spectral basis functions were used. The skilled person would appreciate that other representations can be used. For example, instead of Parkkinen basis functions, embodiments can use other spectral basis functions, an appropriate set of eigenvectors of the statistical distribution of the spectral reflectances of Munsell colour chips, or of natural surfaces, or an appropriately selected set of Fourier basis functions.
In certain embodiments, the autocorrelation of the background frame is calculated, without normalisation of its magnitude, as
Figure imgf000024_0001
For each of the background training frames BGi where i represents the background frame number, the ratio CBG between the autocorrelation of the reflectivity of each frame and the autocorrelation of the mean background reflectivity of background frames, can be calculated:
Figure imgf000024_0002
The threshold values Cmin and Cmax can then be calculated by
Figure imgf000024_0003
The autocorrelation of the video frame, VFCorr can then be calculated as
and the ratio
Figure imgf000024_0004
can then be compared with the threshold values to determine whether or not a pixel belongs to the foreground object or to the background.
Once a target object has been identified, a position of the target object in the video frame coordinate system can be determined by any suitable method, and any suitable coordinate system can be used. In certain embodiments, the position of the target object in the video frame is determined by calculating a horizontal distance and a vertical distance from a predetermined point, such as the bottom left corner of the video frame.
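As an illustration, the centre-of-mass of a foreground mask can be measured from the bottom-left corner as follows; the use of OpenCV image moments here is an implementation assumption rather than part of the described embodiment.

```python
import cv2

def target_position(fg_mask, frame_height):
    """Centre-of-mass of the foreground mask, measured from the bottom-left corner.

    Image coordinates normally run from the top-left, so the vertical distance
    from the bottom-left reference point is frame_height minus the centroid row.
    """
    m = cv2.moments(fg_mask, binaryImage=True)
    if m["m00"] == 0:
        return None                      # no target pixels in this frame
    cx = m["m10"] / m["m00"]             # horizontal distance from the left edge
    cy = m["m01"] / m["m00"]             # centroid row, top-left origin
    return cx, frame_height - cy         # convert to bottom-left origin
```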
Variations on the described method of determining a geolocation of a target object are envisaged such as those set out below.
Reference points used to map a position of a target object in a video frame coordinate system to latitude and longitude may be chosen by a user.
The latitude and longitude coordinates of reference points can be established by manual readings. These manual readings can then be calibrated with the position of the measurement apparatus in a video frame to establish a mapping between a position of a target object in a video frame coordinate system and the latitude and longitude coordinates of the target object.
A calculated ground distance from a target object to a PTZ camera can be used to adjust a zoom of the PTZ camera. The zoom of the PTZ camera can alternatively or additionally be adjusted based on a calculated velocity of the target object. Similarly, the panning and tilting of the PTZ camera can be adjusted according to any suitable criteria.
In certain embodiments, there is no PTZ camera to track a movement of a target object. The latitude and longitude coordinates of the target object may be sent to another device capable of tracking or following the movement of the target object.
The above embodiments have been described by way of example only, and the described embodiments are to be considered in all respects only as illustrative and not restrictive. It will be appreciated that variations of the described embodiments may be made without departing from the scope of the invention.

Claims:
1. A method comprising:
receiving video data from a camera, the video data comprising a plurality of video frames captured by the camera;
tracking a target object in a sequence of video frames of the video data;
calculating a position of the target object in the video frame coordinate system; and
converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system.
2. A method according to claim 1, further comprising:
using the geographic coordinates to display on a map image an indicator to show the position of the target object on the map image.
3. A method according to claim 2, wherein the map image is a satellite image.
4. A method according to any preceding claim, wherein the converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system comprises:
converting the position of the target object in the video frame coordinate system to a position of the target object in a map coordinate system; and
converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
5. A method according to claim 4, wherein the converting the position of the target object in the video frame coordinate system to a position of the target object in the map coordinate system comprises applying a homographic transformation to the position of the target object in the video frame coordinate system.
6. A method according to claim 5, wherein the homographic transformation is determined by a mapping of each of the positions of at least four non-collinear points in the video frame coordinate system to a corresponding position of each of the points in the map coordinate system.
7. A method according to any of claims 4 to 6, wherein converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system comprises:
selecting at least three reference points in the map coordinate system;
identifying, for each of the at least three reference points, corresponding geographic coordinates in the geographic coordinate system;
determining a mapping for converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system using the reference points and identified geographic coordinates of the reference points; and
using the mapping to convert the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
8. A method according to any preceding claim, further comprising:
calculating the distance between the target object and a fixed point in the video frame.
9. A method according to any preceding claim, comprising tracking a plurality of target objects.
10. A method according to claim 9, further comprising calculating the distance between two target objects.
11. A method according to any preceding claim, comprising calculating the velocity of a target object.
12. A method according to any of claims 8 to 11, wherein the distance and speed are calculated from the geographic coordinates.
13. A method according to any preceding claim, further comprising:
converting the position of the moving object from the video frame coordinate system to tracking coordinates in a tracking device coordinate system for a tracking device;
adjusting the tracking device to direct the tracking device towards the moving object.
14. A method according to claim 13, wherein the tracking device is a tracking camera and adjusting the tracking device to direct the tracking device towards the moving object comprises adjusting the tracking camera to monitor the moving object, the method further comprising:
displaying the view captured by the tracking camera.
15. A method according to claim 14, wherein the tracking coordinates are calculated from the geographic coordinates.
16. A system comprising a memory, a processor and at least one camera, the system being configured to perform the method of any preceding claim.
17. A computer readable medium having stored therein computer readable instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1 to 15.
PCT/GB2017/053366 2016-11-08 2017-11-08 Object location technique WO2018087545A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB201618837 2016-11-08
GB1618837.7 2016-11-08

Publications (1)

Publication Number Publication Date
WO2018087545A1 true WO2018087545A1 (en) 2018-05-17

Family

ID=60388089

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2017/053366 WO2018087545A1 (en) 2016-11-08 2017-11-08 Object location technique

Country Status (1)

Country Link
WO (1) WO2018087545A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021196294A1 (en) * 2020-04-03 2021-10-07 中国科学院深圳先进技术研究院 Cross-video person location tracking method and system, and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009096893A1 (en) * 2008-02-01 2009-08-06 Imint Image Intelligence Ab Generation of aerial images
US20110128388A1 (en) * 2009-12-01 2011-06-02 Industrial Technology Research Institute Camera calibration system and coordinate data generation system and method thereof
US20150154745A1 (en) * 2011-03-07 2015-06-04 Stéphane Lafon 3D Object Positioning in Street View
US20130128050A1 (en) * 2011-11-22 2013-05-23 Farzin Aghdasi Geographic map based control

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
J. PARKKINEN ET AL.: "Characteristic spectra of munsell colors", J. OPT. SOC. AM. A, vol. 6, no. 2, 1989, pages 318 - 322, XP002574122, DOI: doi:10.1364/JOSAA.6.000318


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17800925

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17800925

Country of ref document: EP

Kind code of ref document: A1