GB2528246A - Region based image compression on a moving platform - Google Patents

Region based image compression on a moving platform

Info

Publication number
GB2528246A
GB2528246A GB1412030.7A GB201412030A GB2528246A GB 2528246 A GB2528246 A GB 2528246A GB 201412030 A GB201412030 A GB 201412030A GB 2528246 A GB2528246 A GB 2528246A
Authority
GB
United Kingdom
Prior art keywords
regions
interest
data
image
moving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1412030.7A
Other versions
GB201412030D0 (en)
Inventor
Christopher Robert Spence
Edmund Sparks
Estelle Tidey
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Roke Manor Research Ltd
Original Assignee
Roke Manor Research Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Roke Manor Research Ltd filed Critical Roke Manor Research Ltd
Priority to GB1412030.7A priority Critical patent/GB2528246A/en
Publication of GB201412030D0 publication Critical patent/GB201412030D0/en
Publication of GB2528246A publication Critical patent/GB2528246A/en
Withdrawn legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/36Scalability techniques involving formatting the layers as a function of picture distortion after decoding, e.g. signal-to-noise [SNR] scalability
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/23Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with coding of regions that are present throughout a whole video segment, e.g. sprites, background or mosaic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/183Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a single remote source
    • H04N7/185Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a single remote source from a mobile camera, e.g. for remote control
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/214Specialised server platform, e.g. server located in an airplane, hotel, hospital

Abstract

A method for processing image data captured from a moving platform (e.g. an unmanned aerial vehicle, UAV, or other surveillance vehicle) comprises: defining and segmenting regions of interest (ROIs) from the background image data (scene), and compressing the ROIs separately from the background image data prior to transmission of the data stream. ROIs or objects of interest may comprise moving objects; image scene motion can be calculated by frame homography. Compressed ROI and background data may be multiplexed. Transmission of data over a band-limited data link is also described, with segmentation of ROI and background data, encoding of ROI data at a first (higher, low compression ratio) quality and encoding of background data at a second, lower quality. Wide Area Motion Imagery (WAMI) or Full Motion Video (FMV) capture may be used, with geolocation and image stabilisation.

Description

Region Based Image Compression on a Moving Platform
Technical Field
Embodiments of the invention relate to a method of compressing image data obtained from a moving platform, and to a system configured to perform the method.
Background to the Invention and Prior Art
Distribution of video broadcast or captured from flying platforms such as unmanned aerial vehicles (UAVs), lighter-than-air craft, tethered balloons, helicopters and fixed wing aircraft presents particular problems, in particular if the field of view of the sensor mounted on the vehicle is very wide and the sensor is very high resolution. This is because a large amount of information is captured by the sensor, much of which is of little interest to the viewer, but which still needs to be encoded and transmitted.
The bandwidth required to transmit high resolution video data is often greater than that available to UAVs or similar, particularly those operating in remote environments, and so the video must be compressed, while maintaining the fidelity and saliency of the information contained therein, for transmission over the band-limited connection.
In many situations a user is only interested in objects that are active, e.g. moving objects, as the background scene is non-pertinent. Compression of the video data while maintaining high resolution for regions of interest is required.
Segmenting moving objects from images captured by a camera is non-trivial even when the camera or sensor is stationary, and the task becomes much more difficult when the platform on which the camera is mounted is moving relative to the ground.
The main approach adopted to segment moving objects is to register pairs of image frames which have been captured a short time apart and difference the registered images, pixel by pixel. In an idealised case there will only be differences between images where there are moving targets. Moving targets can then be detected by processes including thresholding or blob-detection applied to the difference image.
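A minimal sketch of this prior-art approach follows, using OpenCV. The feature choice (ORB), the RANSAC threshold, and the grey-level and area thresholds are illustrative assumptions, not taken from the patent.

```python
import cv2
import numpy as np

def detect_motion(frame_a, frame_b):
    """frame_a, frame_b: 8-bit grayscale frames captured a short time apart.
    Returns bounding boxes of candidate moving targets."""
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(frame_a, None)
    kp_b, des_b = orb.detectAndCompute(frame_b, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_a, des_b)
    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # Register frame A onto frame B's coordinate system.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    registered = cv2.warpPerspective(frame_a, H, (frame_b.shape[1], frame_b.shape[0]))
    # Difference the registered images; residual intensity marks motion.
    diff = cv2.absdiff(registered, frame_b)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    # Simple blob detection via connected contours on the thresholded mask.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 50]
```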
However, image registration is particularly difficult where there is significant 3D structure in the scene. This applies particularly in extreme cases, such as in urban situations with tall buildings, or in particularly mountainous areas, where occlusion and dis-occlusion may be mistaken for target activity.
In addition, in compression according to the known MPEG standards, the moving parts of the image are frequently compressed at lower quality to save bandwidth, and because the lower quality is not as noticeable when watching video in near-real-time. However, this can be counter to the requirements of surveillance from moving platforms, and particularly of moving targets, where higher resolution and quality images of the moving target are preferable.
There are thus significant problems in providing high resolution, good quality imagery over band limited connections from remote moving surveillance platforms.
Summary of the Invention
Embodiments of the present disclosure aim to improve upon methods and systems for capturing, compressing and transmitting video data from a moving platform. In particular, embodiments of the present invention allow for the capture of video content from a moving platform, segmenting regions of interest (ROIs) from a background scene which is moving relative to the camera, and compressing the background scene separately from the ROIs prior to transmission of the data stream in order to maintain high quality, and preferably the highest quality available, for the regions of interest.
Embodiments of the invention therefore attempt to automatically identify areas of an image that are moving with respect to a fixed background scene and aim to transmit these regions in higher quality than the background scene. This is non-trivial as although the background is fixed with respect to the earth, it is moving with respect to the sensor or camera.
In view of the above, according to one embodiment of the invention there is provided a method comprising defining one or more regions of interest within image data captured from a moving platform, segmenting the regions of interest from the background image data; compressing the regions of interest data separately from the background image data, and transmitting the compressed regions of interest over a data connection.
With the above a region of interest segmentation can be performed, which segmented regions are then encoded at a higher quality than the non-segmented, background, regions. With such a segmentation and encoding policy the limited bandwidth that may be available to a remote surveillance vehicle can be used in transmitting the determined regions of interest at higher quality and/or resolution than the background regions. In this way the limited bandwidth available is more efficiently and effectively employed.
Further features and advantages of embodiments of the invention will be apparent from the appended claims.
Brief Description of the Drawings
Further features and advantages of the present invention will become apparent from the following description of an embodiment thereof, presented by way of example only, and by reference to the drawings, wherein like reference numerals refer to like parts, and wherein:
Figure 1 is a diagram illustrating an operating environment of an embodiment of the invention;
Figure 2 is a block diagram of the component parts of a UAV in an embodiment of the invention;
Figure 3 is a block diagram of an example controller of the UAV;
Figure 4 is a flow diagram of an embodiment of the invention; and
Figure 5 is a diagram illustrating an overview of the operation of embodiments of the invention.
Description of the Embodiments
Figure 1 shows an environment of a particular embodiment of the invention. An unmanned aerial vehicle (UAV) 10 is provided with one or more sensor units 12 which obtain image data from the terrain below. Sensor unit 12 may be a camera or other image capture type sensor, such as a Full Motion Video (FMV) sensor or a Wide Area Motion Imagery (WAMI) sensor. Often UAV operators require detailed image data relating to a moving target 14.
Moving target 14 may be moving through a three dimensional environment, such as a built-up area as shown, or a mountainous region. The images captured by sensor unit(s) 12 will include the moving target 14, but the moving target will be in the context of the background environment, which is also moving with respect to the UAV. The moving background environment makes detection of the moving target more complex than if the sensors were capturing image data from a non-moving platform, as it is not simply possible to use difference operations or simple motion vector information to identify the moving target, because the whole image is effectively moving from frame to frame. The process by which moving objects are detected within image data captured from a moving platform is therefore explained in detail below. It will be understood that the invention is not limited to the use of UAVs, and can equally be implemented in other moving platforms, and in particular flying platforms such as lighter-than-air craft, helicopters, fixed wing aircraft, or tethered balloons.
However, embodiments of the invention may also be employed in other moving platforms, such as ground based vehicles or water based vehicles. That is, embodiments of the invention may find application in any type of moving surveillance platform.
Fig. 2 is a block diagram showing an example high level configuration of the UAV 10. The UAV 10 may be provided with one or more external actuators 102, to allow the UAV to carry out the tasks for which it is designed. UAV 10 is also provided with one or more sensors 12, particularly camera or other image sensors (which may operate in any relevant part of the spectrum, such as visible and/or infra-red) which form an input to the system. A controller 106 which controls the motion and other actions of the UAV is also provided, along with an optional Inertial Measurement Unit (IMU) 108. Data collected from IMU 108 allows controller 106 to track the position of the UAV supplementary to, or in place of, a GPS unit. IMU 108 data or GPS data may also form an input to the image processing program, shown in Fig. 3, and explained below.
Propulsion and Steering unit 1010 propels and steers the UAV under command from the controller 106. In the context of the UAV 10, the propulsion and steering unit 1010 would typically comprise an engine driving a propeller, or a small jet engine, as well as aerofoil and aerodynamic control surfaces to allow controlled flight. Many suitable UAVs which can form the aerial platform for the present embodiment are already known in the art.
Controller 106 is shown in more detail in Fig. 3. It can be seen that controller 106 comprises a central processing unit 32, provided with a memory 54 and an input/output interface 56 to allow the controller to interface with and control the other components of the UAV. A computer readable storage medium 38 such as a hard drive, flash memory drive or similar is also provided, on which run programs which allow the CPU 32 to control the UAV in its assigned tasks, including an autopilot to control the UAV, as well as to process the incoming image data in accordance with the embodiments of the invention.
Provided on the storage medium 38 are image processing program 382, location determining program 384, and control program 386. Control program 386 contains instructions to allow the CPU to control the UAV in its actions, as well as the basic functions of the UAV such as propulsion and steering. Control program 386 may in fact be several different programs, the further operation of which is beyond the scope of this description; suffice to say that they operate to allow the UAV to function, and in particular to undertake aerial surveillance operations, as known in the art. In addition to the control program 386, embodiments of the invention provide image processing program 382, which receives the image data being captured by sensor 12 and processes it in a manner to be described prior to sending it to the output for transmission over a data connection.
Location determination program 384 is also provided in storage medium 38 of controller 106. The location determination program 384 takes data from the IMU and/or optional GPS unit to calculate the location of the UAV 10. The geolocation data which is generated by location determination program 384 can be included with the image data to be transmitted over the limited bandwidth data connection, for use in subsequent image data processing, described further below.
Figure 5 presents an overview of the image processing performed by embodiments of the invention. As explained further below, image data captured from the sensors 12 is first rectified and stabilised, prior to being input to a video analytics module that identifies regions of interest in the image data, typically being moving objects within the image. As explained, this is non-trivial, as the whole image will typically be moving globally to some extent, reflecting the fact that it has been captured from a moving platform. The video analytics is able to compensate for the global movement of the platform to identify the regions of interest that typically indicate moving targets in the field of view, and to segment the regions of interest from the background. The identified regions of interest are then compressed/encoded at a different, higher, quality than the background, and then at least the compressed/encoded regions of interest are fed to a multiplexer which controls transmission of data from the platform over a limited bandwidth data connection. In some embodiments, the region of interest data only might be transmitted over the connection, particularly where high quality image data of the background has already been captured a priori. In other embodiments, the segmented background data may also be transmitted, after being compressed/encoded at a lower quality than the regions of interest. The intention of the differential compression/encoding between the identified regions of interest and the segmented background is to allow a majority of the limited bandwidth data connection to be used in transmitting the image data from the regions of interest at as high a quality and/or resolution as possible, given the bandwidth constraints. At the very least, in some embodiments a higher quality and/or resolution for the regions of interest is used and obtained than would be the case if no region of interest segmentation was performed.
Further details of the operation of the image processing program operating in accordance with embodiments of the present invention will now be described with reference to Fig. 4.
Fig. 4 shows a flow diagram which has an image collected by sensor 12 as its input. A further optional input to the image processing program is data collected from the GPS unit or IMU 108. The input to the system at step 4.010 is preferably FMV or WAMI imagery, as defined above, from one or more sensors. Where the geolocation and pointing direction of the sensor is known (e.g. by means of a Global Positioning System (GPS) and/or an Inertial Measurement Unit (IMU) or similar) this can also form an input to the system, shown at step 4.011. For wide angle sensors and/or where cameras are used such that the images from multiple sensors are to be stitched together, the images require rectification. Rectification, in step 4.02, makes use of a camera calibration, 4.021, an optional Digital Terrain Elevation Model (DTEM), 4.020, and the geolocation information obtained if the UAV is provided with the optional IMU 108 or GPS unit.
The camera calibration at step 4.021 may be in one of the formats defined in ISO/TS 19130:2010 "Geographic information - Imagery sensor models for geopositioning", or a proprietary format specific to the particular sensor. The resulting rectified image 4.031 may be fully rectified down to a geospatial coordinate system, e.g. Universal Transverse Mercator (UTM), or it may retain an oblique viewpoint, i.e. it will be a perspective projection. If geolocation information was provided to the system, then the rectified imagery will have corresponding georeferencing information, shown at step 4.030. This may take the form of the latitude, longitude and height of the 4 corners of the image, or one of the other formats described in standard ISO/TS 19130. In all cases the next processing step in the system is image stabilisation, at step 4.04. The image stabilisation step may use processing operations known in the prior art, e.g. the Harris algorithm, SIFT, SURF, or optical flow (Lucas-Kanade), or similar. Because these image processing operations are per se known already in the art, no further explanation thereof will be undertaken.
The image stabilisation step results in a number of outputs. The first is a model of how the coordinate system in one frame maps to that in another frame, e.g. a frame-to-frame homography and/or fundamental matrix, at step 4.05. The method of deriving the homography or fundamental matrix mapping frame to frame coordinate systems is known, and consists of the following elements:
* Detection of point features using the Harris corner operator (more info in C Harris, M Stephens, "A Combined Corner and Edge Detector", Proc. 4th Alvey Vision Conference);
* Feature tracking using a 2D algorithm (R J Evans, E L Brassington, "Video Motion Processing for Event Detection and Other Applications", IEE Annual Conference on Visual Image Engineering, VIE2003, University of Surrey);
* Analysis of the tracked feature positions, primarily based on use of the fundamental matrix, F. In short, the fundamental matrix encapsulates information about the apparent motion of features between a pair of frames. Let x1 and x2 be the position of a stationary feature in images 1 and 2 in pixel coordinates, and F be the fundamental matrix. Then x1ᵀFx2 = 0 (where ᵀ indicates transpose). For ease of mathematical expression the 2-dimensional position, x, is actually a 3-vector (u v 1)ᵀ, where (u v) is the position in 2-dimensional image pixel coordinates, and F is a 3 by 3 matrix.
This is one example of a technique for deriving a fundamental matrix from moving image frames, and other techniques may be employed without departing from the scope of the invention.
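The three elements above can be sketched with standard OpenCV calls, as below. The corner counts, tracking parameters and RANSAC thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

def frame_geometry(prev_gray, curr_gray):
    """Estimate the fundamental matrix F with x1'Fx2 = 0 for static features."""
    # Harris corner points detected in the previous frame.
    pts1 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                   qualityLevel=0.01, minDistance=8,
                                   useHarrisDetector=True)
    # Track the points into the current frame (pyramidal Lucas-Kanade).
    pts2, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts1, None)
    good1 = pts1[status.ravel() == 1].reshape(-1, 2)
    good2 = pts2[status.ravel() == 1].reshape(-1, 2)
    # RANSAC rejects features whose apparent motion is inconsistent with a
    # single static scene; the inlier mask flags the stationary background.
    F, inliers = cv2.findFundamentalMat(good1, good2, cv2.FM_RANSAC, 1.0, 0.99)
    return F, good1, good2, inliers
```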
The second output of the image stabilisation step is an output video (at step 4.06) that is stabilised by transforming the video according to a scaled version of the transformation, aiming to reduce and/or smooth the apparent motion of the camera through the scene. The frame relative transformations may be the raw transformation or the smoothed transformation, depending on whether the subsequent analytics step operates on the stabilised or raw imagery. Where geolocation information is available, the geolocation information also needs to be transformed, at step 4.07, so that it reflects the output imagery.
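A minimal sketch of this stabilisation output follows, assuming the raw frame-to-frame homographies are already available: the accumulated camera trajectory is smoothed with a moving average and each frame is warped by a corrective transform. The window size and the element-wise matrix averaging are illustrative assumptions.

```python
import cv2
import numpy as np

def stabilise(frames, homographies, window=15):
    """frames: list of images; homographies[i] maps frame i to frame i+1."""
    # Accumulate the raw camera trajectory as products of homographies.
    trajectory = [np.eye(3)]
    for H in homographies:
        trajectory.append(trajectory[-1] @ H)
    # Smooth the trajectory with a centred moving average.
    smoothed = []
    for i in range(len(trajectory)):
        lo, hi = max(0, i - window), min(len(trajectory), i + window + 1)
        smoothed.append(np.mean(trajectory[lo:hi], axis=0))
    out = []
    for frame, raw, smooth in zip(frames, trajectory, smoothed):
        # Corrective warp: undo the raw motion, re-apply the smoothed motion.
        correction = smooth @ np.linalg.inv(raw)
        h, w = frame.shape[:2]
        out.append(cv2.warpPerspective(frame, correction, (w, h)))
    return out
```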
The video analytics steps aim to automatically find the salient aspects of the imagery for its intended purpose. The technique described here is to use the frame relative transformation to find objects that are moving over the fixed background using the three step process outlined above, on the assumption that objects that are moving are more interesting to a user than those that are stationary. In most cases video analytics algorithms operate best on the raw video from the sensor; however, if the raw data is especially distorted or unstable, the video analytics may be better performed on the rectified or stabilised imagery.
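One hedged way to realise this analytics step, continuing the fundamental-matrix sketch above: tracked features whose motion is inconsistent with the static-scene geometry (the RANSAC outliers) are treated as moving-object evidence and greedily grouped into rectangular ROIs. The grouping radius is an assumption invented for illustration.

```python
import numpy as np

def rois_from_outliers(points, inliers, radius=40):
    """points: Nx2 tracked positions; inliers: N mask from findFundamentalMat."""
    movers = points[inliers.ravel() == 0]          # outliers = moving features
    rois, used = [], np.zeros(len(movers), dtype=bool)
    for i, p in enumerate(movers):
        if used[i]:
            continue
        # Greedily group nearby moving features into one region of interest.
        close = np.linalg.norm(movers - p, axis=1) < radius
        used |= close
        cluster = movers[close]
        x0, y0 = cluster.min(axis=0)
        x1, y1 = cluster.max(axis=0)
        rois.append((int(x0), int(y0), int(x1 - x0) + 1, int(y1 - y0) + 1))
    return rois
```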
In a further embodiment of the invention, the video analytics step 4.08 allows for a user to identify one or more objects of interest, and then a tracking algorithm such as TLD (developed by Kalal, University of Surrey) is used to provide an ROI for that object for each frame. The ROIs, at step 4.09, once identified, can be treated separately from the rest of the image, which relates to the background.
Yet another option for the video analytics step is for the analytics to automatically find objects based on a model that it has already learned. One such example is a person detector based on a Histogram of Oriented Gradients technique. To reduce the amount of processing power needed, these techniques may be combined, i.e. one technique used to automatically flag objects of interest, performed for example every 10 frames, and another to track those objects in the intervening frames.
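A brief sketch of this option, using OpenCV's stock HOG person detector run every N frames; the 10-frame cadence mirrors the example in the text, and the window stride is an assumption. Tracking in the intervening frames (e.g. with TLD or Lucas-Kanade) is assumed and omitted here.

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_people_every_n(frames, n=10):
    detections = {}
    for i, frame in enumerate(frames):
        if i % n == 0:
            # Full HOG detection on every n-th frame.
            boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
            detections[i] = boxes
        # In intervening frames a lightweight tracker would propagate the
        # last detections; omitted from this sketch.
    return detections
```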
The stabilised imagery, at step 4.06, is then compressed using a scalable compression technique, at step 4.10. One such technique is that described in US2012/170659 to ST MICROELECTRONICS. The aim is that the resulting video at step 4.11 is compressed, but the ROIs identified by the video analytics step are excluded. Since a model of the transformation from one frame to another has already been determined by the image analytics step, this can be used to help the compression. In other words, the compression of the background images is improved by having calculated the homography from frame to frame.
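A minimal sketch of the differential compression policy, substituting plain JPEG for the scalable codec cited above: ROI chips are encoded at high quality (low compression ratio) and the background, with the ROIs blanked out, at low quality. The quality values are illustrative assumptions.

```python
import cv2

def encode_frame(frame, rois, roi_quality=95, background_quality=30):
    chips = []
    background = frame.copy()
    for (x, y, w, h) in rois:
        chip = frame[y:y + h, x:x + w]
        ok, buf = cv2.imencode(".jpg", chip,
                               [cv2.IMWRITE_JPEG_QUALITY, roi_quality])
        chips.append(((x, y, w, h), buf.tobytes()))
        background[y:y + h, x:x + w] = 0   # exclude the ROI from the background
    ok, bg_buf = cv2.imencode(".jpg", background,
                              [cv2.IMWRITE_JPEG_QUALITY, background_quality])
    return chips, bg_buf.tobytes()
```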
The final step is the multiplexer (MUX), at step 4.12. The MUX combines the outputs from the ROIs segmented from the rest of the image, the geo-referencing metadata, if available, and the compressed background imagery. This can then be transmitted over a communications network at step 4.13, allowing the regions of interest to occupy the largest amount of the available limited bandwidth possible, so as to retain the accuracy and saliency of the data within for further subsequent analysis.
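A hedged sketch of the MUX step follows: ROI chips, optional geo-referencing metadata, and the compressed background are packed into one length-prefixed byte stream for the band-limited link. The wire format here is an assumption invented for illustration, not the patent's format.

```python
import json
import struct

def mux(chips, background_bytes, geo_metadata=None):
    """chips: list of ((x, y, w, h), jpeg_bytes) as produced by encode_frame."""
    stream = bytearray()
    # Header carries the ROI geometry and any geo-referencing metadata.
    header = json.dumps({"rois": [c[0] for c in chips],
                         "geo": geo_metadata}).encode()
    stream += struct.pack(">I", len(header)) + header
    # Length-prefixed ROI chips, then the compressed background.
    for _, chip_bytes in chips:
        stream += struct.pack(">I", len(chip_bytes)) + chip_bytes
    stream += struct.pack(">I", len(background_bytes)) + background_bytes
    return bytes(stream)
```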
At a receiver (not shown in Fig. 4) the resulting FMV or WAMI imagery can be displayed, which can then optionally be overlaid with the automatically detected regions of interest and geolocation information.
In some embodiments it is not required that the MUX combine the regions of interest with the compressed imagery. In the case that a UAV has made multiple passes over the same area, high-quality, high-resolution images of the landscape and background imagery may already have been collected. Using the geolocation data combined with the regions of interest identified in the image analysis steps, the regions of interest can be overlaid onto the high resolution background imagery. With such an arrangement, only the regions of interest image data need be transmitted from the UAV, which can then make full use of the available bandwidth so that the data is sent at the highest available resolution and/or quality possible over the available bandwidth.
In some embodiments high resolution still image data may be captured at the same time as the video. The multiplexer may multiplex the still image data onto the data connection with the region of interest data.
The overall approach described above is valid for both full motion video (FMV) sensors and wide area motion imagery (WAMI) sensors. The difference between the two is that FMV sensors operate at high frame rates (>4 fps) and low resolution (<12 MPix), whereas WAMI sensors operate at lower frame rates (<4 fps) and high resolution (>12 MPix and potentially a few GPix). The solution works with any waveband, including but not limited to Electro-Optic, IR, II, false colour etc. The main benefit of embodiments of the invention is that the bandwidth needed to transmit or onwardly distribute the video is smaller, while the accuracy and saliency of the information is retained.
In addition, since the technique for segmenting the background scene from foreground objects is performed prior to compression, a model of the global motion of the background scene is already available to the compression algorithm. Informing the compression step of the algorithm with the calculated scene motion (expressed as a homography) may help in either or both of: reducing the processing time subsequently needed to compress the video; and/or improving the quality of the resulting background image, as motion vectors are calculated globally rather than per macroblock.
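A short sketch of how the scene homography could seed the encoder: each macroblock centre is mapped through H to predict its global motion vector, sparing the encoder a per-block search. The 16-pixel block size matches common codecs; everything else is an illustrative assumption.

```python
import numpy as np

def global_motion_vectors(H, width, height, block=16):
    """Predict per-macroblock motion vectors from the scene homography H."""
    vectors = {}
    for y in range(block // 2, height, block):
        for x in range(block // 2, width, block):
            # Map the block centre through the homography (projective divide).
            p = H @ np.array([x, y, 1.0])
            px, py = p[0] / p[2], p[1] / p[2]
            # Predicted displacement of this macroblock between frames.
            vectors[(x // block, y // block)] = (px - x, py - y)
    return vectors
```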
In one embodiment the automatically detected moving targets can be embedded with the video (ideally as a metadata stream rather than "burnt" onto the video) and can be used to provide the viewer with improved situational awareness. That is, metadata indicating the regions of interest in which moving targets have been detected can, in some embodiments, be transmitted as a side data stream with the video stream, and is then used at the receiver to augment the recreated image data to highlight the regions of interest. In this way, the original video stream is also available to the remote viewer, for example for further image processing operations to be applied to.
Resulting image chips can be displayed either on top of the compressed background forming part of the video, or overlaid on new or existing high resolution photography.
In one embodiment, particularly where WAMI data is being obtained, WAMI data obtained from satellite imagery may be used as the background at the receiving end, with regions of interest then being transmitted over the data link and overlaid on top of the satellite WAMI data. With such an arrangement it should not be necessary to transmit the background data from the remote vehicle, and the limited bandwidth data link can then be dedicated to sending as high a quality and/or resolution as possible imagery relating to the identified regions of interest.
Further modifications, whether by way of addition, deletion, or substitution, will be apparent to the intended reader, being a person skilled in the art, to provide further embodiments, any and all of which are intended to be encompassed by the appended claims.

Claims (33)

  1. A method comprising: defining one or more regions of interest within image data captured from a moving platform, segmenting the regions of interest from the background image data; compressing the regions of interest data separately from the background image data, and transmitting the compressed regions of interest over a data connection.
  2. The method of claim 1, wherein the regions of interest relate to objects moving relative to their environment.
  3. The method of any preceding claim, wherein the defining of regions of interest comprises calculating the image scene motion due to the relative motion of the moving platform and the environment.
  4. The method of claim 3, wherein calculating the image scene motion includes calculating a frame to frame homography.
  5. The method of claim 3 or claim 4, further comprising compressing the background image data.
  6. The method of claim 5, further comprising multiplexing the compressed regions of interest data and background image data, prior to transmission.
  7. The method of claim 5 or claim 6, wherein the calculated scene motion is used to inform the step of compressing the background image data.
  8. The method of any of claims 5 to 7, wherein the compression ratio of the regions of interest is smaller than the compression ratio of the background image data.
  9. The method according to any preceding claim, wherein the data connection is a limited bandwidth data connection, and the quality of the regions of interest data is maximised according to the limits of the limited bandwidth data connection.
  10. The method of any preceding claim, wherein the image data captured is Wide Area Motion Imagery.
  11. The method of any of claims 1 to 9, wherein the image data captured is Full Motion Video.
  12. The method of any preceding claim, wherein the image data is captured from an unmanned aerial vehicle.
  13. A method for transmitting image data over a band-limited data link, comprising: determining regions of interest within an image, the remaining areas of the image being image background data; segmenting the regions of interest from the image background data; encoding the regions of interest with a first encoding quality, encoding the background image data at a second encoding quality, the second encoding quality being lower than the first encoding quality, and transmitting at least the regions of interest over the band-limited data link.
  14. A surveillance vehicle, comprising one or more sensors arranged to capture image data, and a control device comprising a processor, wherein the processor is arranged to: define one or more regions of interest within the image data; segment the regions of interest from the background image data; compress the regions of interest data separately from the background image data, and transmit the compressed regions of interest over a data link.
  15. The surveillance vehicle of claim 14, wherein the regions of interest relate to objects moving relative to their environment.
  16. The surveillance vehicle of claim 14 or claim 15, wherein the defining of regions of interest comprises calculating the image scene motion due to the relative motion of the moving platform and the environment.
  17. The surveillance vehicle of claim 16, wherein calculating the image scene motion includes calculating a frame to frame homography.
  18. The surveillance vehicle of claim 16 or claim 17, wherein the processor is further arranged to compress the background image data.
  19. The surveillance vehicle of claim 18, wherein the processor further comprises a multiplexer arranged to multiplex the compressed regions of interest data and background image data, prior to transmission.
  20. The surveillance vehicle of claim 18 or claim 19, wherein the calculated scene motion is used to inform the step of compressing the background image data.
  21. The surveillance vehicle of any of claims 18 to 20, wherein the compression ratio of the regions of interest is smaller than the compression ratio of the background image data.
  22. The surveillance vehicle according to any of claims 14 to 21, wherein the data link is of limited bandwidth, and the quality of the regions of interest data is maximised according to the limits of the limited bandwidth data link.
  23. The surveillance vehicle of any of claims 14 to 22, wherein the image data captured is Wide Area Motion Imagery.
  24. The surveillance vehicle of any of claims 14 to 22, wherein the image data captured is Full Motion Video.
  25. The surveillance vehicle of any of claims 14 to 24, wherein the surveillance vehicle is an aircraft, and more preferably an unmanned aerial vehicle.
  26. A method according to claim 13, wherein the image is captured using sensors on a moving platform, and the determining step includes determining as the regions of interest regions of the image that contain a moving object.
  27. A method according to claim 26, wherein the determining step further comprises discriminating within the image between moving objects that are themselves moving and global movement of objects within the image due to movement of the moving platform on which the image sensors are based, the regions of interest being determined so as to contain the moving objects that are themselves moving.
  28. A method according to claim 26 or 27, wherein the moving platform is a moving vehicle, preferably an unmanned aerial vehicle (UAV).
  29. An apparatus, comprising: a processor; and at least one computer readable storage medium storing a computer program arranged such that when it is executed by the processor it causes the processor to: i) determine regions of interest within a surveillance image, the remaining areas of the image being image background data; ii) segment the regions of interest from the image background data; iii) encode the regions of interest with a first encoding quality, iv) encode the background image data at a second encoding quality, the second encoding quality being lower than the first encoding quality, and v) transmit at least the regions of interest over a band-limited data link.
  30. An apparatus according to claim 29, wherein the image is captured using sensors on a moving platform, and the processor is further arranged to determine as the regions of interest regions of the image that contain a moving object.
  31. An apparatus according to claim 30, wherein the processor is further arranged to, as part of the determining, discriminate within the image between moving objects that are themselves moving and global movement of objects within the image due to movement of the moving platform on which the image sensors are based, the regions of interest being determined so as to contain the moving objects that are themselves moving.
  32. An apparatus according to claim 30 or 31, wherein the moving platform is a moving vehicle, preferably an aircraft, and more preferably an unmanned aerial vehicle (UAV).
  33. A vehicle, preferably an aircraft, and more preferably an unmanned aerial vehicle (UAV), further comprising an apparatus according to any of claims 29 to 32.
GB1412030.7A 2014-07-07 2014-07-07 Region based image compression on a moving platform Withdrawn GB2528246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1412030.7A GB2528246A (en) 2014-07-07 2014-07-07 Region based image compression on a moving platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1412030.7A GB2528246A (en) 2014-07-07 2014-07-07 Region based image compression on a moving platform

Publications (2)

Publication Number Publication Date
GB201412030D0 GB201412030D0 (en) 2014-08-20
GB2528246A true GB2528246A (en) 2016-01-20

Family

ID=51410710

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1412030.7A Withdrawn GB2528246A (en) 2014-07-07 2014-07-07 Region based image compression on a moving platform

Country Status (1)

Country Link
GB (1) GB2528246A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3381190A4 (en) * 2016-08-04 2018-10-03 SZ DJI Technology Co., Ltd. Parallel video encoding
WO2019084959A1 (en) * 2017-11-06 2019-05-09 深圳市大疆创新科技有限公司 Interaction method and device for mobile terminal and cloud platform of unmanned aerial vehicle
US11431990B2 (en) 2015-06-04 2022-08-30 Thales Holdings Uk Plc Video compression with increased fidelity near horizon

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2168078A2 (en) * 2007-06-15 2010-03-31 Physical Optics Corporation Apparatus and method employing pre-atr-based real-time compression and video frame segmentation
US20100283589A1 (en) * 2009-05-08 2010-11-11 Fujitsu Limited Image processing system, image capture device and method thereof
US20140063250A1 (en) * 2012-09-06 2014-03-06 Hyun Jin Park Apparatus and method of processing image of vehicle and system for processing image of vehicle using the same

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2168078A2 (en) * 2007-06-15 2010-03-31 Physical Optics Corporation Apparatus and method employing pre-atr-based real-time compression and video frame segmentation
US20100283589A1 (en) * 2009-05-08 2010-11-11 Fujitsu Limited Image processing system, image capture device and method thereof
US20140063250A1 (en) * 2012-09-06 2014-03-06 Hyun Jin Park Apparatus and method of processing image of vehicle and system for processing image of vehicle using the same

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11431990B2 (en) 2015-06-04 2022-08-30 Thales Holdings Uk Plc Video compression with increased fidelity near horizon
EP3381190A4 (en) * 2016-08-04 2018-10-03 SZ DJI Technology Co., Ltd. Parallel video encoding
US10958923B2 (en) 2016-08-04 2021-03-23 SZ DJI Technology Co., Ltd. Parallel video encoding
EP3866465A1 (en) * 2016-08-04 2021-08-18 SZ DJI Technology Co., Ltd. Parallel video encoding
WO2019084959A1 (en) * 2017-11-06 2019-05-09 深圳市大疆创新科技有限公司 Interaction method and device for mobile terminal and cloud platform of unmanned aerial vehicle

Also Published As

Publication number Publication date
GB201412030D0 (en) 2014-08-20

Similar Documents

Publication Publication Date Title
US10936894B2 (en) Systems and methods for processing image data based on region-of-interest (ROI) of a user
US20210329177A1 (en) Systems and methods for video processing and display
EP2632160B1 (en) Method and apparatus for image processing
US8264524B1 (en) System for streaming multiple regions deriving from a wide-angle camera
US8363107B2 (en) Image processing device and method, image processing system, and image processing program
US8369399B2 (en) System and method to combine multiple video streams
US20090141938A1 (en) Robot vision system and detection method
WO2019238113A1 (en) Imaging method and apparatus, and terminal and storage medium
US11258949B1 (en) Electronic image stabilization to improve video analytics accuracy
US9418299B2 (en) Surveillance process and apparatus
US11876951B1 (en) Imaging system and method for unmanned vehicles
US11348281B1 (en) Fixed pattern calibration for multi-view stitching
Meuel et al. Low bit rate ROI based video coding for HDTV aerial surveillance video sequences
CN103796011A (en) Unmanned aerial vehicle reconnaissance image general compression method based on JPEG2000 and interframe compensation
GB2528246A (en) Region based image compression on a moving platform
US20200349689A1 (en) Image processing method and device, unmanned aerial vehicle, system and storage medium
Gong et al. An image-sequence compressing algorithm based on homography transformation for unmanned aerial vehicle
CN112640419B (en) Following method, movable platform, device and storage medium
US11044399B2 (en) Video surveillance system
EP3515082B1 (en) Server device for streaming video content and client device for receiving and rendering video content
Amiri et al. Real-time video stabilization and mosaicking for monitoring and surveillance
KR102149277B1 (en) Terminal and method for setting data protocol for captured images
US20140152770A1 (en) System and Method for Wide Area Motion Imagery
US20200412945A1 (en) Image processing apparatus, image capturing apparatus, mobile body, image processing method, and program
EP3091742A1 (en) Device and method for encoding a first image of a scene using a second image having a lower resolution and captured at the same instant

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)