EP2415013A1

EP2415013A1 - Method for virtually extending and enhancing the field of view of a scene

Info

Publication number: EP2415013A1
Application number: EP10721356A
Authority: EP
Inventors: Denis Marraud
Original assignee: European Aeronautic Defence and Space Company EADS France
Current assignee: Airbus SAS
Priority date: 2009-04-03
Filing date: 2010-04-02
Publication date: 2012-02-08
Also published as: FR2944174B1; US8995751B2; US20120051627A1; FR2944174A1; WO2010112604A1

Abstract

The invention relates to a method for virtually extending and enhancing the field of view of the current image of a scene described by a video comprising a plurality of images. Said method comprises the following steps: (a-) extending said field of view by at least one mosaic produced from said images and by inserting at least one contextual datum into the reference frame of the current image of said scene; and (b-) enhancing said field of view with at least one piece of information referenced in said contextual datum.

Description

METHOD FOR VIRTUAL EXTENSION AND ENRICHMENT OF THE SCENE OBSERVATION FIELD

DESCRIPTION

TECHNICAL AREA

The invention is in the field of image processing and relates to the observation of a scene from a still image or a video.

More specifically, the invention relates to a method of virtual extension and enrichment of the field of view of the current image of a scene described by a video comprising several images in order to extract relevant information.

STATE OF THE PRIOR ART

The solutions used in the prior art for extending and enriching the field of view of a scene generally consist in exploiting the location metadata of the image source to project the footprint of the field of view of said source, or the image itself, onto a geographical reference (map, ortho-image, 3D model, ...). This makes it possible to locate the current image provided by the image source within a particular area to be observed.

Thus, in the case of CCTV or UAV ground station applications, the known solutions consist in exploiting the location metadata of the on-board cameras.

A disadvantage of this solution is that the operator must look at two sources of information (the image and the geographical reference), which, under stressful conditions (military operations, terrorist attacks, ...), reduces efficiency and responsiveness.

Furthermore, the auxiliary data are generally not accurate enough to allow a precise location of the image on the geographical reference. Finally, depending on the image acquisition conditions, and despite the display of the footprint, it can be tedious to designate corresponding points (buildings, streets, ...) between the image and the reference.

To overcome this disadvantage, some systems, notably in video surveillance, propose projecting the field of view of each camera into a 3D view of the scene. The operator then accesses the camera views by navigating the 3D model in real time. This solution solves the problem of perceiving the location of each camera, but it is subject to potential deformation of 3D objects after reprojection. It also has the disadvantage of displaying to the operator a transformed image (depending on the point of view), which operators generally perceive very poorly, since they need constant access to the raw information coming from the sensor, as little transformed as possible.

The problem of extending the field of view of a scene has already been addressed by Honda et al. (Pseudo Expanding Field of View for Immersive Projection Displays - K. Honda, N. Hashimoto, M. Sato - SIGGRAPH '07) and by Magjarevic et al. (Non-Optical Expansion of the Field of View of the Rigid Endoscope - R. Magjarevic et al., World Congress on Medical Physics and Biomedical Engineering 2006). However, the techniques described in these documents use only the most recent images of the video (the N last images) in a real-time mosaicing approach, and therefore only work for zoom-in movements, in which the current image is always entirely included in the previous image.

The object of the invention is to overcome the disadvantages of the prior art described above.

STATEMENT OF THE INVENTION

This goal is achieved by means of a method in which the mosaic potentially uses all of the available video, and in which missing information can be supplied from contextual information such as a wider-field image available for the area, an aerial image, a 3D model registered to the correct point of view, a map, ...

This is achieved by a method of virtual extension and enrichment of the field of view of the current image of a scene described by a video comprising several images, the method comprising the following steps: a- extending said field of view by at least one mosaic obtained from said images and by inserting at least one contextual datum into the geographic reference frame of the current image of said scene, b- enriching said field of view with at least one piece of information referenced in said contextual datum, the method being characterized in that it further comprises the steps of: c- inserting auxiliary location data into the current image, d- completing the field of view of the current image of said scene with previously stored mosaic portions included in the field of view extended by step a).

In a first variant embodiment, step a) is obtained by real-time multi-resolution mosaicization of said video in a fixed frame of reference.

In this variant, the previously stored mosaic includes a fixed, predefined number of resolution levels.

In another variant embodiment, step a) is obtained by deferred-time multi-resolution mosaicing performed on all the images of said video.

In this case, the number of resolution levels of the mosaic is estimated from the characteristics of said video.

The method according to the invention further comprises a contextual referencing of the current image of said scene, either by direct registration between said current image and said contextual data, or by registering the available mosaic with said contextual data.

The auxiliary location data inserted in the current image are, for example, an aerial image of the area in which said scene takes place, a 3D model (textured or not) of that area, images of said scene taken from different points of view, or contextual data of the geographical-map type. Thanks to the method according to the invention, interpretation of the current image is facilitated by placing it in its broader context and by its semantic enrichment.

BRIEF DESCRIPTION OF THE DRAWINGS

Other characteristics and advantages will emerge from the description which follows, given by way of non-limiting example, with reference to the appended figures, in which:

FIG. 1 schematically illustrates the extension of the field of view of a video scene by combined use of a multi-resolution mosaic and contextual data according to the invention,

FIG. 2 schematically illustrates the enrichment of the field of view of the scene of FIG. 1 by transfer of semantic information referenced in the contextual data.

DETAILED PRESENTATION OF PARTICULAR EMBODIMENTS

The method according to the invention applies to the use of images or videos for the observation or monitoring of an extended scene. The applications are multiple: zone surveillance from a fixed or mobile camera installed on the ground or on board any flying machine (drone, dirigible, plane, helicopter, ...), navigation in a network of CCTV cameras, sports broadcasting (Tour de France, Formula 1, horse races, ...), etc.

These applications have in common the observation, at a given moment, of only a small part of the scene to be observed. A trade-off is necessary between the size of the objects that can be observed (resolution) and the coverage of the observation (field of view). This trade-off is reflected in the focal length used: a long focal length ("zoom") gives a fine resolution of the observed scene but a reduced field of view, which is detrimental to the overall perception of the scene and in particular to locating the observed scene within the area of interest. Conversely, a short focal length ("wide angle") gives a good perception of the whole scene but does not allow fine observation of objects, vehicles or people present on the ground.
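To make the trade-off concrete, the sketch below computes the angular field of view, the ground footprint and the ground sample distance (GSD) for two focal lengths. It assumes an ideal pinhole camera looking straight down (nadir) over flat ground; the sensor width, pixel pitch and altitude are illustrative values, not taken from the text.

```python
import math


def field_of_view_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    """Horizontal angular field of view of an ideal pinhole camera."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


def ground_footprint_m(sensor_width_mm: float, focal_length_mm: float,
                       altitude_m: float) -> float:
    """Width of the ground strip seen at nadir, assuming flat ground."""
    return sensor_width_mm * altitude_m / focal_length_mm


def ground_sample_distance_m(pixel_pitch_um: float, focal_length_mm: float,
                             altitude_m: float) -> float:
    """Ground size covered by one pixel at nadir (micrometres -> millimetres)."""
    return (pixel_pitch_um * 1e-3) * altitude_m / focal_length_mm


if __name__ == "__main__":
    # Illustrative sensor: 6.4 mm wide, 5 um pixels (1280 columns), 1000 m altitude.
    for f_mm in (10.0, 100.0):
        print(f"f={f_mm:5.1f} mm  FOV={field_of_view_deg(6.4, f_mm):5.2f} deg  "
              f"footprint={ground_footprint_m(6.4, f_mm, 1000.0):6.1f} m  "
              f"GSD={ground_sample_distance_m(5.0, f_mm, 1000.0):.3f} m")
```

With these values, a tenfold increase in focal length divides both the ground footprint (640 m to 64 m) and the pixel size on the ground (0.5 m to 0.05 m) by ten: finer detail is bought directly with coverage.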

The proposed method makes it possible to observe a scene at maximum resolution (long focal length) while maintaining a good perception of the location of the observed scene within the whole area of interest. For example, depending on the application, the global area of interest may be a combat zone observed by a drone, a city observed by a network of CCTV cameras, or the route of a stage of the Tour de France. The area observed at any moment by the video is then, respectively, centered on a convoy moving in the combat zone, one of the CCTV camera views, or a view centered on the peloton (cycling race). In all three cases, being able to view the close-up views in their global context (the global area of interest) would provide particularly relevant information for interpreting the observed scene (what is the convoy approaching, in which direction is the car observed by camera X heading, where is the peloton relative to the finish, ...).

The objective of the virtual field-of-view extension mechanism is to allow the operator to virtually reduce the focal length of the lens ("virtual dezoom") so as to better locate the currently observed area relative to the global area of interest. In practice, the dezoom consists in embedding the current field of view in a wider-field image whose pixels lying outside the current field of view are derived from the available contextual information.
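A minimal sketch of this embedding, assuming the mosaic and contextual layers have already been reprojected into a common extended frame and that images are small grayscale grids stored as nested lists (`None` meaning "no data"). The function name `virtual_dezoom` and the layer priority (live image over stored mosaic over contextual data) are illustrative choices, not specified by the text.

```python
from typing import List, Optional

Grid = List[List[Optional[int]]]


def virtual_dezoom(current: Grid, mosaic: Grid, context: Grid,
                   row0: int, col0: int) -> Grid:
    """Embed `current` at (row0, col0) in an extended canvas.

    `mosaic` and `context` must already be expressed in the extended frame
    and have the same size; `None` marks pixels where a layer has no data.
    Priority (hypothetical): current image > stored mosaic > contextual data.
    """
    height, width = len(context), len(context[0])
    if len(mosaic) != height or len(mosaic[0]) != width:
        raise ValueError("mosaic and context must cover the same extended frame")
    # Start from a copy of the contextual background (map, aerial image, ...).
    canvas = [row[:] for row in context]
    # Pixels available from the stored mosaic override the background.
    for r in range(height):
        for c in range(width):
            if mosaic[r][c] is not None:
                canvas[r][c] = mosaic[r][c]
    # The live image is pasted last so that it is shown untransformed.
    for r, row in enumerate(current):
        for c, value in enumerate(row):
            canvas[row0 + r][col0 + c] = value
    return canvas
```

Pasting the current image last keeps it exactly as delivered by the sensor, consistent with the operators' need, noted in the prior-art discussion, for raw and minimally transformed imagery.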

FIG. 1 illustrates the virtual extension mechanism of the field of view of an area 2 in a video scene composed of several images 4. Step 6 consists in generating, from said video, a multi-resolution mosaic 10.

The mosaic is multi-resolution in the sense that it is constructed as a pyramid of tiles corresponding to different resolution levels, which makes it possible to take into account large variations in the ground resolution of the video.

Thus, a video showing a continuous zoom on an area (transition from a wide shot to a close-up) leads to a mosaic made of increasingly resolved tiles that are increasingly localized within the initial low-resolution tile. The real field of view of zone 2 is then completed, partly by reprojection (step 14) of the mosaic formed in step 6, and partly by registration (step 14) and reprojection of one or more contextual data 16 into the reference frame of the current image.
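The tile pyramid can be illustrated with a quadtree-style addressing scheme, in which level 0 is the coarsest and each level halves the ground size of a pixel. The 256-pixel tile size, the factor-of-two step between levels and the function names are assumptions made for this example; the text does not specify them.

```python
import math
from typing import Tuple


def level_for_resolution(gsd_m: float, coarsest_gsd_m: float, max_level: int) -> int:
    """Pyramid level whose pixel size best matches an image's ground resolution."""
    level = round(math.log2(coarsest_gsd_m / gsd_m))
    return max(0, min(max_level, level))


def tile_index(x_m: float, y_m: float, level: int, coarsest_gsd_m: float,
               tile_px: int = 256) -> Tuple[int, int]:
    """(column, row) of the tile containing point (x, y) of the fixed frame."""
    tile_size_m = tile_px * coarsest_gsd_m / (2 ** level)
    return math.floor(x_m / tile_size_m), math.floor(y_m / tile_size_m)


if __name__ == "__main__":
    # A continuous zoom: ground resolution improves from 1 m to 0.125 m.
    for gsd in (1.0, 0.5, 0.25, 0.125):
        lvl = level_for_resolution(gsd, coarsest_gsd_m=1.0, max_level=5)
        print(gsd, lvl, tile_index(300.0, 100.0, lvl, coarsest_gsd_m=1.0))
```

As the zoom progresses, the same ground point falls in ever smaller, finer tiles nested inside the initial coarse tile, which is the behavior described above.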

In the case of Figure 1, the contextual data is a map of the area to be observed.

However, it may also be a 3D model (textured or not), other images taken from different points of view, or an aerial image of said area.

After reprojection, an extended field of view 20 is obtained, comprising the current image of zone 2 placed back in the context reconstructed from the mosaic 6 and including the contextual data 16.

In a first embodiment, the multi-resolution mosaic is built in real time in a fixed reference frame (typically a geographical reference frame, the reference frame of the first image, etc.); the mosaic portions included in the extended field of view are then used to complete the current field of view.

Using a fixed reference frame for computing the mosaic avoids recomputing the complete mosaic for each image, which would cause an unnecessary computational load and a rapid degradation of image quality.
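One common way to mosaic into a fixed reference frame, assumed here since the text does not name a motion model, is to estimate a homography between consecutive images and to accumulate these into image-to-reference transforms. Each new image is then warped once into the fixed frame, and pixels already written to the mosaic are not resampled again. The sketch shows only the accumulation; estimating the pairwise homographies (for example from feature matches) is left out, and all names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
IDENTITY: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def chain_to_reference(pairwise: List[Matrix]) -> List[Matrix]:
    """pairwise[k] maps image k+1 into image k; image 0 is the fixed frame.

    Returns H_ref<-k for every image k, using H_ref<-(k+1) = H_ref<-k @ H_k<-(k+1).
    """
    to_ref = [IDENTITY]
    for h in pairwise:
        to_ref.append(matmul3(to_ref[-1], h))
    return to_ref


def apply_homography(h: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map point (x, y) through h, with homogeneous normalisation."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w
```

For example, two successive pure translations of (10, 0) and (5, 3) pixels place the origin of the third image at (15, 3) in the reference frame.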

Moreover, the use of a stored mosaic makes it possible to extend the field of view further than a mosaicing limited to the last images of the video would allow.

In a second embodiment, the multi-resolution mosaic is built in deferred mode over the entire video, and the multi-resolution mosaic is then used each time the video is replayed.

Compared with real-time mode, the possibilities for extending the field of view from the video data are therefore potentially greater, since the method then relies not only on past images but also on future ones.

In real-time mode, the number of resolution levels of the mosaic is fixed and defined at the input of the algorithm.

In deferred mode, on the other hand, the number of resolution levels can be estimated from the characteristics of the video, such as the variation in ground resolution over the entire video to be mosaiced.
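In deferred mode the whole video is available, so the extreme ground resolutions are known before mosaicing starts. A possible estimate, assuming a factor of two between consecutive levels and one ground sample distance (GSD) per image obtained from metadata or from the registration scale, is the number of halvings needed to go from the coarsest to the finest resolution, plus one. The function name and the factor of two are illustrative assumptions.

```python
import math
from typing import Iterable


def estimate_resolution_levels(gsd_per_image: Iterable[float]) -> int:
    """Number of pyramid levels needed to cover the GSD range of a video.

    Assumes each level halves the pixel size of the previous one.
    """
    gsds = list(gsd_per_image)
    if not gsds or min(gsds) <= 0:
        raise ValueError("need at least one strictly positive GSD")
    ratio = max(gsds) / min(gsds)
    return math.ceil(math.log2(ratio)) + 1
```

Under these assumptions, a video whose ground resolution varies from 4 m to 0.5 m (a ratio of 8) needs 4 levels, while a video at constant resolution needs a single level.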

When a single resolution level is imposed, the mosaicing is performed at the resolution of the selected reference frame (geographical reference, first image, ...).

In both real-time and deferred modes, the multi-resolution mosaic is complemented by contextual referencing of the current image. This referencing uses either a direct registration between the current image and the contextual data (aerial image, image from another point of view, map, 3D model, ...), or the registration of the available mosaic (the real-time mosaic in real-time mode, or the complete mosaic in deferred mode) with the contextual data.
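For planar contextual data such as a map or an ortho-image, both routes can be pictured as 3x3 homographies (an assumption; a 3D model would instead require a camera pose). The indirect route composes the current-image-to-mosaic transform, already known from mosaicing, with a single mosaic-to-context registration. Function and parameter names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def reference_current_image(direct: Optional[Matrix] = None,
                            ctx_from_mosaic: Optional[Matrix] = None,
                            mosaic_from_current: Optional[Matrix] = None) -> Matrix:
    """Return the transform mapping current-image pixels into the contextual data.

    Uses the direct registration when one is available; otherwise chains the
    mosaic registration: H_ctx<-cur = H_ctx<-mosaic @ H_mosaic<-cur.
    """
    if direct is not None:
        return direct
    if ctx_from_mosaic is None or mosaic_from_current is None:
        raise ValueError("no registration path from current image to contextual data")
    return matmul3(ctx_from_mosaic, mosaic_from_current)
```

For instance, if the current image sits at offset (10, 20) in the mosaic and the mosaic maps onto the contextual data with a scale factor of 2, the composed transform scales by 2 and translates by (20, 40).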

In both cases, the referencing can be aided by any available auxiliary location data (for example, in the case of an observation drone).

The matching between the current data and the contextual data is used: a) to reproject the contextual data into the extended field of view, thus improving the interpretability of the image; b) depending on the type of reference data, to estimate the shooting conditions of the current data (typically 3D position and orientation). When 3D reference data are used, the estimated shooting conditions are used to reproject the 3D model optimally.
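Use a) can be sketched as an inverse warp: every pixel of the extended field is mapped into the contextual raster and sampled there. This assumes a planar contextual layer related to the extended frame by a known homography `ctx_from_ext`, and uses nearest-neighbour sampling for brevity; use b), estimating the 3D position and orientation, is not shown. Names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]
Grid = List[List[Optional[int]]]


def reproject_context(context: Grid, ctx_from_ext: Matrix,
                      out_height: int, out_width: int) -> Grid:
    """Resample `context` into an out_height x out_width extended field.

    Points are (x, y) = (column, row). Pixels mapping outside the contextual
    raster, or onto w <= 0 (a simplification), are left as None.
    """
    ctx_h, ctx_w = len(context), len(context[0])
    out: Grid = [[None] * out_width for _ in range(out_height)]
    h = ctx_from_ext
    for r in range(out_height):
        for c in range(out_width):
            w = h[2][0] * c + h[2][1] * r + h[2][2]
            if w <= 0:
                continue
            x = (h[0][0] * c + h[0][1] * r + h[0][2]) / w
            y = (h[1][0] * c + h[1][1] * r + h[1][2]) / w
            col, row = round(x), round(y)  # nearest-neighbour sampling
            if 0 <= row < ctx_h and 0 <= col < ctx_w:
                out[r][c] = context[row][col]
    return out
```

Inverse warping is used rather than pushing context pixels forward because it guarantees that each output pixel is visited exactly once, leaving no holes in the extended field.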

The advantages of a "virtual de-zoom" over a "real de-zoom" are several. First, the virtual de-zoom is potentially unlimited: the only limit is the extent of the available contextual data. Furthermore, in a real-time context, a real de-zoom risks losing sight of the object of interest, especially if it is tracked by an automatic tracking algorithm. Finally, in deferred mode (investigation, editing, ...), the operator can no longer act on the shooting parameters and therefore cannot perform a real de-zoom.

Figure 2 illustrates the enrichment mechanism of the field of view of zone 2, which increases the intelligibility of the scene.

In addition to the steps described above with reference to FIG. 1, this mechanism includes an additional step of enriching the contextual data 16 with semantic or asemantic information, such as the street name 24 or the tracking plots 26. Thanks to the permanent registration 14 of the current image with the contextual data, this mechanism thus makes it possible to transfer, at the operator's request, said semantic or asemantic information on the contents of the scene directly into the observed image (and, where appropriate, into the extended field) in an "augmented reality" approach.
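Transferring annotations amounts to mapping their positions from the contextual data into the current image through the permanent registration. The sketch assumes that registration is available as a homography `cur_from_ctx` (planar context such as a map) and keeps only annotations that land inside the displayed image; the `Annotation` type and the labels in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class Annotation:
    label: str   # e.g. a street name or a track identifier
    x: float     # column in the frame where the annotation is expressed
    y: float     # row in that frame


def project_annotations(annotations: List[Annotation], cur_from_ctx: Matrix,
                        width: int, height: int) -> List[Annotation]:
    """Map context-frame annotations into a width x height image, dropping
    those that fall outside it (or onto w <= 0, a simplification)."""
    h = cur_from_ctx
    visible = []
    for a in annotations:
        w = h[2][0] * a.x + h[2][1] * a.y + h[2][2]
        if w <= 0:
            continue
        px = (h[0][0] * a.x + h[0][1] * a.y + h[0][2]) / w
        py = (h[1][0] * a.x + h[1][1] * a.y + h[1][2]) / w
        if 0 <= px < width and 0 <= py < height:
            visible.append(Annotation(a.label, px, py))
    return visible
```

Passing the size of the extended field instead of that of the current image would display the same annotations in the extended field as well.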

Such an approach helps to synthesize the information available on a given scene within a single display, considerably increases the intelligibility of the observed scene and ultimately improves the efficiency of the operator.

In the case of aerial surveillance of an urban area by a drone, for example, the semantic information that can be displayed directly in the image includes:

• a building of interest,

• street names,

• a moving target tracked by other means.

Asemantic information typically results from processing the current image to detect changes with respect to the real-time mosaic and/or the reference data. Such change detection makes it possible, for example, to draw the operator's attention to an object present in the current image but absent from the previous observation.
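A deliberately naive version of such change detection compares the current image pixel by pixel with the co-registered mosaic (or reference data) and flags large absolute differences. It assumes the two grids are already registered and radiometrically comparable; the threshold and function names are illustrative, and an operational system would need more robust methods.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[int]]]


def detect_changes(current: Grid, reference: Grid, threshold: int) -> List[List[bool]]:
    """True where |current - reference| > threshold.

    Pixels with no data (None) on either side are never flagged, since
    nothing can be said about them.
    """
    mask = []
    for cur_row, ref_row in zip(current, reference):
        mask.append([ref is not None and cur is not None and abs(cur - ref) > threshold
                     for cur, ref in zip(cur_row, ref_row)])
    return mask


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """(min_row, min_col, max_row, max_col) of flagged pixels, or None."""
    cells = [(r, c) for r, row in enumerate(mask) for c, flag in enumerate(row) if flag]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)
```

The bounding box of the flagged pixels could then serve to highlight the new object in the observed image.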

The method according to the invention applies whenever an extended scene is observed by one or more sensors. Examples include:

• Reconnaissance UAV ground station: virtual extension and enrichment of the field of view are particularly relevant in urban surveillance, where ground-resolution requirements impose a reduced field of view and therefore make it difficult for the operator to locate the current image precisely relative to the surveillance zone (neighborhood, city, ...),

• Situation monitoring from a video surveillance network: the features provided by the method according to the invention facilitate the multi-camera tracking of an object of interest and the global location of an observed object.

Claims

1. A method of virtual extension and enrichment of the field of view of the current image of a scene described by a video comprising several images, comprising the following steps: a- extending said field of view by at least one mosaic obtained from said images and inserting at least one contextual datum into the reference frame of the current image of said scene, b- enriching said field of view with at least one piece of information referenced in said contextual datum, characterized in that it further comprises the steps of: c- inserting auxiliary location data into the current image, d- completing the field of view of the current image of said scene with previously stored mosaic portions included in the field of view extended by step a).

2. The method of claim 1, wherein step a) is obtained by real-time multi-resolution mosaicing of said video in a fixed reference frame.

3. The method of claim 1, wherein step a) is obtained by deferred-time multi-resolution mosaicing performed on all the images of said video.

4. The method according to claim 3, wherein said previously stored mosaic comprises a fixed, predefined number of resolution levels.

5. The method of claim 4, wherein the number of resolution levels of the mosaic is estimated from the characteristics of said video.

6. The method according to claim 2 or claim 3, further comprising a contextual referencing of the current image of said scene, either by direct registration between said current image and said contextual data, or by registering the available mosaic with said contextual data.

7. The method of claim 1, wherein said contextual data is an aerial image of the area in which said scene takes place.

8. The method of claim 1, wherein said contextual data is a 3D model, textured or not, of the area in which said scene takes place.

9. The method of claim 1, wherein said contextual data are images of said scene taken from different viewpoints.

10. The method of claim 1, wherein said contextual data is a geographical map.