WO2015079470A2 - Video coding system for images and video from air or satellite platform assisted by sensors and by a geometric model of the scene - Google Patents


Info

Publication number
WO2015079470A2
Authority
WO
WIPO (PCT)
Prior art keywords
video
coding
scene
orientation
photograms
Prior art date
Application number
PCT/IT2014/000313
Other languages
French (fr)
Other versions
WO2015079470A3 (en)
Inventor
Antonio MACCARIO
Original Assignee
Protodesign S.R.L.
Priority date
Filing date
Publication date
Application filed by Protodesign S.R.L. filed Critical Protodesign S.R.L.
Publication of WO2015079470A2 publication Critical patent/WO2015079470A2/en
Publication of WO2015079470A3 publication Critical patent/WO2015079470A3/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05: Geographic models

Definitions

  • the invention can be implemented as a physically independent digital system or as a module of a more complex digital system which comprises its functionality.
  • acquisition system (2 of fig. 1)
  • the "acquisition system", which provides the photograms F to be coded, can be composed of a video camera or of a photo camera with a high acquisition frequency, operating in any band of the electromagnetic spectrum.
  • Sensors operating in the electro-optical and/or infrared bands, for example CCD or CMOS sensors, and multi-spectrum sensors are particularly important.
  • the camera body of the image sensor is rigidly constrained to a support, which in turn is fixed or movable with respect to the aircraft structure.
  • the estimation S of position and orientation of the image sensor is computed by the "system for estimating S position and orientation" (3 of fig. 1), which collects measurements coming from auxiliary sensors, placed on the aircraft or on the camera body, and from any actuators that move the camera body.
  • the above data can come from direct position and orientation measurements, or can be obtained from related quantities, for example accelerations and linear and angular velocities.
  • the measuring sensors can be: satellite positioning receivers, accelerometers, gyroscopes, magnetometers, laser or pressure altimeters, and other possible tools.
  • Data can also come, instead of the sensors or as a complement to them, from the specific actuators that move the camera body, which may or may not be gyro-stabilized.
  • the module which implements the above system for estimating S position and orientation can be physically independent or can be, completely or only partially, integrated in the hardware module which implements the coding system.
  • the navigation system of the air or space platform, if present, can be responsible for performing part or all of the functionalities of the system for estimating S position and orientation.
  • the relative position and orientation between the camera body and the navigation system can be determined once and for all, if the camera body is fixed, or dynamically, if the camera body is moved by a precision actuator.
  • the relative orientation between camera body and image sensor can be determined once and for all through a geometric calibration process performed manually or automatically, possibly through a self-calibration system.
  • in order to determine a correspondence between the points of the 3D scene and the image plane, the acquisition camera must be suitably calibrated, namely its projective model (internal calibration) must be known. Moreover, the relative position and orientation between the camera reference system and the reference system in which the camera position and orientation estimations are computed must be known (external calibration). Therefore, the coding system will need to be instructed once and for all with setup calibration data, obtained by means of suitable measuring procedures. In order to make the calibration immediate, the coding system can optionally be equipped with a self-calibration system based on the comparison of the acquired photograms with the corresponding position and attitude data.
  • the "geographic database" (4 of fig. 1), from which it is possible to reconstruct the geometry G of the scene, can be composed of a more or less accurate and detailed geometric model, which represents with a certain approximation the three-dimensional surfaces flown over by the air platform during the acquisition; moreover, the scene can be represented only partly, for example taking into account the orography while neglecting the buildings. Further approximations with respect to the original model can be performed by the coding system, to reach a higher computing efficiency.
  • the geometric model must be geo-referred to allow putting the points of the image plane in correspondence with the points in three-dimensional space.
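As a hedged illustration of this correspondence, the sketch below projects a geo-referred 3D point onto the image plane with a conventional pinhole camera model; the matrices, pose convention and numbers are made up for the example, not taken from the patent.

```python
import numpy as np

def project_point(X_world, K, R, t):
    """Project a geo-referred 3D point onto the image plane.
    K is the 3x3 intrinsic matrix (internal calibration); R, t are the
    camera pose (external calibration), mapping world coordinates into
    the camera frame as Xc = R @ Xw + t."""
    Xc = R @ X_world + t              # world -> camera frame
    u, v, w = K @ Xc                  # -> homogeneous pixel coordinates
    return np.array([u / w, v / w])   # perspective division

# Illustrative numbers: nadir-looking camera 100 m above the world origin.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                         # identity attitude for simplicity
t = np.array([0.0, 0.0, 100.0])       # origin lies 100 m along the optical axis
print(project_point(np.zeros(3), K, R, t))   # -> [640. 360.] (principal point)
```

Inverting this mapping against the geo-referred model is what lets the coding system relate pixels of different photograms without analyzing the images themselves.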
  • DEM (Digital Elevation Model)
  • DSM (Digital Surface Model)
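As a hedged illustration of how such a gridded terrain model can be queried, the sketch below bilinearly interpolates an elevation from a small regular-grid DEM; the grid values, cell size and function name are invented for the example.

```python
import numpy as np

def dem_elevation(dem, x, y, cell=30.0):
    """Bilinearly interpolate a terrain height from a regular-grid DEM.
    dem[i, j] holds the elevation at northing i*cell, easting j*cell."""
    j, i = x / cell, y / cell
    j0, i0 = int(j), int(i)
    fj, fi = j - j0, i - i0
    z00, z01 = dem[i0, j0], dem[i0, j0 + 1]
    z10, z11 = dem[i0 + 1, j0], dem[i0 + 1, j0 + 1]
    top = z00 * (1 - fj) + z01 * fj     # interpolate along the easting
    bot = z10 * (1 - fj) + z11 * fj
    return top * (1 - fi) + bot * fi    # then along the northing

dem = np.array([[10.0, 20.0],
                [30.0, 40.0]])          # toy 2x2 elevation grid
print(dem_elevation(dem, 15.0, 15.0))   # centre of the cell -> 25.0
```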
  • the above digital system for storing and indexing geographic data can be a physically independent system or can be a system integrated in the hardware module which implements the coding system.
  • the sequence of coded bits is sent to a "user system" which stores the photograms F on board the aircraft or sends them to the ground over a transmission channel.
  • the coding system of the invention is structured into the following modules: 1) input interface towards the acquisition system (1' of fig. 2); 2) input interface towards the system for estimating S position and orientation (2' of fig. 2); 3) communication interface with the geographic database (3' of fig. 2); 4) output interface towards the user system (4' of fig. 2); 5) interface with the system clock (5' of fig. 2); 6) subsystem for estimating FO the optical flow (6' of fig. 2); 7) coding engine (7' of fig. 2).
  • the invention has, as first input, a digital or analogue interface (1' of fig. 2) through which, upon every acquisition cycle, the generic photogram of the video sequence is delivered by the "acquisition system" (2 of fig. 1) to the coding system (1 of fig. 1) .
  • a second input of the system is composed of a digital interface (2' of fig. 2) which receives the information related to the estimation S of the position and orientation of the camera body with respect to a reference which is inertial, or which can anyway be approximated as such. Such information is provided by the already mentioned "system for estimating S position and orientation" (3 of fig. 1).
  • the third input of the system is composed of a digital interface (3' of fig. 2), connected to a memory which contains the "geographic database" (4 of fig. 1).
  • each data packet representing an image or a measurement vector can be equipped with a data field specifically dedicated to the digital representation of the acquisition instant, in case the acquisitions occur asynchronously.
  • optical flow FO means any injective correspondence between the pixels of the photogram to be coded and those of the reference one, in whatever way such correspondence is coded (for example, displacement vectors, collections of homographies, deformations of triangular meshes).
  • the subsystem for estimating FO the optical flow (6' of fig. 2) has the purpose of processing the position and orientation data of the camera body and the geometry G of the scene, to obtain a list of correspondences between the pixels of the current photogram and those of a reference photogram. Processing the geometry G of the scene can also imply a suitable simplification, with the purpose of improving the computing performance.
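A minimal sketch of such a pose-plus-geometry correspondence follows, assuming for simplicity a flat terrain model and the pinhole convention Xc = R·Xw + t; all function names, poses and numbers are illustrative, not from the patent.

```python
import numpy as np

def pixel_to_ground(pix, K, R, t, ground_z=0.0):
    """Back-project a pixel onto the flat terrain plane z = ground_z.
    With Xc = R @ Xw + t, the camera centre is C = -R.T @ t and the
    pixel's viewing ray in world coordinates is R.T @ inv(K) @ pix."""
    C = -R.T @ t
    ray = R.T @ np.linalg.inv(K) @ np.array([pix[0], pix[1], 1.0])
    s = (ground_z - C[2]) / ray[2]     # ray parameter at the terrain plane
    return C + s * ray

def homologous_pixel(pix, K, pose_cur, pose_ref, ground_z=0.0):
    """Pixel of the reference photogram homologous to a pixel of the
    current one, computed from camera poses and the terrain model only."""
    Xw = pixel_to_ground(pix, K, *pose_cur, ground_z=ground_z)
    R, t = pose_ref
    u, v, w = K @ (R @ Xw + t)         # reproject into the reference view
    return np.array([u / w, v / w])

# Illustrative setup: two nadir views at 100 m altitude, 10 m apart.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R_down = np.diag([1.0, -1.0, -1.0])    # optical axis pointing straight down
pose_cur = (R_down, -R_down @ np.array([0.0, 0.0, 100.0]))
pose_ref = (R_down, -R_down @ np.array([10.0, 0.0, 100.0]))
print(homologous_pixel((640.0, 360.0), K, pose_cur, pose_ref))
# -> [540. 360.]: the 10 m baseline shifts the pixel by fx * 10 / 100 = 100 px
```

Repeating this for every pixel (or block) yields the list of correspondences the subsystem delivers to the coding engine, with no image analysis involved.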
  • the reference photograms can also be different for each portion of the photogram to be coded.
  • processing of the photograms F can also be provided.
  • the optical flow FO estimated starting from position and orientation data can be regarded as a first approximation of the one computed from the analysis of the photograms F. This case is the most relevant one from the application point of view.
  • the subsystem for estimating FO the optical flow can be implemented on a specific hardware or can share the resources with the other subsystems of the invention.
  • Such subsystem can be implemented on general purpose microprocessor cards, systems on chip, systems with programmable electronics, digital signal processors or graphic processors; the use of graphic processors is of particular interest.
  • the previously mentioned systems must all be equipped with memories for data and programs and with suitable data transmission systems for interfacing with the other subsystems.
  • the video coding engine (7' of fig. 2), namely the completion of the coder with respect to the subsystem for estimating FO the optical flow, can be composed of a purposely designed system but also of a commercial coding system.
  • the software which implements the coding engine must operate on specific processing hardware (general purpose microprocessor cards, systems on chip, systems with programmable electronics, digital signal processors, graphic processors), equipped with memories and with interfacing and data transmission systems, which may or may not be shared with the other subsystems.
  • the novelty introduced by the invention consists mainly in the possibility of estimating the optical flow FO starting from information about the point of view of the sensor S (position and orientation in space) and, no less important, from a geo-referred geometric model G of the observed scene.
  • the second type of information, extremely important from the point of view of the obtained benefits, is available in the case of sequences of images acquired from air or satellite platforms, while, with the current state of the technology, it is not available in other application contexts.
  • the techniques for estimating the optical flow FO starting from camera position and orientation data S and from a geometric model of the scene G can be numerous and are not in themselves an inventive novelty. The same can be stated for the ways in which the representation of the optical flow FO can be efficiently coded.
  • the flown-over surface can be represented by a set of polygons with adjacent sides. Each polygon is part of a plane with a known equation in space. The projection of a polygon on the image plane is generally still a polygon.
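For a planar patch of the scene, the mapping between its projections in two views is a homography. A sketch under the conventions Xcur = R·Xref + t and plane n·X = d in the reference camera frame (all values are illustrative, not from the patent):

```python
import numpy as np

def plane_homography(K, R, t, n, d):
    """Homography induced by the scene plane n . X = d (reference camera
    frame) between the reference and current views, where the relative
    pose satisfies Xcur = R @ Xref + t; pixels map as x_cur ~ H @ x_ref."""
    H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)
    return H / H[2, 2]                 # normalize the scale

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])          # 1 m sideways camera motion
n = np.array([0.0, 0.0, -1.0])         # plane normal
d = -100.0                             # n . X = d, i.e. the plane Z = 100 m
H = plane_homography(K, R, t, n, d)
# Over a plane 100 m away, a 1 m sideways motion shifts pixels by
# fx * tx / Z = 800 / 100 = 8 px.
print(H @ np.array([320.0, 240.0, 1.0]))   # -> [328. 240. 1.]
```

Note that sign conventions for t, n and d vary across texts; the form above is consistent only with the stated conventions.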
  • Processing of the images F can be performed afterwards, if necessary, only for refining the estimation FO of the motion vectors, with an expected reduction of the computational load. Since the camera position and orientation S are approximately known and since the geometry G of the acquired scene is approximately known, it is possible to optimize the partitioning process of the image plane, associating to each partition: (a) a certain reference photogram which, for similarity reasons, is more suitable to represent the group of pixels to be coded by differences; (b) the specific transformation to be applied for mapping the pixels of the group onto those of the reference photogram; (c) the structure of the partition and the sizes of each group of pixels, depending on the specific coordinates in the image plane and on their correspondence with the observed three-dimensional scene.
  • the determination of the correspondence between homologous pixels belonging to different photograms and the coding only of the differences is the base for all the most efficient video coding techniques, among which the international ISO-MPEG and ITU-H.26X standards.
  • the single photogram is partitioned into blocks. Each block is compared with the pixels of one or more previously coded reference images, in order to maximize the correspondence.
  • the search area is limited (there is a "search window") and the block is simply translated within the image plane of the reference photogram, in order to determine the best translation value for the corresponding "motion vector".
  • Each motion vector is sent to the decoder, while only the difference of the block pixels with respect to the reference photogram is coded, all this with a great advantage in terms of coding efficiency.
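The benefit of initializing the search from a pose-derived prediction can be sketched with a toy sum-of-absolute-differences (SAD) block search; the window size, data and function name are invented for the example.

```python
import numpy as np

def best_motion_vector(block, ref, top_left, predicted_mv, radius=2):
    """Search a small window around a predicted motion vector (e.g. one
    deduced from camera pose and the scene model) and return the
    displacement minimizing the sum of absolute differences (SAD)."""
    y0, x0 = top_left
    h, w = block.shape
    best = None
    for dy in range(predicted_mv[0] - radius, predicted_mv[0] + radius + 1):
        for dx in range(predicted_mv[1] - radius, predicted_mv[1] + radius + 1):
            cand = ref[y0 + dy:y0 + dy + h, x0 + dx:x0 + dx + w]
            if cand.shape != block.shape:
                continue                      # candidate falls outside the frame
            sad = np.abs(block.astype(int) - cand.astype(int)).sum()
            if best is None or sad < best[0]:
                best = (sad, (dy, dx))
    return best[1]

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64))
block = ref[20:28, 26:34]                     # block displaced by (0, 6) pixels
mv = best_motion_vector(block, ref, (20, 20), predicted_mv=(0, 5))
print(mv)  # -> (0, 6), found with a 5x5 window instead of a wide blind search
```

Only the residual of the block against the matched reference pixels would then be coded, as the text describes.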
  • the representation of the optical flow FO can be presented by the responsible subsystem in the forms compatible with the above coding standards or with other future ones.
  • the optical flow FO can be represented with motion vectors or with global motion vectors, as provided by the ISO MPEG standard, part 2 and part 10. If the representation of the optical flow FO has to be subjected to compatibility constraints with a video coding standard or with pre-existing decoders, the implementation will have to be such that the representation syntax can be decoded by the decoder towards which the compatibility has to be kept. In such a case, the global functionalities and performances could be reduced with respect to the full capabilities of the proposed invention.
  • the proposed invention consists in an innovative video coding system in acquisition scenarios from air or satellite platform.
  • the motion representation can be efficiently performed with a partition into square or rectangular blocks, and with movement vectors associated with simple translations (ISO MPEG and ITU H.26X standards) .
  • the above model is not suitable in case of great displacements of the point of view, wherein relevant perspective variations can occur.
  • digital coding systems are not devised for transmission systems with low delay.
  • analogue connections are currently used on telecommunication channels, which generally have a lower global delay between on-board acquisition and ground display.
  • the position and orientation of the acquisition sensor can be estimated thanks to the on-board instruments; the geometry of the scene is actually known, a digital model of the ground being available, since, if the shooting occurs from a sufficient height, it is governed by the surface geometry of the flown-over territory (for example the orography).
  • the estimation of the optical flow FO starting from camera position and orientation S, using a geometric model of the scene G, does not require the analysis of the sequence of photograms F, apart from possibly correcting the residual inaccuracy of the sensors and of the model. In such case, the computational analysis complexity is presumably drastically reduced, since the position of the corresponding pixels in the reference photogram is approximately known.
  • the representation of the optical flow FO can be made compatible with the specifications of international ISO MPEG standard, part 2 and part 10, so that the complete coding system deriving therefrom is compatible with the standard.
  • the transmission to the ground of the position and orientation data (S), the geometric model of the scene (G) used when coding being available to the decoder, can be exploited for recovering photograms lost during transmission or for decreasing the decoding delay.
  • the invention can in fact be used in order to improve the robustness of the decoding system in case of loss of packets on the communication channel.
  • the invention can be exploited in order to lower the delay of the coding-decoding chain in the following way.
  • the decoder does not wait for the reception of video packets which code the current photogram, but uses current position and orientation data or those obtained from a predictive filtering to estimate the current photogram starting from the already decoded photograms.
  • the coding of the current photogram is therefore used by the decoder only for decoding the following photograms.
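This decoder-side, low-delay prediction can be sketched as a warp of the last decoded photogram by the pose-derived flow; the toy nearest-neighbour warp and the data below are illustrative, not from the patent.

```python
import numpy as np

def predict_frame(prev, flow):
    """Predict the current photogram by warping the previously decoded
    one with the pose-derived optical flow (nearest-neighbour warp).
    flow[y, x] = (dy, dx) points from the current pixel back into prev."""
    h, w = prev.shape
    out = np.zeros_like(prev)
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y, x]
            sy = min(max(y + dy, 0), h - 1)   # clamp at the frame border
            sx = min(max(x + dx, 0), w - 1)
            out[y, x] = prev[sy, sx]
    return out

prev = np.arange(16.0).reshape(4, 4)          # toy 4x4 decoded photogram
flow = np.zeros((4, 4, 2), dtype=int)
flow[..., 1] = 1                              # camera moved: sample one pixel right
pred = predict_frame(prev, flow)
print(pred[0])                                # -> [1. 2. 3. 3.] (edge clamped)
```

The decoder can display such a prediction immediately and use the later-arriving coded residual only for reconstructing the reference of the following photograms.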

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Signal Processing (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Navigation (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)
  • Radio Relay Systems (AREA)

Abstract

The described system simultaneously uses an estimation of position and orientation of an image sensor and a geometric model of the scene, made available from a geographic database, to generate a coded flow of bits starting from a video sequence. Camera position and orientation data and geometric model of the scene are used by the video coding system to initialize the movement estimation. In particular, in the ground transmission scenario of video or images from air or satellite platforms, the estimation of camera position and attitude can be made available by the on-board navigation system.

Description

VIDEO CODING SYSTEM FOR IMAGES AND VIDEO FROM AIR OR SATELLITE PLATFORM ASSISTED BY SENSORS AND BY A GEOMETRIC MODEL OF THE SCENE
The present invention, herein below synthetically called "coding system", relates to a digital system for compressing "sequences of overlapped photograms" to be used in acquisition scenarios from an "air or space platform" (balloons, model aircraft, drones, manned or unmanned aircraft, rockets, satellites, manned orbiting space stations). Two photograms are said to be "overlapped" if portions of the image plane of one or the other are projections of objects from the same space coordinates. This condition is rather common in sequences of video photograms.
The sequences of photograms to be coded can be both common video sequences, cadenced by a specific acquisition period, and, more generically, sequences of images of the same scene obtained from different points of view, with a possibly variable time distance between one photogram and the following one. This second case is of application interest in air or satellite shootings for ground observation, with the specific purpose of producing orthographic images or three-dimensional reconstructions starting from multiple views.
The invention uses an estimation S of the position and orientation of the image sensor, obtained from measurements coming from other sensors, and a geometric model of the scene for estimating the correspondences between homologous pixels in the different photograms of the sequence F. Such correspondence FO is commonly called "optical flow" and describes the two-dimensional motion of objects in the image plane. Commonly, commercial video coders represent the optical flow through two-dimensional vectors (called "motion vectors" in the ISO MPEG and ITU H.26X video coding standards), each one of which is associated with a specific region of the image plane belonging to a partition of the photogram to be coded. The partition and the motion vectors associated to each region are determined through the analysis of the succession of the photograms. In the ISO MPEG and ITU H.26X standards, the photograms are divided into blocks of square or rectangular shape, with sizes varying down to one pixel. Comparing the photograms to be coded with already coded photograms (therefore available also to the decoder), it is possible to code only the differences, with a consequent advantage in terms of coding efficiency (bits of original data / bits of coded data).
The main technical advantages of the invention, as claimed in claim 1, with respect to video coders already known in the reference technical-scientific literature and/or already present on the market are: (a) facilitating the estimation process of the optical flow FO, drastically reducing the computing complexity of image processing (e.g. estimation of the motion field), by possibly initializing the estimation FO of the optical flow with the one which can be deduced from the measurements of camera position and orientation; (b) simplifying the coding of the representation of the motion of points in the image plane (e.g. motion field vectors), by possibly transmitting in the coded flow the camera position and orientation data; (c) the possibility, without significantly increasing the computing complexity and without significantly decreasing the coding efficiency, of introducing a more complex representation of motion in the image plane (e.g. homographic representations), by possibly partitioning the image plane and associating to each set of pixels a specific transformation; (d) the possibility of a more robust representation of the motion of points in the image plane with respect to information loss phenomena on a digital telecommunication channel (e.g. loss or corruption of packets), by using for example the estimation FO of the optical flow obtained from position and orientation measurements where the decoding of the motion model (e.g. motion vectors), as well as its estimation from the analysis of the sequence of photograms F, is not possible; (e) facilitating the partitioning process of the image plane (e.g. block partition), possibly by initializing it depending on the characteristics of the optical flow FO estimated from the movement measurements S of the image sensor and on the scene model, without having to analyze the photograms of the sequence F.
The consequences expected from the above technical advantages can be translated into the following performance advantages: (a) having set the computing resources, it is possible to improve the performances of the coding system in terms of compression efficiency (ratio between bits per photogram respectively upstream and downstream of the coding/decoding, with the same reconstruction quality); (b) having set the compression efficiency and the digital resolution, it is possible to obtain a decrease of the coding/decoding delay and/or an increase of the maximum number of photograms which can be coded per second; (c) having set the compression efficiency, it is possible to obtain a better robustness with respect to the amount of bits corrupted or lost following the transmission on a digital channel; (d) without having to increase the computing resources, it is possible to obtain an efficient coding of sequences of photograms with a lower overlapping in the image plane and/or subjected to stronger perspective transformations. The currently most meaningful application scenario is the coding of overlapped photograms from an air or satellite platform
(balloons, model aircraft, drones, manned or unmanned aircraft, rockets, satellites, manned orbiting space stations), since the geometry G of the scene, in the case of shootings of exteriors and above a certain height, is known quite exactly, being mainly determined by the surface geometry of the flown-over territory, and since the contribution of possible moving objects can be neglected. Air or satellite platforms moreover have position and attitude estimations S, coming from sensors or actuators. Applications aimed at observing the Earth are particularly interesting, specifically when the purpose of the shootings is generating representations of the territory by means of orthographic photographs or three-dimensional reconstructions. In these cases, the purpose is not having sequences of images with high time resolution (as with video acquisitions) but having instead images with high spatial resolution on the ground. Overlapping is necessary for generating coverages of the flown-over area without discontinuities ("mosaics") or for obtaining dense maps of correspondences, namely of pixels pertaining to the same space object, for creating 3D reconstructions.
Specifically interesting are unmanned platforms (radio-controlled aircraft, drones, unmanned aircraft or UAVs - Unmanned Aerial Vehicles - and satellites), which need to send to the ground the data captured by image sensors. The proposed system can be interesting both for coding the high resolution and high quality images relevant to the mission, and for the purpose of sending to the ground images with lower quality, for their preview, in order to better define on the ground the acquisitions to be performed with full quality and resolution. Applications of video shootings from platforms with low energy autonomy are also rather important, since they benefit from a reduced computational complexity of the on-board applications; this is the case, for example, of small-sized radio-controlled systems, small UAVs, UAVs operating at high altitudes or satellites supplied by solar energy. Another possible application is the remote control of aerial means by pilots on the ground, as in the UAV case, in which a low delay between on-board video acquisition and its display on the ground is fundamentally important, above all during take-off and landing and to avoid air obstacles. Moreover, the reliability of the video communication service, also in the case of a disturbed channel, is highly important.
It is intended that all enclosed claims are an integral part of the present description. It will be immediately obvious that numerous variations and modifications (for example related to shape, sizes, arrangements and parts with equivalent functionality) can be made to what is described, without departing from the scope of the invention, as appears from the enclosed claims.
The present invention will be better described by some preferred embodiments thereof, provided as a non-limiting example, with reference to the enclosed drawings, in which:
- figure 1 is a schematic block diagram of the system of the invention; and
- figure 2 is a schematic block diagram of the main components of the system of the invention.

With reference to the figures, the coding system generally operates as follows. It takes a sequence F of photograms, timed with respect to a common clock ST, a sequence S of position and orientation data of the image acquisition sensor, also timed with respect to the same clock, and a geometric description G of the area flown over by the air platform, returning as output a sequence of suitably coded bits FB, so that it is possible to reconstruct therefrom the processed photograms apart from a certain reconstruction error and with a definite advantage in terms of cost of the representation, measured as the average number of bits necessary to represent each photogram. The invention uses the position and orientation data S of the image sensor and the geometric model G of the scene, suitably geo-referenced, to estimate with a good approximation the correspondences FO between homologous pixels in the different photograms of the video sequence F. Such estimation FO can be further refined, if necessary, by a direct processing of the photograms F. In such case, the estimation of the optical flow FO obtained starting from the estimation S of position and orientation of the image sensor is an initialization of the estimation FO of the optical flow obtained afterwards through the analysis of the photograms F. In this way, the estimation of the optical flow can be made more accurate.
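As a minimal, hypothetical sketch of this principle (assuming a nadir-looking pinhole camera with focal length f flying at height h over flat ground, with pixel coordinates centred on the principal point; all names are illustrative, not from the patent), the correspondence between homologous pixels of two photograms can be predicted from the two camera positions alone, without analysing the images:

```python
def project(ground_xy, cam_xy, h, f):
    """Project a ground point (Z = 0) into a nadir camera at (cam_xy, h)."""
    return (f * (ground_xy[0] - cam_xy[0]) / h,
            f * (ground_xy[1] - cam_xy[1]) / h)

def predicted_flow(pixel, pose_a, pose_b, h, f):
    """Predict where a pixel of frame A reappears in frame B,
    using only the two camera positions and the flat-scene model."""
    # back-project the pixel of frame A onto the ground plane
    gx = pose_a[0] + pixel[0] * h / f
    gy = pose_a[1] + pixel[1] * h / f
    # re-project the same ground point into frame B
    return project((gx, gy), pose_b, h, f)
```

With f = 1000, h = 500 and a 10 m lateral displacement, a pixel at the principal point is predicted to reappear 20 pixels away. A real system would replace the flat-ground assumption with the geo-referenced model G.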
The invention can be implemented as a physically independent digital system or as a module of a more complex digital system which comprises its functionality. Herein below, a general description of a possible implementation is provided. The "acquisition system" (2 of fig. 1), which provides the photograms F to be coded, can be composed of a video-camera or of a photo-camera with high acquisition frequency, operating in any band of the electromagnetic spectrum. Sensors operating in electro-optical and/or infrared bands (for example CCD or CMOS sensors) and multi-spectral sensors are particularly important. The camera body of the image sensor is rigidly constrained to a support, which in turn is fixed or moving with respect to the aircraft structure. It is assumed to be placed, during operation, so that the majority of the shot scene relates to the environment outside the aircraft. The estimation S of position and orientation of the image sensor is processed by the "system for estimating S position and orientation" (3 of fig. 1), which collects measurements coming from auxiliary sensors, placed on the aircraft or on the camera body, and from possible handling actuators of the camera body. The above data can come from direct position and orientation measurements, or can be obtained from related quantities, for example accelerations, linear and angular speeds. The measuring sensors can be: satellite positioning receivers, accelerometers, gyroscopes, magnetometers, laser or pressure altimeters, and other possible tools. Data can come, instead of from sensors, exclusively or as a complement, from specific handling actuators, which can be gyro-stabilized or not. The module which implements the above system for estimating S position and orientation can be physically independent or can be, completely or even only partially, integrated in the hardware module which implements the coding system.
The navigation system of the air or space platform, if present, can be responsible for performing part or all of the functionalities of the system for estimating S position and orientation. The relative position and orientation between camera body and navigation system can be determined once and for all, if the camera body is fixed, or dynamically, if the camera body is handled with a precision actuator. The relative orientation between camera body and image sensor can be determined once and for all through a geometric calibration process performed manually or automatically, possibly through a self-calibration system.
More in detail, in order to be able to determine a correspondence between the points of the 3D scene and the image plane, the acquisition camera must be suitably calibrated, namely its projective model (internal calibration) must be known. Moreover, it is necessary that the relative position and orientation between the camera reference system and the reference system in which the camera position and orientation estimations are computed be known (external calibration). Therefore, the coding system will need to be instructed once with setup calibration data, obtained by means of suitable measuring procedures. In order to make the calibration immediate, the coding system can optionally be equipped with a self-calibration system based on the comparison of acquired photograms with corresponding position and attitude data.
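By way of illustration only, a minimal internal-calibration model (a distortion-free pinhole camera; fx, fy, cx, cy are hypothetical intrinsic parameters of the kind such a calibration procedure would produce) links pixels and viewing rays as follows:

```python
def pixel_from_point(X, Y, Z, fx, fy, cx, cy):
    """Project a 3D point, given in camera coordinates, to pixel coordinates."""
    return (fx * X / Z + cx, fy * Y / Z + cy)

def ray_from_pixel(u, v, fx, fy, cx, cy):
    """Back-project a pixel to a viewing-ray direction (Z normalised to 1)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)
```

Intersecting the back-projected ray with the geo-referenced scene model G is what ties a pixel to a point of the three-dimensional scene; real calibrations also model lens distortion, which this sketch omits.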
The "geographic data base" (4 of fig. 1), from which it is possible to reconstruct the geometry G of the scene, can be composed of a geometric model of greater or lesser accuracy and resolution, which represents with a certain approximation the three-dimensional surfaces flown over by the air platform during the acquisition; moreover, the scene can be represented only partly, taking for example into account the orography and instead neglecting the buildings. Further approximations can be performed by the coding system with respect to the original model, to allow reaching a higher computing efficiency. The geometric model must be geo-referenced to allow putting the points of the image plane in correspondence with the points in the three-dimensional space. For example, a DEM (Digital Elevation Model) or DSM (Digital Surface Model) can be used, obtained previously or simultaneously through air or satellite remote sensing, or through field measurements (for example through Lidar processing or reconstructions of the Structure From Motion type, based on artificial vision techniques). The above digital system for storing and indexing geographic data can be a physically independent system or can be a system integrated in the hardware module which implements the coding system.
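A DEM of this kind is typically a regular grid of altitudes; a sketch (function name and grid layout illustrative) of recovering the terrain height at an arbitrary point by bilinear interpolation could be:

```python
def dem_height(dem, x, y):
    """Bilinearly interpolate a height from a DEM grid.
    dem is a list of rows; (x, y) are fractional grid coordinates."""
    i, j = int(x), int(y)
    tx, ty = x - i, y - j
    h00, h10 = dem[j][i], dem[j][i + 1]          # the four surrounding
    h01, h11 = dem[j + 1][i], dem[j + 1][i + 1]  # grid samples
    return (h00 * (1 - tx) * (1 - ty) + h10 * tx * (1 - ty)
            + h01 * (1 - tx) * ty + h11 * tx * ty)
```

A coarser grid (the "further approximations" mentioned above) trades height accuracy for fewer samples to interpolate.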
The sequence of coded bits is sent to a "user system" which stores the photograms F on board the aircraft or sends them to the ground over a transmission channel. The coding system of the invention is structured into the following modules: 1) input interface towards the acquisition system (1' of fig. 2); 2) input interface towards the system for estimating S position and orientation (2' of fig. 2); 3) communication interface with the geographic data base (3' of fig. 2); 4) output interface towards the user system (4' of fig. 2); 5) interface with the system clock (5' of fig. 2); 6) subsystem for estimating FO the optical flow (6' of fig. 2); 7) coding engine (7' of fig. 2). The invention has, as first input, a digital or analogue interface (1' of fig. 2) through which, upon every acquisition cycle, the generic photogram of the video sequence is delivered by the "acquisition system" (2 of fig. 1) to the coding system (1 of fig. 1). A second input of the system is composed of a digital interface (2' of fig. 2) which receives information related to the estimation S of the position and orientation of the camera body with respect to a reference which is inertial, or anyway can be approximated as such. Such information is provided by the already mentioned "system for estimating S position and orientation" (3 of fig. 1). The third input of the system is composed of a digital interface (3' of fig. 2), connected to a memory which contains a "geographic data base" (4 of fig. 1), in which a representation of the geometric shape of the flown-over ground is stored and indexed. The system output is a digital interface (4' of fig. 2) which exposes the flow of bits FB coding the video sequence in a compressed format, and is possibly directed to the "user system" (5 of fig. 1), which can be, for example, a telecommunication system for sending data to the ground or a recording system for storing data on board.
Both the photograms F and the position and orientation data S must be suitably timed with respect to a common reference. This reference can be represented by a synchronization signal ST generated by an external device and routed both to the acquisition devices and to the coding system through a suitable analogue input interface (5' of fig. 2). Alternatively, each data packet representing an image or a measurement vector can be equipped with a data field specifically dedicated to the digital representation of the acquisition instant, even though the acquisition occurs asynchronously.
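In the asynchronous case, the coder must align each photogram timestamp with the pose samples; a minimal sketch (linear interpolation of the position component only, names illustrative) could be:

```python
def pose_at(t, samples):
    """Linearly interpolate a position at frame timestamp t.
    samples is a time-sorted list of (timestamp, (x, y, z)) tuples."""
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)  # interpolation weight in [0, 1]
            return tuple(c0 + a * (c1 - c0) for c0, c1 in zip(p0, p1))
    raise ValueError("timestamp outside the sampled interval")
```

Orientation would need an angle-aware interpolation (e.g. spherical interpolation of quaternions) rather than this componentwise form.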
The term "optical flow FO" herein below means any injective correspondence between the pixels of the photogram to be coded and those of the reference one, however such correspondence is coded (for example, displacement vectors, collections of homographies, deformations of triangular meshes).
The subsystem for estimating FO the optical flow (6' of fig. 2) has the purpose of processing the position and orientation data of the camera body and the geometry G of the scene to obtain a list of correspondences between the pixels of the current photogram and those of a reference photogram. Processing of the geometry G of the scene can also imply a suitable simplification with the purpose of improving the computing performance. The reference photograms can also be different for each portion of the photogram to be coded. In order to determine the above correspondences more accurately, downstream of or together with the processing of geo-referenced geometric data, processing of the photograms F can also be provided. In practice, the optical flow FO estimated starting from position and orientation data can be deemed a first approximation of the one processed starting from the analysis of the photograms F. Specifically, the above described case is the most relevant one from the application point of view.
The subsystem for estimating FO the optical flow can be implemented on a specific hardware or can share the resources with the other subsystems of the invention. Such subsystem can be implemented on a general purpose microprocessor card, "systems on chip", systems with programmable electronics, digital signal processors, graphic processors. The previously mentioned systems must all be equipped with memories for data and programs and suitable data transmission systems for interfacing with the other subsystems. The use of graphic processors is of particular interest.
The video coding engine (7' of fig. 2), namely the completion of the coder with respect to the subsystem for estimating FO the optical flow, can be composed of a suitably designed system or of a commercial coding system. The software which implements the coding engine must operate on a specific processing hardware (general purpose microprocessor cards, "systems on chip", systems with programmable electronics, digital signal processors, graphic processors), equipped with memories and with interfacing and data transmission systems, which can be shared or not with the other subsystems.
Solutions compatible with international video coding standards are of particular interest. In particular, by using for the geographic data base the terrestrial geoid model and approximating the scene with plane surfaces, it is possible to make a system compatible with the ISO MPEG standard. In such case, it is possible to use the tools for Global Motion Compensation and for Sprite Coding, after having grouped the pixels of the generic photogram into Video Objects, one for each plane of the scene. For more complex geometric models, instead, use can be made of instruments made available by the ITU H standard, with reference to the MVC (Multi-view Video Coding) specifications. The novelty introduced by the invention consists mainly in the possibility of estimating the optical flow FO starting from information about the point of view of the sensor S (position and orientation in space) and, not less important, from a geo-referenced geometric model G of the observed scene. The second type of information, extremely important from the point of view of the obtained benefits, is available in case of sequences of images acquired from air or satellite platforms, while, with the current state of the technology, it is not available in other application contexts. The techniques for estimating the optical flow FO starting from camera position and orientation data S and from a geometric model of the scene G can be multiple and are not an inventive novelty. The same can be stated for the modes with which the representation of the optical flow FO can be efficiently coded. Assuming to have a sufficiently adequate estimation S of position and orientation of the image sensor, it is possible to estimate the optical flow FO directly from geometric considerations and without analyzing the photograms F. For example, the flown-over surface can be represented by a set of polygons with adjacent sides. Each polygon is part of a plane with a known equation in space.
The projection of a polygon onto the image plane is generally still a polygon. Based on these considerations, it is possible to associate to each pixel of a photogram a specific homographic transformation which defines its position in another photogram of the sequence.
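A hedged illustration of such a per-plane homographic transformation follows, restricted to the simplest case (a nadir camera with focal length f translating by (tx, ty) over a horizontal ground plane at distance h, with no rotation; the general plane-induced homography also involves the rotation and the plane normal):

```python
def apply_homography(H, u, v):
    """Map pixel (u, v) through a 3x3 homography H (row-major nested list)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)

def translation_homography(tx, ty, h, f):
    """Homography induced by the ground plane for a nadir camera
    translating by (tx, ty) at height h (assumption: no rotation)."""
    return [[1.0, 0.0, -f * tx / h],
            [0.0, 1.0, -f * ty / h],
            [0.0, 0.0, 1.0]]
```

In this degenerate case the homography reduces to a uniform pixel shift; for tilted planes or a rotating camera the perspective terms of H become non-trivial.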
Instead of reasoning by pixels, it is possible to reason by groups of pixels, if one decides to partition the image plane according to a defined geometry (for example into square or rectangular blocks, as provided by the ITU H.264 standard and by the ITU H.265 standard).
Processing of the images F can be performed afterwards, if necessary, only for refining the estimation FO of the motion vectors, with a foreseeable reduction of the computational load. Since the camera position and orientation S are approximately known and since the geometry G of the acquired scene is approximately known, it is possible to optimize the partitioning process of the image plane, associating to each partition: (a) a certain reference photogram, which, for similarity reasons, is more suitable to represent the group of pixels to be coded by differences; (b) the specific transformation to be applied for mapping the pixels of the group onto those of the reference photogram; (c) the structure of the partition and the sizes of each group of pixels depending on the specific coordinates in the image plane and on its correspondence with the observed three-dimensional scene.
In order to make such decisions, it is possible to use the techniques currently implemented by specific open-source coding systems, replacing the considerations based on the analysis of the photograms with considerations about the regularity of the observed surface (particularly low gradient values, for example, suggest the selection of wider regions, while higher values suggest smaller regions). The original representation of the geographic data can be manipulated in order to change its characteristics, for example performing possible approximations, with the purpose of reaching a higher computing efficiency in the estimation FO of the optical flow.
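A toy version of this regularity-driven choice can be sketched as follows (the thresholds and block sizes are purely illustrative, not taken from the patent):

```python
def block_size(gradient_magnitude, flat_thr=0.05, steep_thr=0.2):
    """Pick a partition size from the local slope of the surface model:
    flat terrain -> large blocks, steep terrain -> small blocks."""
    if gradient_magnitude < flat_thr:
        return 64   # nearly flat surface: one wide region suffices
    if gradient_magnitude < steep_thr:
        return 16   # moderate relief
    return 8        # strong relief: finer partitioning
```

The key point is that the decision is driven by the terrain model rather than by an analysis of the photogram content.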
The determination of the correspondence between homologous pixels belonging to different photograms and the coding only of the differences (motion estimation and compensation strategy) is the basis of all the most efficient video coding techniques, among which the international ISO-MPEG and ITU-H.26X standards. Specifically, in the above standards, the single photogram is partitioned into blocks. Each block is compared with the pixels of one or more previously coded reference images, in order to maximize the correspondences. Usually the search area is limited (there is a "search window") and the block is simply translated in the image plane of the reference photogram in order to determine the best translation value for the corresponding "motion vector". Each motion vector is sent to the decoder, while only the difference of the block pixels with respect to the reference photogram is coded, all this with a great advantage in terms of coding efficiency.
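The block-matching search described above can be sketched as an exhaustive sum-of-absolute-differences (SAD) search over a square window (real encoders use far more elaborate strategies; frames are nested lists of pixel values here):

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def block_match(block, ref, x0, y0, radius):
    """Exhaustively test every offset in a (2*radius+1)^2 search window
    around (x0, y0) in the reference frame; return the minimum-SAD
    motion vector (dx, dy)."""
    n = len(block)
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = x0 + dx, y0 + dy
            if 0 <= x and 0 <= y and y + n <= len(ref) and x + n <= len(ref[0]):
                cand = [row[x:x + n] for row in ref[y:y + n]]
                cost = sad(block, cand)
                if best is None or cost < best:
                    best, best_mv = cost, (dx, dy)
    return best_mv
```

Only the winning vector and the residual block differences need to be coded, which is the source of the efficiency advantage mentioned above.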
The representation of the optical flow FO can be presented by the responsible subsystem in forms compatible with the above coding standards or with other future ones. In particular, the optical flow FO can be represented with motion vectors or with global motion vectors, as provided by the ISO MPEG standard, part 2 and part 10. If the representation of the optical flow FO has to be subjected to compatibility constraints with a video coding standard or with pre-existing decoders, the implementation will have to be such that the representation syntax can be decoded by the decoder towards which the compatibility has to be kept. In such case, the global functionalities and performances could be reduced with respect to the full functionalities of the proposed invention. The proposed invention consists in an innovative video coding system for acquisition scenarios from air or satellite platforms. The current systems used in such applications are not different from the systems used for generic information-technology and consumer-electronics applications (television, video-telephony, streaming on data networks, reproduction on digital supports). In case of sequences of photograms with low overlapping, for which high quality and resolution are often required, the coding strategies generally are the same as for still images and do not exploit motion estimation and compensation, with a strong loss of efficiency. Moreover, the coding systems which use motion estimation and compensation are currently designed only for video sequences (therefore, with high overlapping of subsequent photograms) and not for sequences of photograms with low overlapping. In video sequences the motion representation can be efficiently performed with a partition into square or rectangular blocks, and with motion vectors associated with simple translations (ISO MPEG and ITU H.26X standards).
The above model is not suitable in case of great displacements of the point of view, wherein relevant perspective variations can occur. Finally, current digital coding systems are not devised for low-delay transmission. For such type of transmission, in fact, analogue connections over telecommunication channels are currently used, since they generally have a lower global delay between on-board acquisition and ground display.
One of the main computational burdens of a video coding system, particularly for solutions based on standards (ISO MPEG, ITU H.26X), resides in the estimation of the optical flow, whose complexity grows with the relative motion between image sensor and scene. This is the case for sequences of images with low overlapping.
In the video acquisition scenario from an air or satellite platform, motion and orientation of the acquisition sensor can be estimated thanks to the on-board instruments; the geometry of the scene is actually known, a digital model of the ground being available, since, if the shooting occurs from a sufficient height, it is governed by the surface geometry of the flown-over territory (for example the orography). The estimation of the optical flow FO starting from the camera position and orientation S, using a geometric model of the scene G, does not require the analysis of the sequence of photograms F, apart from possibly correcting the residual inaccuracy of the sensors and of the model. In such case, the computational complexity of the analysis is presumably drastically reduced, since the position of the corresponding pixels in the reference photogram is approximately known. With reference to the ISO MPEG and ITU H.26X standards, a reduction of the "search window" sizes is expected. For the reasons stated above, with respect to the case in which the optical flow FO is estimated only with image processing techniques, with the same computational complexity a performance improvement is expected in terms of coding efficiency and processing delay; in the same way, with the same coding efficiency, a reduction of the computational complexity is expected.
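The expected benefit of shrinking the search window can be quantified with simple arithmetic (the window radii below are purely illustrative):

```python
def search_positions(radius):
    """Number of candidate offsets tested by an exhaustive search
    over a square window of the given radius."""
    return (2 * radius + 1) ** 2

# A geometry-predicted vector lets the coder centre a much smaller
# refinement window: e.g. from a +/-32 pixel search to a +/-2 pixel one,
# a 169x reduction in tested candidates (4225 vs 25).
full, assisted = search_positions(32), search_positions(2)
```

Each tested candidate costs one block comparison, so the reduction in candidates translates almost directly into the expected reduction of complexity or delay.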
If required, the representation of the optical flow FO can be made compatible with the specifications of the international ISO MPEG standard, part 2 and part 10, so that the complete coding system deriving therefrom is compatible with the standard. The transmission to the ground of the position and orientation data (S), since the geometric model of the scene (G) used when coding is available to the decoder, can be exploited for recovering photograms lost during transmission or for decreasing the decoding delay. The invention can in fact be used in order to improve the robustness of the decoding system in case of loss of packets on the communication channel. Assuming that the pose data (camera position and orientation) are not lost and that the reference photogram, starting from which the motion compensation is performed, is available, it is possible to approximate the optical flow with the one which can be estimated starting from the pose, even though the current photogram is not available. Even when the position and orientation data are lost as well, it is possible to use in their place a forecast of the same obtained through suitable predictive filtering techniques (for example the well-known Kalman filtering).
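A deliberately simplified stand-in for such predictive filtering (a constant-velocity extrapolation rather than a full Kalman filter; names illustrative) shows the idea of forecasting a lost pose sample:

```python
def predict_pose(p_prev, p_curr):
    """Constant-velocity extrapolation of the next position sample,
    usable when a position/orientation packet is lost. A Kalman filter
    would additionally weight the prediction by the sensor noise model."""
    return tuple(2 * c - p for p, c in zip(p_prev, p_curr))
```

The decoder can feed the forecast pose into the same pose-plus-model flow estimation it uses for received packets, so a lost packet degrades the reconstruction gracefully instead of interrupting it.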
The invention, on the other hand, can be exploited in order to lower the delay of the coding-decoding chain in the following way. The decoder does not wait for the reception of the video packets which code the current photogram, but uses the current position and orientation data, or those obtained from a predictive filtering, to estimate the current photogram starting from the already decoded photograms. The coding of the current photogram is therefore used by the decoder only for decoding the following photograms.

Claims

1. Coding system (1) of videos or of sequences of overlapped images adapted to simultaneously use an estimation of position and orientation of an image sensor, obtained from external sensors or actuators, and a 3D geometric model of the scene, present in a database (4) or contextually determined, for estimating an optical flow (FO) between two photograms (F) through an acquisition system (2) to which said video coding system (1) is operatively connected, said optical flow (FO) designating a generic injective function which associates to a pixel of the image to be coded its homologous in a reference image, in any way it is represented, said coding system (1) being adapted to take into account position and orientation data of a photo- or video-camera through a position and orientation estimating system (3), the geometric model of the scene and the photograms for producing a coding bit flow (FB) for a video, characterized in that it is adapted to jointly use an estimation (S) of position and orientation of an image sensor, obtained by processing data coming from external sensors or actuators, and a 3D geometric model of the scene (G), present in a geographic database (4) or contextually estimated, such model (G) being obtained starting from processing data coming from sensors different from the imaging sensor used for acquiring the photograms to be coded, said system generating a coding bit flow (FB) starting from a sequence of images (F) captured by an interconnected acquisition system (2), exploiting said position and orientation estimations (S) and said geometric model of the scene (G), for estimating the optical flow (FO) in order to perform the movement compensation, said optical flow (FO) pointing out a generic injective function which associates to a pixel of the image to be coded its homologous in an already coded reference image, in any way it is represented.
2. Coding system (1) of videos or of sequences of overlapped images according to claim 1, characterized in that the video is adapted to be transmitted to the ground, and said system (1) further comprises a coding engine (7') adapted to perform a completion of the coder with respect to the subsystem for estimating the optical flow (FO).
3. Coding system (1) of videos or of sequences of overlapped images according to claim 1, characterized in that said position and orientation estimating system (3) is totally or partly implemented through a navigation system of an aircraft or satellite on which said system (1) is installed, a relative or absolute positioning system of the photo- or video-camera present on board being used, in case of a video-camera suitable to be oriented, in order to determine position and orientation of the image sensor.
4. Coding system (1) of videos or of sequences of overlapped images according to claim 1, characterized in that said acquisition system (2) of photograms (F) is a photo-camera, or a video-camera, said acquisition sensor operating in any electro-magnetic band, in particular the sensor operating in the electro-optical and/or infrared band or being multi-spectral, the sequence of photograms object of coding being therefore a video sequence acquired with a constant sampling period, or a sequence of images acquired without any fixed periodicity, in both cases the coding system employing the overlapping in the image plane of the acquired photograms, which gives rise to a redundancy.
5. Coding system (1) of videos or of sequences of overlapped images according to claim 1, characterized in that said geographic data base (4) is placed in an external module or even inside the physical module in which the system (1) is housed.
6. Coding system (1) of videos or of sequences of overlapped images according to claim 1, characterized in that the original representation of the geographic data is adapted to be manipulated by the coding system (1) in order to change its characteristics, for example performing possible approximations, with the purpose of reaching a greater computational efficiency, the 3D geometric model of the scene used for estimating the optical flow coinciding or being an approximation of the original model (G) stored in the data base or extemporaneously computed by the dedicated sensors, the approximation occurring through the reduction of the ground resolution, if the model of the scene is represented through a matrix of altitudes, the reduction of the vertexes, if the model is represented with a structure with triangular meshes or through the reduction of the number of points, if the model is represented with a cloud of points.
7. Coding system (1) of videos or of sequences of overlapped images according to claim 1, characterized in that said geographic data base (4) is pre-existing or is generated dynamically, possibly in real time, by an external sensor different from the acquisition sensor, the 3D geometric model (G) being approximated or partial with respect to the whole scene .
8. Coding system (1) of videos or of sequences of overlapped images according to claim 1, characterized in that position and orientation data are used on the ground, to display with a shorter delay the video sequence transmitted by an air platform, through an artificial interpolation of the already decoded photograms, the above interpolation being performed by estimating the optical flow starting from position and orientation data (S) of the acquisition camera and the 3D geometric model (G) of the observed scene, to make such option possible the system architecture being such that the position and orientation data (S) are suitably coded in the coding bit flow (FB) and the geographic database (4) is available for the decoder.
PCT/IT2014/000313 2013-11-29 2014-11-26 Video coding system for images and video from air or satellite platform assisted by sensors and by a geometric model of the scene WO2015079470A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IT000971A ITTO20130971A1 (en) 2013-11-29 2013-11-29 VIDEO CODING SYSTEM FOR IMAGES AND VIDEOS FROM AERIAL OR SATELLITE PLATFORM ASSISTED BY SENSORS AND GEOMETRIC SCENE MODEL
ITTO2013A000971 2013-11-29

Publications (2)

Publication Number Publication Date
WO2015079470A2 true WO2015079470A2 (en) 2015-06-04
WO2015079470A3 WO2015079470A3 (en) 2015-07-23

Family

ID=50073363

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IT2014/000313 WO2015079470A2 (en) 2013-11-29 2014-11-26 Video coding system for images and video from air or satellite platform assisted by sensors and by a geometric model of the scene

Country Status (2)

Country Link
IT (1) ITTO20130971A1 (en)
WO (1) WO2015079470A2 (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1921867B1 (en) * 2006-10-17 2016-05-25 Harman Becker Automotive Systems GmbH Sensor assisted video compression
US20100079605A1 (en) * 2008-09-29 2010-04-01 William Marsh Rice University Sensor-Assisted Motion Estimation for Efficient Video Encoding
US9020038B2 (en) * 2009-06-14 2015-04-28 Rafael Advanced Defense Systems Ltd. Systems and methods for streaming and archiving video with geographic anchoring of frame contents
GB2480422B (en) * 2010-02-25 2014-07-09 Imagination Tech Ltd Object tracking using graphics engine derived vectors in a motion estimation system


Cited By (12)

Publication number Priority date Publication date Assignee Title
WO2017020184A1 (en) 2015-07-31 2017-02-09 SZ DJI Technology Co., Ltd. Methods of modifying search areas
JP2017526193A (en) * 2015-07-31 2017-09-07 エスゼット ディージェイアイ テクノロジー カンパニー リミテッドSz Dji Technology Co.,Ltd Video encoding method and system
JP2017529710A (en) * 2015-07-31 2017-10-05 エスゼット ディージェイアイ テクノロジー カンパニー リミテッドSz Dji Technology Co.,Ltd How to evaluate the search area
EP3207708A4 (en) * 2015-07-31 2017-11-29 SZ DJI Technology Co., Ltd. Methods of modifying search areas
CN107852508A (en) * 2015-07-31 2018-03-27 深圳市大疆创新科技有限公司 The method for changing region of search
US10708617B2 (en) 2015-07-31 2020-07-07 SZ DJI Technology Co., Ltd. Methods of modifying search areas
US10834392B2 (en) 2015-07-31 2020-11-10 SZ DJI Technology Co., Ltd. Method of sensor-assisted rate control
WO2018161775A1 (en) * 2017-03-08 2018-09-13 腾讯科技(深圳)有限公司 Neural network model training method, device and storage medium for image processing
TWI672667B (en) * 2017-03-08 2019-09-21 大陸商騰訊科技(深圳)有限公司 Method and device for training neural network model for image processing and computer readable storage medium
US10970600B2 (en) 2017-03-08 2021-04-06 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training neural network model used for image processing, and storage medium
US11610082B2 (en) 2017-03-08 2023-03-21 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training neural network model used for image processing, and storage medium
CN113949926A (en) * 2020-07-17 2022-01-18 Wuhan TCL Group Industrial Research Institute Co., Ltd. Video frame insertion method, storage medium and terminal equipment

Also Published As

Publication number Publication date
WO2015079470A3 (en) 2015-07-23
ITTO20130971A1 (en) 2015-05-30

Similar Documents

Publication Publication Date Title
US11064110B2 (en) Warp processing for image capture
US20210329177A1 (en) Systems and methods for video processing and display
CN108605098B (en) System and method for rolling shutter correction
Turner et al. Direct georeferencing of ultrahigh-resolution UAV imagery
EP2277130B1 (en) Systems and methods of capturing large area images in detail including cascaded cameras and/or calibration features
US20200036944A1 (en) Method and system for video transmission
CN111837393B (en) Image processing apparatus and method
KR20170067373A (en) System and method for automatically extracting 3D objects based on drone photographic images
WO2015079470A2 (en) Video coding system for images and video from air or satellite platform assisted by sensors and by a geometric model of the scene
Jacobsen Characteristics of very high resolution optical satellites for topographic mapping
Granshaw Photogrammetric terminology
GB2530659A (en) Coordinating image sensing with motion
CN113890977B (en) Airborne video processing device and unmanned aerial vehicle with same
WO2017205597A1 (en) Image signal processing-based encoding hints for motion estimation
CN110800023A (en) Image processing method and equipment, camera device and unmanned aerial vehicle
US8923401B2 (en) Hybrid motion image compression
Wu et al. Geo-registration and mosaic of UAV video for quick-response to forest fire disaster
US20140152770A1 (en) System and Method for Wide Area Motion Imagery
CN116797716A (en) Real-time acquisition method, device and system for mapping model
KR20110063257A (en) System and method for generating spatial information
Nguyen et al. A Cloud-Based Visual Map Reconstruction for UAV Navigation Using Wireless Streaming
EP2175661A1 (en) Method and apparatus for producing a visual representation of a region
Yang et al. An automated visible/infrared image analysis system of unmanned aerial vehicles (UAVs)
Tommaselli et al. Effects of hyper hemispherical field in bundle adjustment with fisheye images
Cicala et al. An H.264 sensor aided encoder for aerial video sequences with in-the-loop metadata enhancement

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14838805

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 14838805

Country of ref document: EP

Kind code of ref document: A2