GB2584027A - Method and system for creating images - Google Patents

Method and system for creating images

Info

Publication number
GB2584027A
GB2584027A GB2011002.9A GB202011002A
Authority
GB
United Kingdom
Prior art keywords
image
images
tile
camera
ground plane
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB2011002.9A
Other versions
GB202011002D0 (en)
Inventor
Stephen Mark Remde
Supannee Tanathong
William Alfred Peter Smith
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gaist Solutions Ltd
Original Assignee
Gaist Solutions Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gaist Solutions Ltd filed Critical Gaist Solutions Ltd
Publication of GB202011002D0 publication Critical patent/GB202011002D0/en
Publication of GB2584027A publication Critical patent/GB2584027A/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60: Control of cameras or camera modules
    • H04N 23/698: Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 25/00: Circuitry of solid-state image sensors [SSIS]; Control thereof
    • H04N 25/60: Noise processing, e.g. detecting, correcting, reducing or removing noise
    • H04N 25/61: Noise processing, the noise originating only from the lens unit, e.g. flare, shading, vignetting or "cos4"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30244: Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Generating an orthomosaic image of a geographical survey area includes the following steps: a. recording a set of still images of the survey area using a land-borne camera, in which the camera has an optical axis that is angled downwardly with respect to a surface of the survey area, and recording a geographical position corresponding to each image; b. comparing pairs of partially overlapping images to identify common features in the images to derive a model of the motion between the images; c. calculating a pose in respect of each image describing where the camera was in world coordinates and where it was looking when the image was captured; d. projecting each image onto a ground plane to create a set of image tiles; e. generating, for each orthomosaic tile to be stitched, an associated set of images whose projections overlap the tile to be stitched; and f. stitching together a plurality of image tiles to create an orthomosaic image corresponding to an area that is present in a plurality of the images recorded in step a. The associated set of images may be derived by computing the projection of the corners of each image and performing an inside-polygon test for each tile.

Description

Method and System for Creating Images

This invention relates to a method and system for creating images. In particular, it relates to a system capable of building virtual top-down view images of very large areas, for example of road networks, simply by driving a vehicle around the road network and collecting images with a camera oriented towards the road surface.
The term "mosaic" means an image that is constructed from a number of overlapping images.
The type of images created by a system embodying the invention are referred to as "orthomosaics", reflecting the fact that they are a mosaic of many images (perhaps millions) and that they approximate an orthographic view of an area being surveyed from above, similar to an image that might be captured by a satellite.
There is a demand for high-resolution orthographic imagery of large areas of terrain, typically road surfaces, to enable surveying and assessment, for example so that maintenance can be scheduled. Such imagery can be obtained from satellites, but this is a very costly operation and requires favourable atmospheric conditions if sufficient detail is to be resolved.
An aim of this invention is to produce orthomosaic images using a ground-based system at a significantly lower cost, and in a more versatile manner, than is the case for satellite imaging.
To this end, this invention provides a method of generating an orthomosaic image of a geographical survey area as set forth in claim 1.
This method computes camera poses without ever estimating 3D positions of feature points. This makes the dimensionality of the problem much lower than conventional structure-from-motion analysis, meaning it can be solved more quickly, allowing it to scale to much larger sets of images. This is made possible by assuming that the surface being imaged is approximately planar.
Prior to step b., a search may be performed to identify a set of pairs of images that are likely to contain overlapping regions for comparison in step b. The search may include use of geometric heuristics to identify pairs of images that are likely to contain overlapping regions. This means that the content of the images need not be compared at this stage.
In step b., motion between pairs of images may be modelled using a homography, which is a transformation that is valid when two images view the same planar scene from different positions.
Step b., may include simultaneously finding feature matches between pairs of images that are consistent with a homography and computing the homography that is consistent with these matches.
Step b. may include a step of optimising the estimates of camera poses created in step b., by identifying poses that would give homographies between views that are consistent with those estimated from the image pairs in step b.
Advantageously, in step b., images are projected onto a ground plane prior to comparison. In such cases, it may be assumed that a pair of images differ only by a 2D rotation and/or a translation.
Estimated homographies are, most advantageously, filtered to discard those likely to be a poor model of the transformation between two views. This allows unreliable transitions to be removed from the process of optimisation, so avoiding corruption of the motion estimation in step c. A homography can be deemed to be "poor" using several alternative heuristic tests. For example, a homography may be deemed to be poor if the number of common features identified in step b., is below a threshold. A homography may be deemed to be poor if, when applied to a first of an image pair, the number of common features identified in step b., that it maps erroneously onto a second of an image pair is above a threshold.
In step b., the method may include a step of optimising the estimated pose of the camera, by identifying poses that would give homographies between views that are consistent with those estimated from the image pairs in step b.
An objective function may advantageously be optimised in step c., to ensure that the estimated poses give homographies that are consistent with those estimated between overlapping images. To create the objective function, for each of a pair of images, a homography is computed from the image to the ground plane, and this homography is applied to the 2D positions of matched image features to yield matched features on the ground plane, wherein the distances between pairs of matched features are residuals that are added to the objective function such that the sum of the squares of these residuals is minimised. These steps directly optimise how well the images will align when they are projected to the ground plane. This is entirely appropriate since production of a seamless mosaic is an aim of the invention.
Other residual errors in the camera pose are included as appropriate. For example, a pose may be deemed to be poor if it deviates from an initial estimate derived from a recorded geographical position by more than a threshold, and a pose is deemed to be poor if it implies non-smooth motion. Other tests may be used and tests may be applied individually or in combination.
In step c., one or more prior assumptions may be made about the pose as part of the calculation. These may include one or more of a constant camera roll angle, a constant camera height, a constant camera pitch angle and a ground control point of known position. Alternatively or additionally, the prior assumptions may include a GPS position.
The associated set of images in step e. can be derived by computing the projection of the corners of each image and then performing an inside-polygon test for each tile. For each pixel in a tile, a perspective projection operation into each image may be applied, giving a non-integer 2D image coordinate. The nonlinear distortion induced by the camera lens may then be applied to the 2D position to give the final image coordinate. Hence, for each image there is produced a set of 2D coordinates corresponding to each pixel in the ground plane tile. Linear interpolation may then be used to compute the colour at each point in an image to be stitched to give a reprojection of the image to the ground plane.
The final step in the pipeline is to combine the information from the ground plane projections of the images into a single, seamless orthomosaic. If the method were to compute a single mosaiced image covering the whole surveyed area, the problem would be intractable.
Moreover, to enable viewing of the images over a slow network, the mosaic must be broken up into tiles covering different areas at different zoom levels. For this reason, stitching is most advantageously performed for tiles independently.
In step e., a weighting function that chooses the "best" gradient for each tile pixel from each image is typically applied to create the stitched image. The weighting scheme means that the gradient is taken from the image that observed that point at the highest resolution. This encourages the inclusion of maximum detail in the orthomosaic. Additionally or alternatively to the weighting function, a mask may be applied to the image, the mask indicating pixels of the image that are to be excluded from the mosaic. The mask can be applied to remove artefacts from the image or other environmental objects that are not of interest. The mask may be a static mask applied to each recorded image, the static mask being the same for each image to which it is applied. This may be used to exclude static artefacts such as visible parts of a vehicle. Alternatively or additionally, a dynamic mask may be applied to at least some recorded images, the dynamic mask varying from one image to another. This may be used to exclude changing artefacts such as shadows. Advantageously, the stitched image may be computed by solving a sparse system of linear equations. Such systems can be solved efficiently, meaning the tile stitching process could potentially be done in real time, on demand as requests for tiles are made from the viewing interface. The selected gradients typically provide targets that the stitched image must try to match, and guide intensities that the stitched image must also try to match during the stitching process. The guide may be the average of the projected images.
To avoid seams at the boundaries of tiles, in step e., a check may be made as to whether any adjacent tiles have already been stitched and, if they have, a small overlap into the existing tiles is added to the new tile. In the overlap region, there may be only the guide intensity constraint, which is weighted very highly in the objective function, with a smooth transition to a lower-weighted guide constraint for the interior of the tile. The result is that intensities at tile boundaries match exactly and any large intensity changes that would have occurred are made less visible as low frequency transitions. This is a beneficial step that allows tiles to be stitched one at a time while still producing a seamless mosaic when adjacent tiles are viewed next to each other.
In order to optimise and accelerate the estimation of poses, an initial estimate of a pose may be derived from the recorded GPS position. The initial estimate typically includes the direction of the optical axis of the camera (yaw) being determined from the recorded GPS bearing, the viewing angle of the imaged surface (pitch) being as determined during an initial calibration of the system, and the transverse rotation (roll) being zero.
Step a., is typically carried out on a land vehicle such as a road vehicle, such as a motor road vehicle (a van, or similar) or a human-powered vehicle, such as a bicycle. Steps b. to e. are typically carried out on a programmed computer.
From a second aspect, this invention provides a method of image processing comprising performing steps b. to e. according to any preceding claim.
From a third aspect, this invention provides computing apparatus programmed to perform steps b. to e. according to any preceding claim.
An embodiment of the invention will now be described in detail, by way of example, with reference to the accompanying drawings, in which: Figure 1 shows, diagrammatically, a vehicle equipped with a camera that is suitable for use with a system embodying the invention; Figure 2 is a schematic illustration of a system to generate an orthomosaic map in an embodiment of the present invention; Figure 3 is a high-level process flow diagram showing steps used to produce an orthomosaic for a tile; Figure 4 illustrates projecting an image captured from a capturing device to the ground plane, presented as a tile grid, with respect to the 2D Cartesian ground/reference coordinate system; Figure 5 shows steps in matching features between a pair of images in an embodiment of the invention; Figure 6 illustrates multiple projected images overlapping on the same tile; Figure 7 illustrates a Map Tile Index data file; Figure 8 is an illustration of the creation of a ground plane image tile from a source image in a reverse order via a reverse remapping function; Figure 9 is a process flow diagram showing steps to associate 2D ground coordinates in a tile with pixel coordinates of the source image in order to create a ground plane image in a reverse order; Figure 10 is an illustration of assigning weight coefficients to pixels in the captured image, in which the weight assigned is inversely proportional to the distance of the pixel to the camera; Figure 11 is a process flow diagram showing steps to construct the inputs from the captured images in a tile for the linear equation system used to generate an orthomosaic tile; Figure 12 is an illustration of an image of guide intensity as part of generating an orthomosaic tile, in which the size of the guide image is enlarged such that the guide image includes the intensity of the neighbouring orthomosaic tiles at the overlap regions to avoid seams at the boundaries of the tile; Figure 13 is an illustration of determining weight values when there exist stitched tiles at the boundaries of the current tile, in which the tile size is extended from n x n to m x m such that it includes the overlap regions with its neighbour tiles; and Figure 14 illustrates masking of images to exclude artefacts.
Introduction
A vehicle 10 for use with the present invention includes a camera 12 that is mounted on a high part of the vehicle, pointing generally forward and angled downward. The vehicle 10 is also equipped with global navigation satellite system apparatus 14, typically using the Global Positioning System (GPS). As the vehicle is driven forward, the camera captures a sequence of images of the terrain immediately in front of the vehicle. These images are stored in a data capture and storage system 16 and the geographical location of each image, as determined by the global navigation satellite system, is recorded.
An initial step is to perform a camera calibration in order to compute an intrinsic calibration matrix K:

K = [ f_x  0  c_x ;  0  f_y  c_y ;  0  0  1 ]    (1)

where f_x and f_y are the focal lengths in the x and y directions and (c_x, c_y) defines the centre of projection. This calibration matrix describes the specific properties of how the camera projects the 3D world into a 2D image. In addition, a calibration is made for distortion parameters that model how the camera optics deviate from a simple pinhole camera model. These distortion parameters can be used for a nonlinear adjustment to images or image locations to remove the effect of these distortions.
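By way of illustration only, the following sketch shows how the calibration matrix K of Equation (1) and the distortion parameters might be represented and used to undistort pixel coordinates. It assumes NumPy and OpenCV; the numeric values and the helper name undistort_points are placeholders, not values or names from the patent.

```python
import numpy as np
import cv2

# Example placeholder values only; real values come from the calibration step.
fx, fy, cx, cy = 1400.0, 1400.0, 960.0, 540.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])                 # Equation (1)
dist = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])     # k1, k2, p1, p2, k3

def undistort_points(pixels_xy):
    """Map distorted pixel coordinates to distortion-free pixel coordinates."""
    pts = np.asarray(pixels_xy, dtype=np.float64).reshape(-1, 1, 2)
    # Passing P=K returns results in pixel units rather than normalised coordinates.
    return cv2.undistortPoints(pts, K, dist, P=K).reshape(-1, 2)
```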
Image Processing

Data from the data storage device are transferred to a computer for processing in order to generate an orthomosaic image. The computer is programmed to process the data in a manner that will now be described.
Motion-from-homographies

The first stage of the processing applied to the image sequence is to compute an accurate pose (orientation and position) for the camera in every captured image. The pose needs to be sufficiently accurate that, when images are later projected to the ground plane, overlapping images have at least pixel-accurate alignment. Otherwise, there will be misalignment artefacts in the generated orthomosaic images.
This is essentially a structure-from-motion (SFM) problem. However, previous approaches for SFM are not applicable in this setting for two reasons: * Images in which the scene is primarily the road surface are largely planar. This is a degenerate case for estimating a fundamental matrix and subsequently reconstructing 3D scene points.
* Typically, the number of 3D scene points matched between images and reconstructed by the SFM process is much larger than the number of pose parameters to be estimated. This means that SFM does not scale well to very large problems, e.g. those containing millions of images.
Similarly, methods based on Simultaneous Localisation and Mapping (SLAM) are not applicable. They require high frame rates in order to robustly track feature points over time. Sampling images at this frequency is simply not feasible when the aim is to build orthomosaics of thousands of kilometres of road.
Therefore, this embodiment provides an alternative that addresses the drawbacks of using SFM or SLAM for our problem. This approach will be called "motion-from-homographies".
Motion-from-homographies assumes that the scene being viewed is locally planar. This allows it to relate images that are close together in the sequence by a planar homography. The approach exploits the temporal constraints that arise from knowing the images in a trace come from an ordered motion sequence by only finding feature matches between pairs of images that are close together in the sequence. Therefore, it is only necessary to compute pairwise matches between image features and not to reconstruct the 3D world position of image features. This vastly reduces the complexity of the optimisation problem that has to be solved compared with known approaches. The number of unknowns is simply 6N for an image sequence of N images.
Motion Model

The pose of the camera is represented by a rotation matrix and a translation vector. The pose associated with the ith captured image in a sequence is R_i and t_i. The position in world coordinates of a camera can be computed from its pose as C_i = -R_i^T t_i. A world point is represented by coordinates w = [u, v, w]^T, where (u, v) is a 2D UTM coordinate representing position on the ground plane and w is altitude above sea level. Each camera has a standard right-handed coordinate system with the optical axis aligned with the w axis. A world point in the coordinate system of the ith camera is given by:

w_i = R_i w + t_i    (2)

The rotation is a composition of a number of rotation matrices. The first aligns the world w axis with the optical axis of the camera:

R_axes = [ 1  0  0 ;  0  0  -1 ;  0  1  0 ]    (3)

The system models vehicle orientation by three angles. This representation is chosen because the vehicle motion model leads to constraints that can be expressed very simply in terms of these angles. Hence, three rotation matrices in terms of these angles are defined:

R_yaw(α) = [ cos α  0  sin α ;  0  1  0 ;  -sin α  0  cos α ]    (4)

R_pitch(β) = [ 1  0  0 ;  0  cos β  -sin β ;  0  sin β  cos β ]    (5)

R_roll(γ) = [ cos γ  -sin γ  0 ;  sin γ  cos γ  0 ;  0  0  1 ]    (6)

The overall rotation as a function of these three angles is given by:

R(α, β, γ) = R_roll(γ) R_pitch(β) R_yaw(α) R_axes    (7)

Hence, the rotation of the ith camera depends upon the estimate of the three angles for that camera:

R_i = R(α_i, β_i, γ_i)    (8)

Initialisation

The system relies on GPS and an initial estimate of the camera height above the road surface to initialise the location of each camera:

C_i^init = [ u_i^GPS, v_i^GPS, w_measured ]^T    (9)

where (u_i^GPS, v_i^GPS) is the GPS estimate of the ground plane position of the ith camera and w_measured is the measured height of the camera above the road surface in metres. This need only be a rough estimate as the value is subsequently refined.

To initialise rotation, the yaw angle from the GPS bearing is computed first. A first step in doing this is to compute a bearing vector using a central difference approximation:

b_i = 0.5 [ u_{i+1}^GPS - u_{i-1}^GPS,  v_{i+1}^GPS - v_{i-1}^GPS ]^T    (10)

Second, this is converted into a yaw angle estimate:

α_i^init = atan2(-b_{i,1}, b_{i,2})    (11)

The pitch is initialised to a measured value for the angle between the camera optical axis and the road surface, and the roll to zero:

β_i^init = β_measured    (12)

γ_i^init = 0    (13)

Again, β_measured need only be roughly estimated since it is later refined.
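The following is a minimal sketch, not the patented implementation, of the motion model of Equations (2)-(8) and the GPS initialisation of Equations (9)-(13). The helper names (rotation, initial_poses) and the fallback to one-sided differences at the ends of a trace are assumptions made for illustration.

```python
import numpy as np

R_AXES = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, -1.0],
                   [0.0, 1.0, 0.0]])  # Equation (3): aligns the world w axis with the optical axis

def R_yaw(a):
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])        # Equation (4)

def R_pitch(b):
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(b), -np.sin(b)],
                     [0.0, np.sin(b), np.cos(b)]])         # Equation (5)

def R_roll(g):
    return np.array([[np.cos(g), -np.sin(g), 0.0],
                     [np.sin(g), np.cos(g), 0.0],
                     [0.0, 0.0, 1.0]])                     # Equation (6)

def rotation(a, b, g):
    # Equation (7): R(alpha, beta, gamma) = R_roll R_pitch R_yaw R_axes
    return R_roll(g) @ R_pitch(b) @ R_yaw(a) @ R_AXES

def initial_poses(gps_uv, w_measured, beta_measured):
    """gps_uv: (N, 2) array of GPS ground-plane positions, one per image."""
    gps_uv = np.asarray(gps_uv, dtype=float)
    poses = []
    n = len(gps_uv)
    for i in range(n):
        # Central-difference bearing vector (Equation (10)); one-sided at the ends.
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        b_vec = (gps_uv[hi] - gps_uv[lo]) / max(hi - lo, 1)
        alpha = np.arctan2(-b_vec[0], b_vec[1])                   # Equation (11)
        R = rotation(alpha, beta_measured, 0.0)                   # Equations (12)-(13)
        C = np.array([gps_uv[i, 0], gps_uv[i, 1], w_measured])    # Equation (9)
        t = -R @ C                                                # since C = -R^T t
        poses.append((R, t))
    return poses
```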
Fast geometric heuristic techniques are used to identify pairs of images that may contain overlapping regions of the ground plane. Two different procedures are used in this embodiment to identify potential overlaps between: 1. images from the same trace (which will be referred to as "self-overlaps") and 2. images from different traces (which will be referred to as "between-trace overlaps"). The goal of this part of the image processing pipeline is to reduce the number of image pairs for which feature matching and match filtering must be performed.
Self-overlaps: Self-overlaps are detected according to two heuristics. First, the images come from a sequence. Moreover, if image capture is triggered using GPS and wheel tick odometry information, there is an approximately fixed distance between consecutive images. This means that it is possible to choose a constant offset O within which images can be expected to overlap: that is, an image i can be expected to contain overlaps with images in the range i - O to i + O. This is referred to as a sequence heuristic and defines the set of such potentially overlapping pairs for trace t as:

O_seq^t = { (i, j) | 1 ≤ i, j ≤ N_t ∧ 0 < j - i ≤ O }    (14)

and hence |O_seq^t| = N_t O - O(O + 1)/2.
Second, since the trajectory of the capture vehicle may intersect with itself or even retrace previously covered ground, it is preferable to detect potential overlaps between images that are not close in the image sequence. This is done by comparing the ground plane projection of image centres. The distance between projected image centres is given by:

d_centroid(i, j) = || h(H_i^{-1} c) - h(H_j^{-1} c) ||    (15)

where c is the image centre in homogeneous coordinates and H_i is the homography that transforms from the ground plane to the ith image, computed according to (21). The homography depends upon camera pose estimates which have not yet been optimised. Hence, for this test the homography is constructed using the initialisation from GPS location and bearing. The search aims to find all pairs where the Euclidean distance between centroids is less than a threshold:

O_centroid^t = { (i, j) | d_centroid(i, j) < τ }    (16)

This is a 2D fixed-radius all nearest neighbours search, which can be computed in O(N_t log N_t + |O_centroid^t|) time.
The set of potentially self-overlapping image pairs is then given by:

O_self^t = O_seq^t ∪ O_centroid^t    (17)

Between-trace overlaps: It is also useful to find potential overlaps between images from different traces. At this point in the process, pose optimisation on individual traces has already been performed, so pose estimates (R_{t,i}, t_{t,i}) are available for every image in every trace. If traces s and t contain potentially overlapping images, then a set O_between^{s,t}, s < t, is stored, such that (i, j) ∈ O_between^{s,t} if image i in trace s is a potential overlap for image j in trace t. To compute this set, a sequence of tests of increasing complexity is performed, so that the number of images considered by the more expensive and accurate tests is kept to a minimum.
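A minimal sketch of the two self-overlap heuristics (Equations (14)-(17)) might look as follows, assuming the projected image centroids have already been computed. The use of SciPy's cKDTree for the fixed-radius search is an implementation choice for illustration, not something specified in the description.

```python
import numpy as np
from scipy.spatial import cKDTree

def self_overlap_pairs(centroids_uv, offset, radius):
    """centroids_uv: (N, 2) ground-plane projections of the image centres of one trace."""
    n = len(centroids_uv)
    # Sequence heuristic, Equation (14): pairs within a fixed index offset O.
    seq_pairs = {(i, j) for i in range(n) for j in range(i + 1, min(i + offset + 1, n))}
    # Centroid heuristic, Equation (16): fixed-radius all nearest neighbours search.
    tree = cKDTree(centroids_uv)
    cen_pairs = set(tree.query_pairs(radius))
    return seq_pairs | cen_pairs                                  # Equation (17)
```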
To find candidate pairs from the entire image set, to begin with, images are spatially binned into tiles. Image corners are projected onto the ground plane to form a 4-sided polygon. An image is binned into a tile if the polygon overlaps within the tile boundaries. These tile-to-image maps are later used during mosaic stitching and the details of the binning process are described below. This operation is linear in the number of images. Any tile that contains images from more than one trace could contain images with between-trace overlaps.
For all pairs of images from different traces that are binned to the same tile, there is now performed a test for intersections between the bounding boxes of the ground plane projections of the images. If the bounding boxes overlap, a point-in-polygon test is performed. Two images are considered potentially overlapping if: 1. the centroid of one polygon is inside the other, or 2. at least three corners of one polygon are inside the other, or 3. two corners are inside the other and those are not adjacent, or 4. at least one corner is inside the other and the area of overlap is greater than a threshold.
For the case 4, the area of overlap must be calculated. First, a polygon is formed from the corners inside the other polygon and the intersection points with the other polygon's edges. Then the area is computed using Meister's shoelace formula.
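As an illustrative sketch only, the point-in-polygon test and the shoelace area formula referred to above might be implemented as follows; the function names are hypothetical and the polygons are the ground-plane projections of the four image corners.

```python
import numpy as np

def polygon_area(poly):
    """Shoelace formula for a simple polygon given as an (M, 2) array of vertices."""
    x, y = poly[:, 0], poly[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def point_in_polygon(pt, poly):
    """Ray-casting test: count crossings of a horizontal ray from pt with polygon edges."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, np.roll(poly, -1, axis=0)):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```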
Feature Matching and Filtering

Having found pairs of images that potentially contain overlapping regions of the ground plane, the second step in the pipeline is to compute feature matches between images. This process is made robust by filtering the matches according to a restricted motion model. Any pair of images with an insufficient number of inlying matches is discarded. The feature matches between all other pairs are retained and used later in the pose optimisation process. Unlike in a conventional 3D structure-from-motion pipeline, this method computes only pairwise matches between image features. This is because only pose estimates are required, so it does not reconstruct the 3D position of matched features. This vastly reduces the complexity of the pose optimisation process while being sufficient for the final goal of orthomosaic stitching.
The part of the scene which is of interest (the road surface) can be assumed to be locally planar.
This assumption can be exploited during feature matching in two ways. First, features are extracted in images that have been approximately projected to the ground plane. This has the effect of normalising the appearance of features that lie on the ground plane (by approximately undoing the effect of perspective distortion) and improving the likelihood of matching the same point viewed from different locations. It also has the added benefit of distorting the appearance of features that do not lie on the road plane (such as buildings, signage or vehicles) so that they are less likely to be matched. Second, a restricted motion model is assumed with only three degrees of freedom, which means that feature matches can be filtered efficiently and robustly.
The matched features could be any feature that is repeatably detectable and that has a distinctive descriptor. For example, the "scale-invariant feature transform" algorithm as described in US-A-6 711 293 may be used. The 2D location of each feature is undistorted using the distortion parameters obtained during calibration. The system then computes greedy matches between features in pairs of images that are within the overlap threshold. These matches are filtered for distinctiveness using Lowe's ratio test with a threshold of 0.6. This means that the distance in feature space to the first closest match must be no more than 0.6 of the distance to the second closest match. Even with this filter applied, the matches will still contain noise that will disrupt the alignment process. Specifically, they may include matches between features that do not lie on the road plane (such as buildings or signage) or between dynamically moving objects (such as other vehicles). If such matches were retained, they would introduce significant noise into the pose refinement process. Therefore, these matches are removed by enforcing a constraint that is consistent with a planar scene (the homography model) and further restricting motion to a model with only three degrees of freedom.
Since it is assumed that the scene is locally planar, feature matches can be described by a homography. In other words, if a feature with 2D image position x in image i is matched to a feature with image position x' in image j, then there is expected to exist a 3x3 matrix H that satisfies:

s [ x'^T  1 ]^T = H [ x^T  1 ]^T    (18)

where s is an arbitrary scale. This homography constraint is used to filter the feature matches.
However, for the purposes of filtering, a stricter motion model is assumed than elsewhere in the process. Specifically, the assumption is made that the vehicle has two degrees of freedom to move in the ground plane and that its yaw angle may change arbitrarily. However, pitch and the height of the camera above the ground plane are assumed to be fixed to their measured values and roll is assumed to be zero. This allows a homography to be parameterised by only three parameters and enables matches to be ignored that would otherwise be consistent with a homography but which would lead to incorrect motion estimates. For example, if a planar surface (such as the side of a lorry) is visible in a pair of images, then feature matches between the two views of that plane would be consistent with a homography model. However, they would not be consistent with the stricter motion model and will therefore be removed.
Under this motion model, a homography can be constructed between a pair of images, H(u, v, α), based on the 2D displacement in the ground plane (u, v) and the change in the yaw angle α. This homography is constructed as follows. First, place the first image in a canonical pose:

R_1 = R(0, β_measured, 0)    (19)

t_1 = [ 0, 0, w_measured ]^T    (20)

The homography from the ground plane to this first image is given by:

H_1 = K [ (R_1)_{:,1:2}  t_1 ]    (21)

We define the pose of the second image relative to the first image as:

R_2(α) = R(α, β_measured, 0)    (22)

t_2(u, v) = [ u, v, w_measured ]^T    (23)

Therefore, the homography from the ground plane to the second image is parameterised by the ground plane displacement and change in yaw angle:

H_2(u, v, α) = K [ (R_2(α))_{:,1:2}  t_2(u, v) ]    (24)

Finally, the homography from the first image to the second image is defined as:

H_{1->2}(u, v, α) = H_2(u, v, α) H_1^{-1}    (25)

Local features in the ground plane images are computed. Let u_f denote the location in the ground plane image of feature f. Also computed and stored is x_f, obtained by projecting u_f back into the image. (Note that this location is not distorted, since it will later be projected back to the ground plane during pose optimisation.) The x_f are the undistorted locations of features in the image plane.
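The following sketch, offered only as an illustration, constructs the three-parameter homography of Equations (19)-(25). It reuses the rotation() helper and the calibration matrix K from the earlier sketches, and assumes that (R)_{:,1:2} denotes the first two columns of R, as in Equation (21).

```python
import numpy as np

def ground_to_image_homography(K, R, t):
    """Equations (21)/(27): H = K [r1 r2 t] using the first two columns of R."""
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def constrained_homography(K, u, v, alpha, beta_measured, w_measured):
    """Homography from image 1 to image 2 under the three-parameter motion model."""
    R1 = rotation(0.0, beta_measured, 0.0)                    # Equation (19)
    t1 = np.array([0.0, 0.0, w_measured])                     # Equation (20)
    R2 = rotation(alpha, beta_measured, 0.0)                  # Equation (22)
    t2 = np.array([u, v, w_measured])                         # Equation (23)
    H1 = ground_to_image_homography(K, R1, t1)                # Equation (21)
    H2 = ground_to_image_homography(K, R2, t2)                # Equation (24)
    return H2 @ np.linalg.inv(H1)                             # Equation (25)
```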
Features in the projected ground plane images can be expected to approximately retain their appearance and scale, but the orientations of the same features in two different images will differ by a global 2D rotation. Hence, invariance to affine distortion is not required but rotational invariance at the initial matching stage is. While there are a number of feature descriptors that satisfy these criteria, in this example a scale-invariant feature transform (SIFT) algorithm is used, though any other suitable descriptor could be substituted. Greedy matches are computed between features in pairs of images identified as potentially overlapping using the methods described above for identifying potential overlapping images. These matches are filtered for distinctiveness using Lowe's ratio test with a threshold of 0.6.
This means that the distance in feature space to the first closest match must be no more than 0.6 of the distance to the second closest match. Even with this filter applied, the matches will still contain noise that will disrupt the alignment process. Specifically, they may include matches between features that do not lie on the road plane (such as buildings or signage) or between dynamically moving objects (such as other vehicles). If such matches were retained, they would introduce significant noise into the pose refinement process.
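A minimal sketch of greedy SIFT matching with Lowe's ratio test at 0.6, using OpenCV, is shown below. This is one possible realisation for illustration; any rotation-invariant descriptor and matcher could be substituted, and the patent does not prescribe this particular API.

```python
import cv2

def ratio_test_matches(img_a, img_b, ratio=0.6):
    """Greedy SIFT matches between two (ground-plane projected) images,
    filtered for distinctiveness with Lowe's ratio test."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = []
    for m, n in matcher.knnMatch(des_a, des_b, k=2):
        if m.distance <= ratio * n.distance:      # keep only distinctive matches
            matches.append((kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt))
    return matches
```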
Filtering matches with a restricted motion model

Feature matches are now filtered such that they are geometrically consistent with a restricted motion model. Specifically, it is assumed that between a pair of images, the vehicle may translate in the u, v plane and that its bearing may change. In other words, pitch, roll and the height of the camera above the ground plane are assumed to remain fixed. This restricted transformation therefore only has three degrees of freedom. In the projected ground plane images, this means that a 2D proper rigid transformation (i.e., a translation and rotation) is sought. The least-squares best-fitting such transformation can be found in closed form using a restricted version of Procrustes alignment (allowing only translation and rotation). To measure the error between a pair of matched feature locations after computing a best fitting transformation, the Procrustes distance is used, symmetrised by taking an average of the forward/backward transformation.
This restricted motion model is used in conjunction with the random sample consensus (RANSAC) algorithm to compute the set of inlying feature matches. From the greedy feature matches, two matches are randomly selected (the minimum required to fit a 2D proper rigid transformation). The transformation is fitted and then inliers are defined as those with a Procrustes distance of <10 pixels.
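Purely as an illustration, the restricted (rotation plus translation) Procrustes fit and the two-point RANSAC loop could be sketched as follows. The two-point minimal sample and the 10-pixel threshold follow the description; everything else (names, iteration count, random seed) is an assumption.

```python
import numpy as np

def fit_rigid_2d(src, dst):
    """Least-squares 2D rotation + translation mapping src points onto dst points."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - sc).T @ (dst - dc))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                      # enforce a proper rotation (no reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dc - R @ sc
    return R, t

def ransac_rigid_2d(src, dst, iters=500, thresh=10.0):
    """Two-point RANSAC over 2D rigid transformations; returns the inlier mask."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best_inliers = np.zeros(len(src), dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(len(src), size=2, replace=False)   # minimal sample
        R, t = fit_rigid_2d(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```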
An example of the final result is shown in Figure 5. The top row shows a pair of input images. The feature matches after RANSAC filtering between features in the projected ground plane images are shown in the third row. For comparison, the second row shows the result of matching directly in the original images (first undistorting the images, computing greedy feature matches, then fitting a general homography to filter the matches). It is clear that the ground plane images give more matches than the naive method and that they correspond to the correct transformation. Finally, the bottom of Figure 5 shows the alignment using the fitted 2D proper rigid transformation. Note that this is not the alignment used in the final orthomosaic stitching; it is just a visualisation of the coarse transformation used to filter feature matches.
The final set of inlying matches is stored as M_{i,j}^{s,t}, where (f, g) ∈ M_{i,j}^{s,t} if feature f in the ith image in trace s is matched to feature g in the jth image of trace t. The set of matches between images in the same trace t is written as M_{i,j}^{t,t}. The locations of the features in the ground plane projected images are not used in the following pose optimisation and can be discarded at this point.
Alternatively, given a set of tentative matches between a pair of images, RANSAC is used to simultaneously fit the constrained homography model to the matches and remove matches that are outliers under the fitted model. Since the constrained homography depends on only three parameters, two matched points are sufficient to fit a homography. The fit is obtained by solving a nonlinear least squares optimisation problem. The RANSAC algorithm proceeds by randomly selecting a pair of matches, fitting the homography to the matches and then testing the number of inliers under the fitted homography. An "inlier" is defined as a point whose symmetrised distance under the estimated homography is less than a threshold (for example, 20 pixels in this embodiment). This process is repeated, keeping track of the model estimate that maximised the number of inliers. Once RANSAC has completed, there exists a set of filtered matches between a pair of images that are consistent with the constrained motion model. Although the model is overly strict, the use of a relaxed threshold means that matches survive even when there is motion due to roll, changes in pitch or changes in the height of the camera.
Pose optimisation

Single trace data objective: The system now has initial estimates for the pose of every camera and also pairs of matched features between images that are close together in the sequence. A large-scale nonlinear refinement of the estimated pose of every camera can now be performed. Key to this is the definition of an objective function comprised of a number of terms. The first term, E_data, measures how well matched features align when projected to the ground plane. This is referred to as the data term:

E_data({R_i}, {t_i}) = Σ_{(i,j)} Σ_{(f,g) ∈ M_{i,j}} || h(H_i^{-1} [ x_{i,f}^T  1 ]^T) - h(H_j^{-1} [ x_{j,g}^T  1 ]^T) ||^2    (26)

where x_{i,f} is the 2D position in the image plane of the fth feature in image i, obtained by projecting the ground plane feature location back to the image, as described under the heading "Feature Matching and Filtering". H_i is the homography from the ground plane to the ith image and is given by:

H_i = K [ (R_i)_{:,1:2}  t_i ]    (27)

The function h homogenises a point:

h([ u, v, w ]^T) = [ u/w, v/w ]^T    (28)

Equation 26 shares some similarities with a process known as "bundle adjustment" in the general structure-from-motion problem. However, there are some important differences. First, rather than measuring "reprojection error" in the image plane, error when image features are projected to the ground plane is measured. Second, the objective depends only on the camera poses: it does not need to estimate any 3D world point positions. The first difference is important because it encourages exactly what is wanted: namely, that corresponding image positions should align in the final orthomosaic. The second difference is important because it vastly reduces the complexity of the problem and makes it viable to process very large sets of images.
To solve Equation 26, the system initialises using the process described above and then optimises using nonlinear least squares. Specifically, the Levenberg-Marquardt algorithm is used with an implementation that exploits the sparsity of the Jacobian matrix to improve efficiency. Moreover, some additional terms (described in the next section) are included to softly enforce additional prior constraints on the problem.
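The data term of Equation (26) can be expressed as a residual function for a sparse nonlinear least-squares solver. The sketch below is illustrative only: it uses SciPy's least_squares (the trust-region reflective solver, which accepts a Jacobian sparsity pattern) rather than a bespoke Levenberg-Marquardt implementation, reuses the rotation() and ground_to_image_homography() helpers from the earlier sketches, and omits the prior terms.

```python
import numpy as np
from scipy.optimize import least_squares

def homog(p):
    """Equation (28): dehomogenise a 3-vector."""
    return np.array([p[0] / p[2], p[1] / p[2]])

def data_residuals(params, matches, K):
    """params stacks (alpha, beta, gamma, tx, ty, tz) for each of the N cameras.
    matches is a list of (i, j, x_if, x_jg) tuples of matched, undistorted
    image-plane feature positions."""
    poses = params.reshape(-1, 6)
    H_inv = []
    for a, b, g, tx, ty, tz in poses:
        H = ground_to_image_homography(K, rotation(a, b, g), np.array([tx, ty, tz]))
        H_inv.append(np.linalg.inv(H))                            # inverse of Equation (27)
    res = []
    for i, j, x_if, x_jg in matches:
        p_i = homog(H_inv[i] @ np.array([x_if[0], x_if[1], 1.0]))
        p_j = homog(H_inv[j] @ np.array([x_jg[0], x_jg[1], 1.0]))
        res.extend(p_i - p_j)                                     # ground-plane misalignment
    return np.asarray(res)

# x0 stacks the six initial parameters per camera (from the GPS initialisation);
# a sparsity pattern may be supplied via jac_sparsity to exploit the sparse Jacobian.
# sol = least_squares(data_residuals, x0, args=(matches, K), method="trf")
```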
Priors

Since it is to be expected that the orientation of the vehicle with respect to the road surface will remain approximately constant, it is possible to impose priors on two of the angles. First, side-to-side "roll" is expected to be small, only being non-zero when the vehicle is cornering. Hence, the first prior simply penalises the variance of the roll angle estimates from zero:

E_roll({γ_i}) = Σ_{i=1}^{N} γ_i^2    (29)

The second angular prior penalises variance in the angle between the camera optical axis and the road plane, i.e. the pitch angle:

E_pitch({β_i}) = Σ_{i=1}^{N} (β_i - β̄)^2    (30)

where β̄ is the mean pitch angle. Next, variance in the estimated height of the camera above the road surface is penalised because this is expected to remain relatively constant:

E_height({R_i}, {t_i}) = Σ_{i=1}^{N} ([C_i]_w - w̄)^2    (31)

where [C_i]_w is the altitude component of the camera centre C_i and w̄ is its mean over the trace. Fourth, ground control points (GCPs) may be used. These are a set of N_g points for which the ground plane coordinates (u_1^GCP, v_1^GCP), ..., (u_{N_g}^GCP, v_{N_g}^GCP) are known and the correspondence of these points to pixels in some images is known. GCP correspondences are stored in the set G such that (i, j, (x, y)) ∈ G if pixel (x, y) in image i corresponds to the jth GCP. The distance in the ground plane between the actual GCP position and that predicted by the pose of a camera which has a correspondence to a GCP is penalised:

E_GCP({R_i}, {t_i}) = Σ_{(i,j,(x,y)) ∈ G} || [ u_j^GCP, v_j^GCP ]^T - h(H_i^{-1} [ x, y, 1 ]^T) ||^2    (32)

Finally, the estimated position of the camera in each image is encouraged to remain close to that provided by GPS in two ways. First, deviation from the absolute GPS position is penalised:

E_AGPS({R_i}, {t_i}) = Σ_{i=1}^{N} || C_i - C_i^init ||^2    (33)

However, GPS can provide more accurate relative rather than absolute position information. Hence, a second prior can penalise deviation from the relative position change provided by GPS:

E_RGPS({R_i}, {t_i}) = Σ_{i=1}^{N-1} || (C_{i+1} - C_i) - (C_{i+1}^init - C_i^init) ||^2    (34)

The hybrid objective that is ultimately optimised is a weighted sum of the data term and all priors.
Between-trace Overlaps

An orthomosaic of a complete town, city or road network is likely to comprise many traces (perhaps thousands). Each trace may overlap with many others and, to obtain a seamless orthomosaic, all overlapping regions must be optimised together. One solution would be to combine all traces into a single one and run the optimisation previously described. However, even with the simplifying assumptions made, this approach does not scale to millions of images. Instead, an incremental process is used in which a single trace is optimised at a time but which ensures that the new trace aligns with the previously optimised traces. This ensures that the size of any one optimisation problem is kept small.
Whenever a new trace is added into the network, the fast search technique discussed above is used to find potential overlaps between the new trace and the existing traces and to obtain filtered matches between them. Then, for each previously optimised trace s that overlaps with the new trace t, an additional cost is added:

E_between^{s,t}({R_{t,i}}, {t_{t,i}}) = Σ_{(i,j)} Σ_{(f,g) ∈ M_{i,j}^{s,t}} || h(H_{s,i}^{-1} [ x_{s,i,f}^T  1 ]^T) - h(H_{t,j}^{-1} [ x_{t,j,g}^T  1 ]^T) ||^2    (35)

where x_{t,j,g} refers to the gth feature in the jth image in trace t and H_{t,j} is the homography that transforms the ground plane to the jth image in trace t, i.e. that constructed from R_{t,j} and t_{t,j}.
The overall optimisation for trace t comprises the single trace objective as described under "Single trace pose optimisation" with additional objectives of the form (35), one for each overlapping trace.
Map Tile Index Generation

The image of the entire ground plane is presented as a raster map in which any pixel of the image is associated with coordinates in the ground/reference coordinate system. Depending on the level of detail of the ground plane (which defines the ground resolution, indicating how much distance on the ground is represented by a single pixel), the dimensions of the map can be extremely large. To enable creation and viewing of the image of the entire ground plane over the end product platform (an example of the implementation of this invention is the web), the image is cut into non-overlapping grids of equal size called tiles. Each tile is identified by a coordinate describing its position on the ground plane.
Given the estimated pose of the camera (as discussed above) for every captured image together with the intrinsic camera parameters obtained during calibration, the captured images are then projected onto the ground plane as illustrated in Figure 4. As the position determined as part of the camera pose is defined with respect to the ground/reference frame, the projected ground-plane image is thus georeferenced. Depending on the camera trajectory, velocity of the camera motion, frame rate, field of view of the camera and other relevant factors, multiple projected images can overlap on the ground plane of the same tile.
Since the orthomosaic for each tile is created by combining into a single image the information from all the captured images that are projected into that tile, and because each tile typically contains many overlapping images as presented in Figure 6, the Map Tile Index database needs to be generated and updated whenever new images are acquired. The Map Tile Index database stores, for each tile, a data file containing the indices of the images that are included in that tile.
The procedure to determine in which tile index (or indices) a captured image is visible is as follows. Referring to Figure 2, after the camera poses are estimated, the image pixel coordinates x = [ x, y ]^T of the four corners of the captured images (for all images of the same size, the coordinates are equal) are projected onto the ground plane using the equation below, which returns the 2D ground coordinates u. Optionally, given camera intrinsic parameters and distortion coefficients, the pixel coordinates of the four image corners may be undistorted to obtain distortion-free coordinates prior to projection.

u = h(H^{-1} [ x, y, 1 ]^T)    (36)

where H is the homography defined in Equation (27) and h is the function defined in Equation (28).

The projected ground coordinates u are then intersected with the tile boundary to determine in which tile index (or tile indices) the image is to be included. A simple method to perform the boundary intersection is to compute the bounding box coordinates from the four projected corners. The index of the current captured image is then included in the Map Tile Index data file of the intersected tiles.
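A minimal, assumption-laden sketch of Equation (36) and the tile binning step is shown below: the four image corners are projected to the ground plane through H^{-1} and the tiles intersected by their bounding box are collected. The tile_size argument and the (tx, ty) indexing convention are illustrative, not taken from the description.

```python
import numpy as np

def project_to_ground(H, pts_xy):
    """Equation (36): apply H^{-1} to homogeneous pixel coordinates and dehomogenise."""
    pts = np.column_stack((pts_xy, np.ones(len(pts_xy))))
    g = (np.linalg.inv(H) @ pts.T).T
    return g[:, :2] / g[:, 2:3]

def tiles_for_image(H, width, height, tile_size):
    """Return the (tx, ty) indices of all tiles intersected by the image's
    ground-plane bounding box; tile_size is in ground units per tile."""
    corners = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=float)
    ground = project_to_ground(H, corners)
    (u_min, v_min), (u_max, v_max) = ground.min(axis=0), ground.max(axis=0)
    tiles = []
    for tx in range(int(np.floor(u_min / tile_size)), int(np.floor(u_max / tile_size)) + 1):
        for ty in range(int(np.floor(v_min / tile_size)), int(np.floor(v_max / tile_size)) + 1):
            tiles.append((tx, ty))
    return tiles
```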
A Map Tile Index data file for a tile may contain its level of detail, its coordinates with respect to the 2D Cartesian ground coordinate system and all the image indices that are visible in the tile coverage. Since the images that are visible in the tile area may come from different sources, for example multiple image sequences, the data file may include the index of the image sequences together with the indices of the images in the sequences. An example structure of a Map Tile Index data file for a tile index is illustrated in Figure 7. For the data presented in Figure 7, Z denotes the level of detail of the ground plane (the zoom factor), and X and Y indicate the coordinates of the tile on the reference ground plane. S_i indicates the index of the image sequence and I_i represents the index of the images in the sequence S_i.

Orthomosaic Tile Creation

Since each tile typically contains many overlapping images as illustrated in Figure 6, in order to produce a single image for that tile, the information from all these images must be combined in some way. The naive approach of simply averaging the images leads to very bad results (seams at projected image boundaries, loss of detail through blurring caused by imprecise alignment). To overcome these problems, the present invention combines the overlapping images by blending in the gradient domain. This is a standard technique in image stitching and seeks to preserve high frequency detail while hiding gradual changes in low frequency variations. However, the present approach contains a number of significant novelties. First, the image-to-ground-plane projection procedure is encapsulated into a single function in the reverse order, which offers two advantages: speed and avoiding linear interpolation of a nonlinear mapping. Second, the weighting function used in this invention ensures that, among the overlapping images, the "best" gradient is selected for each pixel in the tile image, which encourages the inclusion of maximum detail. Third, the intensity values of all the pixels in the final tile images are computed by solving a sparse system of linear equations which can be solved almost, if not entirely, in real time. Fourth, the present invention ensures that there are no seams at the boundaries of tiles even when each tile is created independently. The detailed description of the implementation procedure, as shown in Figure 3, is as follows.
Image to Ground Plane Remapping

The process of creating an orthomosaic for a tile starts by projecting all the images that belong to the tile, as stored in the Map Tile Index data file, onto the area on the ground plane that is covered by the tile. "Projecting an image to the ground plane" theoretically means projecting every pixel coordinate of the source image to the ground plane defined with respect to the 2D Cartesian ground coordinate system. Instead of deriving the 2D ground coordinates from the pixel coordinates, the system of the present embodiment encapsulates the image-to-ground-plane projection procedure into a single remapping function in the reverse direction. The process of creating a ground plane image from the captured image in the reverse order is illustrated in Figure 8, and the process flow diagram of the remapping function is shown in Figure 9. The single remapping function described in the present invention offers two advantages: speed and avoiding linear interpolation of a nonlinear mapping. The process of projecting a captured image to a tile on the ground plane is as follows. Since a tile is a fixed-size plane on the ground and has a fixed coordinate on the georeferenced ground coordinate system, the coordinates of each pixel in the tile image (also referred to as the ground-plane image) can be determined. The process of generating a ground-plane image from a captured image is done in the reverse order, in which the colours of pixels in the ground-plane image are obtained from their corresponding pixels in the captured image. The relationship between pixels in the ground-plane image and the ith captured image is defined by the perspective projection matrix P_i:

P_i = K [ R_i  t_i ]    (37)
To determine the pixel coordinates x' = (x', y') corresponding to a ground coordinate (u, v), forward perspective projection is used:

x' = h(P_i [ u, v, w, 1 ]^T)    (38)

where [u, v, w]^T is the world coordinate of the ground plane point. (x', y') are distortion-free pixel coordinates. Since the captured image is distorted, the distortion factors must also be applied to the obtained coordinates so that they become the distorted coordinates (x, y), consistent with the captured image. Knowing with which pixel in the source image the pixel (u, v) on the ground-plane image is associated, it is then possible to assign the image intensity I(x, y) to the pixel (u, v). Finally, a bilinear interpolation is performed to obtain the projected ground-plane image.
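One way the reverse remapping of Equations (37)-(38) could be realised is sketched below using OpenCV's projectPoints and remap, which apply the forward projection and lens distortion and then sample the source image with bilinear interpolation. The parameter names and the single ground_alt value per tile are assumptions made for illustration.

```python
import numpy as np
import cv2

def render_tile(image, K, dist, R, t, tile_origin_uv, tile_extent, n_pixels, ground_alt):
    """Project the tile's ground-plane pixels into the source image (Equation (38))
    and sample them with bilinear interpolation via cv2.remap."""
    gsd = tile_extent / n_pixels                              # ground distance per tile pixel
    vv, uu = np.meshgrid(np.arange(n_pixels), np.arange(n_pixels), indexing="ij")
    u = tile_origin_uv[0] + uu * gsd
    v = tile_origin_uv[1] + vv * gsd
    world = np.stack([u, v, np.full_like(u, ground_alt, dtype=float)], axis=-1)
    world = world.reshape(-1, 3).astype(np.float64)
    # Forward perspective projection with lens distortion applied.
    rvec, _ = cv2.Rodrigues(R)
    img_pts, _ = cv2.projectPoints(world, rvec, np.asarray(t, dtype=np.float64), K, dist)
    map_xy = img_pts.reshape(n_pixels, n_pixels, 2).astype(np.float32)
    return cv2.remap(image, map_xy[..., 0], map_xy[..., 1], cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT, borderValue=0)
```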
Seamless Orthomosaic Tile Stitching

Each tile will typically contain many overlapping ground-plane images which must be combined to generate a single tile image from all the projected ground-plane images. The naive approach of simply averaging the images leads to very low quality results (seams at projected image boundaries and loss of detail through blurring caused by imprecise alignment). To overcome this, blending is performed in the gradient domain. This is a standard technique in image stitching and seeks to preserve high frequency detail while hiding gradual changes in low frequency variations. However, some significant modifications to gradient domain stitching are incorporated which make it possible to stitch tiles independently whilst still ensuring that they are seamless when placed side by side.
This is done by constructing a sparse linear equation system of which the unknowns are the image intensities for every pixel in the output orthomosaic tile. The constant terms of the linear system, which are computed as illustrated in Figure 11, are composed of gradients in the horizontal and vertical directions, denoted as G_x and G_y, and the guide intensity image, defined as I_g. The procedure of computing G_x, G_y and I_g is described as follows.
For each ground-plane image, a weight coefficient in the range [0, 1] for every pixel is determined as an element of the weight image W of size n x n, which is the same as the tile size. Pixels with high weight are those observed at high resolution, while pixels with low weight are those observed at low resolution. For example, high weight coefficients are assigned to the pixels with short distances to the camera, which for this system are the pixels at the bottom part of the captured image, while the pixels with large distances to the camera are assigned low weight. This arises because the camera is not directed vertically downwards but tilted towards the road surface. Defining the size of the captured image to be r rows and c columns, with the origin of the image (0, 0) at the top-left corner, the weight assigned to the pixels in the yth row is y/r. This scheme is illustrated in Figure 10.
The procedure of computing W for a ground-plane image is performed in the same way as the image-to-ground-plane projection procedure illustrated in Figure 8. Note that this step is computed only once, as the weight computed from the raw image, W', is the same for every image in the system. However, a slight change is made in Steps 3 and 4 of the process in Figure 9. In Step 3, the weight coefficient at the pixel (x, y) of the captured image is obtained instead and assigned to W at the pixel (u, v) in Step 4. The final weight image W for the ground plane image is then generated by performing a bilinear interpolation.
We denote the intensity at pixel (x, y) in the desired ground plane image as I(x, y). The gradient in the x direction can be approximated using forward differences as:

∂_x I(x, y) ≈ I(x + 1, y) - I(x, y)    (39)

Similarly, for the y direction:

∂_y I(x, y) ≈ I(x, y) - I(x, y + 1)    (40)

Given all the gradient images for each ground-plane image, the target gradient images for the horizontal and vertical directions, G_x and G_y, are created by choosing the "best" gradient among all the gradient images. This is done by, for each pixel (x, y) of G_x, finding which ground plane image has the highest value of W(x, y) and then assigning the x gradient of that ground plane image to the best gradient image G_x. The same procedure is applied to G_y. This scheme encourages the inclusion of maximum detail.
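The following sketch illustrates the weight-driven selection of the "best" gradients (Equations (39)-(40)): forward-difference gradients are computed for each projected ground-plane image and, at each pixel, the gradient is taken from the image with the highest weight. It is a simplified illustration, not the patented code.

```python
import numpy as np

def forward_gradients(img):
    """Forward differences of Equations (39)-(40) for one projected ground-plane image."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]     # Equation (39): I(x+1, y) - I(x, y)
    gy[:-1, :] = img[:-1, :] - img[1:, :]     # Equation (40): I(x, y) - I(x, y+1)
    return gx, gy

def best_gradients(images, weights):
    """images, weights: lists of equally sized (n, n) arrays, one per projection.
    At every pixel, take the gradient from the image with the highest weight."""
    best = np.argmax(np.stack(weights), axis=0)
    gxs, gys = zip(*(forward_gradients(im) for im in images))
    rows, cols = np.indices(best.shape)
    Gx = np.stack(gxs)[best, rows, cols]
    Gy = np.stack(gys)[best, rows, cols]
    return Gx, Gy
```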
In addition to using the weight image as an indication of the level of detail for each pixel, the weighting may be combined with a mask to remove undesired regions, as presented in Figure 14 (middle row). Pixels in the mask that have their value set to zero (black) indicate the part of the scene to be excluded from the final results. These may include shadows cast by the vehicle and undesired objects in the scene. A mask can be produced for each image or fixed across a whole dataset to remove part of the vehicle, such as the bonnet, that is visible in the captured image.
This process is illustrated in Figure 14. An integration of the weight image and the mask can be used to remove undesired parts of the scene. In Figure 14, the top row contains images captured with unwanted objects. The middle row shows masks corresponding to the unwanted objects and the bottom row combines the weight image with the mask. The left column illustrates masking of a shadow cast by the vehicle. The middle column illustrates masking of undesired objects on the road. The right column illustrates masking of part of the vehicle present in the captured image (the bonnet in this case). Pixels with darker colour have lower weight and black equals zero weight. Masks can be provided manually, computed using a separate semantic segmentation process or, in the case of shadows, ray-traced using a 3D model of the van and an estimate of the sun position. A mask comprises a per-pixel weight. This can be used to deal with a variety of phenomena: * Shadows: Shadow boundaries introduce a large gradient that causes artefacts in the blended result. Therefore, shadow boundaries, or points close to the boundary, are given a low or zero weight. The interior of shadows can still be used for stitching and can be given non-zero weight. However, such pixels should not be used as part of the "guide" image as they will cause the blended result to be overly darkened in shadowed regions.
* Visibility of part of the vehicle: If part of the vehicle is visible in the captured images, this can be removed by using a static mask; that is, the same mask for every image.
* Other vehicles, real-world objects not of interest: Semantic segmentation can be used to classify each pixel as belonging to one of a set of possible categories. Any pixel given a non-road label can be assigned a zero weight in the mask.
The selected gradients provide targets that the stitched image must try to match. However, providing gradient constraints alone would leave an unknown constant offset. To remove this indeterminacy, and to encourage the overall brightness and colour to match the original captured images, guide intensities are included, which the stitched image must also try to match. The guide intensities are simply taken from the average of all the ground-plane images, as illustrated in Step 4 of Figure 11. In other words, detail is recovered from the gradient of the best image, but overall brightness from the average of all images. The overall problem can be written as a sparse linear system of equations. Such systems can be solved efficiently, meaning the tile stitching process could potentially be done in real time, on demand, as requests for tiles are made from the viewing interface.
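A minimal sketch of forming the guide as the average of the ground-plane images, assuming pixels not observed by a given image are marked NaN (names are illustrative):

```python
import numpy as np

def average_guide_image(ground_plane_stack):
    """Average all ground-plane images covering a tile to form the guide
    intensities; NaN entries (unobserved pixels) are ignored per pixel.

    ground_plane_stack : (N, H, W) or (N, H, W, 3) array of projected images.
    """
    return np.nanmean(ground_plane_stack, axis=0)
```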
However, because the orthomosaic is computed independently for each tile, seams appear at the boundaries of tiles when they are presented alongside other tiles. To avoid such seams, overlaps with previously stitched neighbouring orthomosaic tiles are included, as presented in Figure 12. This is done by extending the size of the average image by e pixels on each side to create a small overlap with the neighbouring tiles. This enlarged image, of size M×M, is referred to as the guide intensity image, Ig, presented as Step 5 in Figure 11. To create the guide intensity image at the boundary region, the first step is to check whether any adjacent tiles have already been stitched. If they have, a small overlap (e pixels) of the neighbouring tile is copied into the enlarged tile, as presented again in Figure 12. If not, the overlap region for that neighbouring tile is set to an invalid number (NaN) so that those pixels are not included in the linear equation system.
In the overlap region, there is only the "guide intensity" constraint, which is weighted very highly in the objective function; in the implementation, the weights for the pixels in that region are set to 1. This forces the intensities of the pixels near the boundaries to blend smoothly with the adjacent tiles. In order to retain the detail of the current tile while still transitioning smoothly from its boundaries, the weights of the interior pixels increase steadily from 0 at the outer pixels of the original tile over a distance of e pixels until they reach λ; that is, the weight increases with every step towards the centre of the tile. The weights then remain at this value for every pixel in the interior region. The result is that intensities at tile boundaries match exactly with the adjacent stitched tiles, and any large intensity changes that would have occurred are made less visible as low-frequency transitions. This is a key step that allows tiles to be stitched one at a time while still producing a seamless mosaic when adjacent tiles are viewed next to each other.
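The per-pixel guide weights described above could be laid out as in the following sketch; tile_size, e and the interior weight lam (λ) are parameters, and treating unstitched sides as zero weight is an assumption consistent with dropping their NaN guide values from the system.

```python
import numpy as np

def guide_weight_map(tile_size, e, lam, neighbour_stitched):
    """Per-pixel weights for the guide-intensity constraints of an enlarged tile.

    tile_size          : side length of the original (un-enlarged) tile, in pixels.
    e                  : overlap added on each side, in pixels (e > 0).
    lam                : small interior guide weight (the value the ramp rises to).
    neighbour_stitched : dict with keys 'left', 'right', 'top', 'bottom' -> bool,
                         True where an adjacent tile has already been stitched.

    Overlap pixels next to an already-stitched neighbour get weight 1 so that
    boundary intensities are matched exactly; interior pixels ramp from 0 at
    the original tile edge up to lam over e pixels and then stay at lam.
    """
    M = tile_size + 2 * e
    idx = np.arange(M)
    # signed distance (in pixels) inside the original tile, capped at e
    d = np.minimum(idx - e, (M - 1 - e) - idx)
    dist = np.minimum(d[None, :], d[:, None]).clip(0, e)
    W = lam * dist / e                    # 0 at the original edge, lam after e pixels inwards

    # overlap strips: hard constraint (weight 1) where a stitched neighbour exists
    if neighbour_stitched.get("left"):
        W[:, :e] = 1.0
    if neighbour_stitched.get("right"):
        W[:, -e:] = 1.0
    if neighbour_stitched.get("top"):
        W[:e, :] = 1.0
    if neighbour_stitched.get("bottom"):
        W[-e:, :] = 1.0
    return W
```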
At this point, the constant terms of the linear equation system are all defined, namely G and Ig. The sparse system of linear equations can then be constructed as follows.
If all the image intensities of the target orthomosaic are stacked into a long vector I ∈ R^(M²), gradients in the horizontal and vertical directions can be computed by multiplication with gradient coefficient matrices J_Gx ∈ R^(M²×M²) and J_Gy ∈ R^(M²×M²) respectively: J_Gx I gives a vector containing the horizontal gradient at each pixel, and J_Gy I a vector containing the vertical gradient. Both J_Gx and J_Gy are sparse, since only two elements in each row are non-zero. Hence, the structure of J_Gx looks like:

    J_Gx = [ -1   1   ...        0 ]
           [       ...             ]    (41)
           [  0       ...   -1   1 ]

For pixels on the bottom or right boundary, it is possible to switch to backward finite differences to compute the gradient. The sparse system of linear equations for the seamless image stitching is constructed as:

    [ J_Gx                 ]       [ g_x                      ]
    [ J_Gy                 ]  I  = [ g_y                      ]    (42)
    [ diag(w_1, ..., w_M²) ]       [ diag(w_1, ..., w_M²) I_g ]

where g_x is a vector containing the horizontal gradients for all pixels for which the gradient is defined, g_y similarly contains the vertical gradients, and w_i is the weight associated with the i-th pixel, computed according to the scheme shown in Figure 13. Although the system is sparse and thus quick to solve, the speed can still be improved by reducing the number of equations for the guide intensity constraints. Instead of including all the pixels of I_g, only a small number of guide intensity pixels are included, namely those whose weights are equal to λ, because the system is over-constrained and the number of equations is sufficient for it to be solved using only the gradient terms and the guide intensities at the boundaries. Not only does this improve speed, the quality of the stitched image is also improved, because it avoids the solution being encouraged towards the overly smooth average image.
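As a hedged illustration of how a system of the form (42) might be assembled and solved, the sketch below uses SciPy sparse least squares; the helper name and the exact sign convention for the vertical gradient are assumptions, and a colour image would be solved one channel at a time.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def stitch_tile(Gx, Gy, guide, guide_weight):
    """Solve a gradient-domain stitching problem of the form of equation (42).

    Gx, Gy       : (M, M) target gradient images (forward differences).
    guide        : (M, M) guide intensity image (NaN where undefined).
    guide_weight : (M, M) per-pixel guide weights (1 at stitched boundaries,
                   ramping to a small constant in the interior, 0 where unused).
    Returns the stitched (M, M) intensity image.
    """
    M = Gx.shape[0]
    n = M * M
    idx = np.arange(n).reshape(M, M)

    # horizontal gradient rows: I(x+1, y) - I(x, y), skipping the last column
    r = idx[:, :-1].ravel()
    Jx = sp.coo_matrix(
        (np.concatenate([-np.ones(r.size), np.ones(r.size)]),
         (np.concatenate([np.arange(r.size)] * 2),
          np.concatenate([r, idx[:, 1:].ravel()]))),
        shape=(r.size, n)).tocsr()
    bx = Gx[:, :-1].ravel()

    # vertical gradient rows, skipping the last row; the signs here just need
    # to match whichever convention was used when Gy was formed
    c = idx[:-1, :].ravel()
    Jy = sp.coo_matrix(
        (np.concatenate([-np.ones(c.size), np.ones(c.size)]),
         (np.concatenate([np.arange(c.size)] * 2),
          np.concatenate([c, idx[1:, :].ravel()]))),
        shape=(c.size, n)).tocsr()
    by = Gy[:-1, :].ravel()

    # guide-intensity rows: w_i * I_i = w_i * guide_i, kept only where defined
    w = guide_weight.ravel()
    g = guide.ravel()
    keep = (w > 0) & np.isfinite(g)
    Jg = sp.diags(w).tocsr()[keep]
    bg = w[keep] * g[keep]

    A = sp.vstack([Jx, Jy, Jg]).tocsr()
    b = np.concatenate([bx, by, bg])
    sol = lsqr(A, b)[0]                  # sparse least-squares solve
    return sol.reshape(M, M)
```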
Orthomosaic Map Generation

Since each orthomosaic tile is generated in such a way that it produces no seams at the boundaries between tiles, an orthomosaic for any region can be constructed by simply placing an appropriate subset of the tiles side by side according to their tile positions. In practice, a user interacts with the map by changing the region of interest (e.g. by dragging the map) or the zoom level, which generates requests for a set of tiles. These are either returned directly (if they were precomputed in advance) or stitched on-the-fly in an on-demand implementation.
In principle, the entire orthomosaic for the whole captured region could be created by concatenating all stitched tiles together into one large map.
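A minimal sketch of this tile placement, assuming stitched tiles are stored in a dictionary keyed by (row, column) index (single-channel tiles for brevity; names are illustrative):

```python
import numpy as np

def assemble_map(tiles, tile_size):
    """Place stitched tiles side by side to form an orthomosaic for a region.

    tiles     : dict mapping (row, col) tile indices to (tile_size, tile_size)
                arrays, with any enlarged overlap borders already cropped off.
    tile_size : side length of each tile in pixels.
    """
    rows = [r for r, _ in tiles]
    cols = [c for _, c in tiles]
    r0, c0 = min(rows), min(cols)
    H = (max(rows) - r0 + 1) * tile_size
    W = (max(cols) - c0 + 1) * tile_size
    mosaic = np.zeros((H, W))
    for (r, c), tile in tiles.items():
        y, x = (r - r0) * tile_size, (c - c0) * tile_size
        mosaic[y:y + tile_size, x:x + tile_size] = tile
    return mosaic
```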

Claims (34)

1. A method of generating an orthomosaic image of a geographical survey area comprising: a. recording a set of still images of the survey area using a land-borne camera, in which the camera has an optical axis that is angled downwardly with respect to a surface of the survey area and recording a geographical position corresponding to each image; b. comparing pairs of partially overlapping images to identify common features in the images to derive a model of the motion between the images; c. calculating a pose in respect of each image describing where the camera was in world coordinates and where it was looking when the image was captured; d. projecting each image onto a ground plane to create a set of image tiles; e. generating, for each orthomosaic tile to be stitched, an associated set of images whose projections overlap the tile to be stitched; and f. stitching together a plurality of image tiles to create an orthomosaic image corresponding to an area that is present in a plurality of the images recorded in step a.
2. A method according to claim 1 in which the associated set is derived by computing the projection of the corners of each image and then performing an inside-polygon test for each tile.
3. A method according to claim 1 or claim 2 in which for each pixel in a tile there is applied a perspective projection operation into each image, giving a non-integer 2D image coordinate.
4. A method according to claim 3 in which the nonlinear distortion induced by the camera lens is applied to the 2D position to give the final image coordinate.
5. A method according to any preceding claim in which linear interpolation is used to compute the colour at each point in an image to be stitched to give a reprojection of the image to the ground plane.
6. A method according to any preceding claim in which, in step e., a weighting function that chooses the "best" gradient for each tile pixel from each image is applied to create the stitched image.
7. A method according to claim 6 in which the weighting function is chosen such that the gradient is taken from the image that observed that point at the highest resolution.
8. A method according to any preceding claim in which a mask is applied to exclude one or more regions of a recorded image.
9. A method according to claim 8 in which a static mask is applied to each recorded image, the static mask being the same for each image to which it is applied.
10. A method according to claim 8 or claim 9 in which a dynamic mask is applied to at least some recorded images, the dynamic mask varying from one image to another.
11. A method according to any preceding claim in which the stitched image is computed by solving a sparse system of linear equations.
12. A method according to claim 11 in which the selected gradients provide targets that the stitched image must try to match and guide intensities that the stitched image must also try to match during the stitching process.
13. A method according to claim 12 in which the guide is the average of the projected images.
14. A method according to any preceding claim in which, in step e., a check is made as to whether any adjacent tiles have already been stitched, and if they have, a small overlap into the existing tiles is added to the new tile.
15. A method according to claim 14 wherein in the overlap region, there is only the guide intensity constraint which is weighted very highly in the objective function and there is a smooth transition to a lower weighted guide constraint for the interior of the tile.
16. A method according to any preceding claim in which, prior to step b., a search is performed to identify a set of pairs of images that are likely to contain overlapping regions for comparison in step b.
17. A method according to claim 16 in which the search includes use of geometric heuristics to identify pairs of images that are likely to contain overlapping regions.
18. A method according to any preceding claim in which, in step b., motion between pairs of images is mapped using a homography that is a transformation valid when two images view the same planar scene from different positions.
19. A method according to claim 18 in which step b. includes finding feature matches between pairs of images that are consistent with a homography and computing the homography that is consistent with these matches.
20. A method according to claim 18 or claim 19 including, in step b., a step of optimising the estimates of camera poses created in step b., by identifying poses that would give homographies between views that are consistent with those estimated from the image pairs in step b.
21. A method according to claim 18 or claim 19 in which, in step b., images are projected onto a ground plane prior to comparison.
22. A method according to claim 21 in which, in making the comparison in step b., it is assumed that a pair of images differ only by a 2D rotation and/or a translation.
23. A method according to any one of claims 18 to 22 in which estimated homographies are filtered to discard those likely to be a poor model of the transformation between two views.
24. A method according to claim 23 in which a homography is deemed to be poor if one or more of the following apply: a. the number of common features identified in step b. is below a threshold; b. when applied to a first of an image pair, the number of common features identified in step b. that it maps erroneously onto a second of an image pair is above a threshold; c. it deviates from an initial estimate derived from a recorded geographical position by more than a threshold; or d. it implies non-smooth motion.
25. A method according to any preceding claim in which an objective function is applied in step c., ensuring that the estimated poses give homographies that are consistent with those estimated between overlapping images.
26. A method according to claim 25 in which, for each of a pair of images, a homography is computed from the image to the ground plane, and this homography is applied to the 2D positions of matched image features to yield matched features on the ground plane, wherein the distances between pairs of matched features are residuals that are added to the objective function such that the square of these residuals is minimised.
27. A method according to any preceding claim in which one or more prior assumptions are made about the pose as part of the calculation in step c.
28. A method according to claim 27 in which the prior assumptions include one or more of a constant camera roll angle, a constant camera height, a constant camera pitch angle and a ground control point of known position.
29. A method according to claim 27 or claim 28 in which the prior assumptions include a GPS position.
30. A method according to any preceding claim in which, in step a., an initial estimate of a pose is derived from the recorded geographical position.
31. A method according to claim 30 in which the initial estimate includes the direction of the optical axis of the camera being determined by the recorded GPS direction of orientation, the viewing angle of the imaged surface being as determined during an initial calibration of the system, and the transverse rotation being zero.
32. A method according to any preceding claim in which step a. is carried out on a road vehicle.
33. A method of image processing comprising performing steps b. to e. according to any preceding claim.
34. Computing apparatus programmed to perform steps b. to f. according to any preceding claim.
GB2011002.9A 2016-12-05 2017-09-25 Method and system for creating images Withdrawn GB2584027A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1620652.6A GB2561329A (en) 2016-12-05 2016-12-05 Method and system for creating images
GB1715491.5A GB2557398B (en) 2016-12-05 2017-09-25 Method and system for creating images

Publications (2)

Publication Number Publication Date
GB202011002D0 GB202011002D0 (en) 2020-09-02
GB2584027A true GB2584027A (en) 2020-11-18

Family

ID=58159665

Family Applications (3)

Application Number Title Priority Date Filing Date
GB1620652.6A Withdrawn GB2561329A (en) 2016-12-05 2016-12-05 Method and system for creating images
GB2011002.9A Withdrawn GB2584027A (en) 2016-12-05 2017-09-25 Method and system for creating images
GB1715491.5A Active GB2557398B (en) 2016-12-05 2017-09-25 Method and system for creating images

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GB1620652.6A Withdrawn GB2561329A (en) 2016-12-05 2016-12-05 Method and system for creating images

Family Applications After (1)

Application Number Title Priority Date Filing Date
GB1715491.5A Active GB2557398B (en) 2016-12-05 2017-09-25 Method and system for creating images

Country Status (3)

Country Link
EP (1) EP3549094A1 (en)
GB (3) GB2561329A (en)
WO (1) WO2018104700A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109631919B (en) * 2018-12-28 2022-09-30 芜湖哈特机器人产业技术研究院有限公司 Hybrid navigation map construction method integrating reflector and occupied grid
CN110084743B (en) * 2019-01-25 2023-04-14 电子科技大学 Image splicing and positioning method based on multi-flight-zone initial flight path constraint
CN110097498B (en) * 2019-01-25 2023-03-31 电子科技大学 Multi-flight-zone image splicing and positioning method based on unmanned aerial vehicle flight path constraint
US11972507B2 (en) * 2019-04-03 2024-04-30 Nanjing Polagis Technology Co. Ltd Orthophoto map generation method based on panoramic map
US11922620B2 (en) 2019-09-04 2024-03-05 Shake N Bake Llc UAV surveying system and methods
CN111161143A (en) * 2019-12-16 2020-05-15 首都医科大学 Optical positioning technology-assisted operation visual field panoramic stitching method
FR3110996B1 (en) * 2020-05-26 2022-12-09 Continental Automotive Construction of images seen from above of a section of road
CN111678502B (en) * 2020-06-09 2022-06-14 中国科学院东北地理与农业生态研究所 Method for extracting frozen soil disaster information based on unmanned aerial vehicle aerial survey image
US20220373473A1 (en) * 2020-10-05 2022-11-24 Novi Llc Surface defect monitoring system
DE102020213597A1 (en) * 2020-10-29 2022-05-05 Robert Bosch Gesellschaft mit beschränkter Haftung Masking method, computer program, storage medium and electronic control unit
CN113989450B (en) * 2021-10-27 2023-09-26 北京百度网讯科技有限公司 Image processing method, device, electronic equipment and medium
CN115933652A (en) * 2022-11-29 2023-04-07 北京航天飞行控制中心 Lunar vehicle direct-drive teleoperation driving method based on sequence image splicing and fusion
CN115830246B (en) * 2023-01-09 2023-04-28 中国地质大学(武汉) Spherical panoramic image three-dimensional reconstruction method based on incremental SFM

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001015081A2 (en) * 1999-08-20 2001-03-01 Hebrew University Of Jerusalem System and method for rectified mosaicing of images recorded by a moving camera

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4341656B2 (en) * 2006-09-26 2009-10-07 ソニー株式会社 Content management apparatus, web server, network system, content management method, content information management method, and program
WO2008044911A1 (en) * 2006-10-09 2008-04-17 Tele Atlas B.V. Method and apparatus for generating an orthorectified tile
CN101893443B (en) * 2010-07-08 2012-03-21 上海交通大学 System for manufacturing road digital orthophoto map
US8970665B2 (en) * 2011-05-25 2015-03-03 Microsoft Corporation Orientation-based generation of panoramic fields
US9641736B2 (en) * 2014-06-20 2017-05-02 nearmap australia pty ltd. Wide-area aerial camera systems
CN105592294B (en) * 2014-10-21 2018-10-02 中国石油化工股份有限公司 A kind of monitoring system of VSP excitations big gun group
US9824290B2 (en) * 2015-02-10 2017-11-21 nearmap australia pty ltd. Corridor capture

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001015081A2 (en) * 1999-08-20 2001-03-01 Hebrew University Of Jerusalem System and method for rectified mosaicing of images recorded by a moving camera

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
2015 Joint urban remote sensing event (JURSE), 30/03/2015, Martial Sanfourche et al, "Environment mapping and interpretation by drone", Palaiseau, France. *
2016 IEEE Intelligent Vehicles Symposium (IV), 19-22/06/2016, Gothenburg, Sweden, Bahman Soheilian, "Landmark based localization: LBA refinement using MCMC-optimized projections of RJMCMC-extracted road marks", pp 940-947 *
2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 09/10/2016 Daejeon, Korea, Shuhui Bu et al, "Map2D Fusion: Real-time incremental UAV image mosaicing based on monocular SLAM", pp 4564-4571 *
Dichen Liu, "3D mapping with UAV: procedure analysis and performance evaluation with field tests", 01/11/2013, UNSW Australia BEng Thesis *

Also Published As

Publication number Publication date
EP3549094A1 (en) 2019-10-09
WO2018104700A1 (en) 2018-06-14
GB2561329A (en) 2018-10-17
GB2557398A (en) 2018-06-20
GB201715491D0 (en) 2017-11-08
GB2557398B (en) 2021-08-04
GB202011002D0 (en) 2020-09-02
GB201620652D0 (en) 2017-01-18

Similar Documents

Publication Publication Date Title
GB2584027A (en) Method and system for creating images
Li et al. Remote sensing image mosaicking: Achievements and challenges
CN102612704B (en) Method of providing a descriptor for at least one feature of an image and method of matching features
US8107722B2 (en) System and method for automatic stereo measurement of a point of interest in a scene
Wagner A new approach for geo-monitoring using modern total stations and RGB+ D images
US10366501B2 (en) Method and apparatus for performing background image registration
US20050243323A1 (en) Method and apparatus for automatic registration and visualization of occluded targets using ladar data
US20120027371A1 (en) Video summarization using video frames from different perspectives
Cvišić et al. Recalibrating the KITTI dataset camera setup for improved odometry accuracy
Ribera et al. Estimating phenotypic traits from UAV based RGB imagery
Heather et al. Multimodal image registration with applications to image fusion
Wan et al. Drone image stitching using local mesh-based bundle adjustment and shape-preserving transform
Tang et al. Content-based 3-D mosaics for representing videos of dynamic urban scenes
Müller et al. Robust image registration for fusion
Cao et al. Automatic geo-registration for port surveillance
Feng et al. Registration of multitemporal GF-1 remote sensing images with weighting perspective transformation model
Nicosevici et al. Efficient 3D scene modeling and mosaicing
Gao et al. Real‐time mosaic of multiple fisheye surveillance videos based on geo‐registration and rectification
Tanathong et al. SurfaceView: Seamless and tile-based orthomosaics using millions of street-level images from vehicle-mounted cameras
Xu et al. Online stereovision calibration using on-road markings
Ozcanli et al. Geo-localization using volumetric representations of overhead imagery
Ye Image registration and super-resolution mosaicing
Lach Semi-automated DIRSIG scene modeling from three-dimensional lidar and passive imagery
Creß et al. Targetless extrinsic calibration between event-based and rgb camera for intelligent transportation systems
Produit et al. Pose estimation of landscape images using DEM and orthophotos

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)