Method for distributed and minimum-support point matching in two or more images of a 3D scene taken with a video or stereo camera.
Field of the invention
The present invention relates to a method of high-density point registration between images.
Background of the invention
Registration between images taken by a stereo camera, or between images from a sequence taken by a video camera, is a critical and unavoidable step in such applications as 3D scene reconstruction, photogrammetry, scene capture, augmented reality, depth computation, navigation and security surveillance. The registration establishes correspondence between points of two or more images, where the points correspond to the same object points in the observed 3D scene. Once such registration is in place, it is further used in the applications mentioned above to infer the information important for the application at hand.
For applications like navigation and video surveillance, the 3D information or depth of the scene represents the main interest. It is further interpreted to compute information such as the distance to an obstacle, the height of a person, etc. In photogrammetry, 3D reconstruction with metric information is targeted at measuring precise object sizes. Higher precision in depth estimation leads to higher precision of navigation or object size estimation. The reconstruction can be summarized by the scheme in Fig. 19. Two images, Left image (1) and Right image (2), taken from different viewpoints are used to reconstruct a 3D surface corresponding to the observed scene. In scene capture for movies, the latter reconstruction is covered with texture taken from the images.
For other applications like augmented reality, the registration of points in images provides anchoring information to which virtual objects of augmented reality are aligned. The degree of detail and the spatial precision of this anchoring information are of paramount value because the human eye is very sensitive to the resulting visual quality of the integration of virtual objects into reality.
Summarizing, the higher the spatial precision and resolution of registration between images, the more advantage all applications can draw from it. This patent focuses on a method for high spatial precision point registration in images containing views of rigid and other objects that can move and occlude each other under changing illumination conditions. It also fits well to implementation on a Graphical Processing Unit or another device or physical material that can perform massively parallel tasks. Registration is a match of spatial locations between two or more images. The registration methods used in the applications described above differ from one another by the type of spatial elements used for matching (points, lines, direct pixels) and by the constraints that they use to reduce the possible number of matches between locations.
Two different images taken from two distinct viewpoints are necessary for registration. If stereo is used, the images are usually named the left image and the right image. In 3D reconstruction, several methods are described in [] and their steps are summarized in Fig. 3.
An Initial matching step (80) is used to estimate general scene motion and calibrate the optical system of the camera. It is always applied in the absence of any information about the scene and thus relies on the assumption that the majority of the image shows one rigid surface (the scene background). Very stable points (e.g. corners) are found in both images. The match between groups of such points allows inferring the relative position between the left and right images and computing the parameters of the optical system (or calibrating it) in the step Calibration of camera (81). Once the system is calibrated, in the next step Epipolar geometry (82) a so-called epipolar geometry between the two images Left image (1) and Right image (2) is built, as shown in Fig. 19. It allows matching every point in Left image (1) to a line in Right image (2) on which the corresponding point is constrained to lie. To obtain a direct point-to-point match, more constraints are required, and this is done in the Advanced matching step (84). Generally, the ordering constraint, the continuity constraint (or smoothness constraint), which assume that the surfaces are smooth and that some order between points is preserved, and the chromatic constraint are used. Even a medium level of occlusion rapidly deteriorates the performance of these constraints and the precision of matching degrades significantly. Previous methods rely on constraints (such as the epipolar constraint) that in turn rely on assumptions about the scene (for example, that no more than 30% of the scene moves) and on local continuity. Another drawback of these methods is that they work with only one line at a time, and thus no information on the relative position between lines is taken into account. A minimal sketch of this classical pipeline appears below.
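The following sketch, not part of the invention, illustrates the classical steps (80)-(82) using OpenCV primitives as stand-ins; the function names, detector choice and thresholds are this sketch's assumptions, not the patent's:

```python
# Hypothetical illustration of the classical pipeline (80)-(82) that the
# present method replaces; OpenCV primitives are used as stand-ins.
import cv2
import numpy as np

def classical_epipolar_step(left_img, right_img):
    # Step (80): detect "very stable points" (corners) in both images.
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(left_img, None)
    k2, d2 = orb.detectAndCompute(right_img, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches])
    # Steps (81)-(82): the fundamental matrix encodes the epipolar geometry;
    # RANSAC embodies the assumption that most of the scene is one rigid surface.
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
    # Each left point x is matched to an epipolar line l' = F x in the right image.
    lines = cv2.computeCorrespondEpilines(pts1.reshape(-1, 1, 2), 1, F)
    return F, lines.reshape(-1, 3)
```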
After the dense matching is established, depending on the application, either 3D Reconstruction (85) or Augmented Reality (86) is performed.
In US6859549B1 (and its references) an unoccluded initial image of the scene is required and no occlusion is allowed during the process. The types of features used for registration are points, lines and pixels, but their tracking is assumed to be done already. The constraints used are: rigidity (the scene is rigid and does not contain articulated objects) and smoothness (the scene surface does not contain abrupt changes in depth with respect to the camera). Another constraint that can be used is that of camera motion. In EP1355274A2 reconstruction is done using a spatial continuity constraint and an error function. Pixels are tracked based on their neighborhood with standard methods and information about depth is propagated around them. Only smooth surfaces can be reconstructed.
In US06061468 two components are used to register (and track) feature points. In the first part the image flow is represented by a limited number of spline control nodes that in turn are located at points of high local texture. This part contains an implicit smoothness constraint and can only reflect global motion. The second part includes local tracking and references methods outside of the patent.
In WO9847097A1 square windows around the pixels being matched are used to perform registration. In US 6,487,304 B1 a three-part cost function (color compatibility, motion/depth compatibility and flow smoothness) is minimized to obtain precise depth, and the method thus relies on motion smoothness. Among the additional assumptions are the uniformity of the flow and the rigidity of the majority of the scene. Also, the information about the final pixel color in 3D is inferred only from the matched pixel pairs and not from other parts of the image.
In US 6,348,918 B1 planar surfaces are used to obtain information about depth. In US 6,137,491 B1 the scene is modelled as a collection of layers described by 3D plane equations. In US 6,301,446
In real urban, industrial and media environments, assumptions about scene continuity, smoothness and partial flatness are rarely satisfied. To be useful, registration should be able to deal with scene views containing numerous small disjoint details of multiple moving objects, significant abrupt variations of depth, a great degree of occlusion, illumination changes and motion blur.
The present application replaces steps 80, 82, 83 and 84 and does not require the constraints listed above. A method that does not rely on local continuity or on neighbouring points presents various advantages for registration and is proposed in this patent, thus making the method very useful for applications in those fields.
Summary of the invention
The present application describes a method for automatic matching of point tuples in two or more images taken with a stereo camera or video camera, comprising the steps of sampling two images with curves exhibiting geometric invariant properties and of finding point tuples, along those curves, that exhibit representative geometric and illumination invariant properties.
In particular, such point tuples preferably correspond to groups of six or more points linked by 2D projective geometric invariants that have the same values as the corresponding 3D projective invariants. One such invariant is preferably the cross-ratio of four-point tuples on a crunodal cubic in the 2D image, with the value of the mentioned cross-ratio being the same as the cross-ratio of four planes in 3D, where the 2D crunodal cubic is the projection of a 3D twisted cubic. Besides geometric invariants, tuples are characterised with chromatic invariants that are independent of illumination changes. The camera is modelled by at least a projective transformation. The method comprises the steps of analysing each pixel in an image and constructing a polar transformed version of the image, followed by sampling of points of the polar transformed version of the image with crunodal cubics. During the sampling, tuples of six or more points are selected and used as locally unique points to represent one image if they are characterised by chromatic invariants and geometric invariants that are different from those of other tuples in their neighbourhood, while no surrounding support region of any individual point is used in the representation itself.
The point selection process ensures that every point of one image is used in several representative tuples that include points from various parts of the image in a distributed manner, that points are selected to densely cover the scene visible in the image, and that redundancy is provided by using each point in as many representative tuples as possible.
The method is optimized by preferably transforming the image according to the sampling curves to optimize computations, and all the coefficients and values are precomputed before applying the method to a given image size. The method is well suited for implementation on parallel processor architectures.
Brief description of the drawings
The invention will be better understood by reading the description below, illustrated by the figures, where:
Fig. 1 Preservation of cross-ratio on the 3D twisted cubic during projection to 2D crunodal cubic.
Fig. 2 Measuring cross-ratio in one image.
Fig. 3 A classical algorithm for registration that contains common steps followed by either 3D reconstruction or augmented reality.
Fig. 4 General algorithm for dense matching of tuples of points.
Fig. 5 Algorithm for finding representative tuples in the image.
Fig. 6 Algorithm for inserting a tuple into the histogram of representative tuples.
Fig. 7 Algorithm for matching of tuples between two images.
Fig. 8 Constraints on the space of cubics: 1) crunodal cubic.
Fig. 9 Constraints on the cubics: 2) avoiding the epipole.
Fig. 10 Constraints on the cubics: 3) stability.
Fig. 11 Transforming the image into polar form.
Fig. 12 Sampling of the transformed image.
Fig. 13 Limits for the radius and angle parameters.
Fig. 14 Sampling the radial image along one cubic.
Fig. 15 Spatial coordinates of the point tuples.
Fig. 16 Computing chromatic descriptions.
Fig. 17 Schematic illustration of the table (or histogram) for storing representative tuples.
Fig. 18 Selection of non-uniform areas.
Fig. 19 Epipolar geometry and the general scheme of the stereo reconstruction.
Detailed description of the invention
The method for registration of points disclosed in this patent is based on massive and independent matching of tuples of points. The general algorithm of registration between two images is presented in Fig. 4 and comprises several stages explained in the following sections.
As the computations are intensive in consumption of processor and memory resources, the preparation step Allocate memory and initialize structures (60) is important to prepare optimized storage structures to host the data. The step Define subset of cubics and precompute parameters (61) allows precomputing numerous parameter values (of the cubic curves in particular) that remain the same during computations and can thus save significant time.
The main computations start with the selection of two images in Select first and second image (62). These can be two subsequent or two remote images in a video sequence, or two images from a stereo camera. A step of Find representative tuples (63) is applied to each of the selected images. It corresponds to finding tuples of points that are sufficiently different from their neighbour tuples in terms of chromatic and geometric properties. Local unicity of tuples leads to a reliable match between tuples in the two images. Once such a set of representative tuples is constructed for each of the two images being matched, those sets are compared in a step of Matching of tuples (64) that establishes a reliable match between individual tuples.
The first part of the algorithm, Find representative tuples (63), is composed of several steps shown in Fig. 5 and detailed in section 3. The central concept is to find representative point tuples and characterise them with geometric properties that are reflected by values that are invariant to the geometric 3D-to-2D projective transformation produced by a camera. The definition and properties of such values are described in section 1. In practice, instead of finding representative points first and then describing them with invariant values, the algorithm takes specific curves along which invariant values are measured and uses them to sample the image and find locally representative points. Only a certain type of curves can be used for sampling. The selection of a set of curves satisfying all the constraints is outlined in section 2.
The second part of the algorithm is the step Matching of tuples (64), which corresponds to matching the tuples representing the left and right images. This step is detailed in Fig. 7 and described in section 4. After the matching between two images is done, the results can be used for reconstruction, motion estimation or the other applications mentioned above.
The new registration algorithm has several advantages. First, the low support of each point (limited to one pixel and not its neighbourhood) allows dealing with scenes with high depth discontinuities and with the presence of many objects in the field of view occluding each other. No assumptions about scene smoothness or continuity are made.
Second, independence in matching point tuples makes it possible to register multiple objects moving differently without relying on the visibility of a fixed and rigid background.
Third, massive and redundant matching leads to very high resolution dense registration with subpixel accuracy. Also, areas of the image that are not adjacent to one another can be successfully registered. Fourth, modular and independent computations for every geometric element allow easy implementation of the algorithm on a DSP, a Graphical Processing Unit, other parallel architectures or a physical object like optics.
1. Geometric invariant property of the twisted cubic as registration condition
An element used in the present invention is a property of a curve called the twisted cubic: it keeps an invariant while being projected from 3D to 2D, as illustrated in Fig. 1. The Twisted cubic in 3D (21) is defined by the equation:

$\begin{pmatrix} X_1 & X_2 & X_3 & X_4 \end{pmatrix}^T = A_{4\times 4} \begin{pmatrix} \theta^3 & \theta^2 & \theta & 1 \end{pmatrix}^T$   (1)

where $X_i$ are projective coordinates in 3D, $\theta$ is the parameter of the curve and $A_{4\times 4}$ is the matrix of parameters $a_{ij}$ that define the form of the curve. This curve has 15 parameters (or degrees of freedom) for its definition.
When the twisted cubic is projected onto the 2D plane it becomes the Cubic in 2D (32). The general equation of this projected cubic is the following:

$\begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix}^T = B_{3\times 4} \begin{pmatrix} \theta^3 & \theta^2 & \theta & 1 \end{pmatrix}^T$   (2)

where $x_i$ are projective coordinates in the image plane and $B_{3\times 4}$ combines the projective camera transformation with the matrix $A_{4\times 4}$.
If the cubic intersects itself, it is called the Crunodal cubic in 2D (22). Let us assume that nodal points exist in the 2D images, and that one takes the Twisted cubic in 3D (21) and selects four points on it (36,37,38,39). These four points are projected onto the image plane and become four points (16,17,18,19) on the 2D crunodal cubic. A line in 3D that originates from the Optical center of the left camera (6) and crosses the 3D cubic in two points is called the Bisecant (15). The point where the Bisecant (15) intersects the image plane corresponds to the point where the Crunodal cubic in 2D (22) intersects itself and is called the Nodal point (14). Four rays can be constructed from the Nodal point (14) to the four points (16,17,18,19) on the 2D crunodal cubic curve. Similar points can be constructed in the Right image (2).
The contents of the left and right images are shown in more detail in Fig. 2. The cross-ratio of the four rays in the left and right images can be measured in two ways. First, by measuring the distances between the points of intersection A, B, C, D of those rays with any line and taking the ratios of those distances as:

$\mathrm{CrossRatio}(A, B, C, D) = \dfrac{|AC|\,|BD|}{|AD|\,|BC|}$   (3)

Or by measuring the angles between the rays in terms of the angles $\alpha_{AC}, \alpha_{BD}, \alpha_{BC}, \alpha_{AD}$ and taking the ratio between the sines of those values:

$\mathrm{CrossRatio}(l_1, l_2, l_3, l_4) = \dfrac{\sin\alpha_{AC}\,\sin\alpha_{BD}}{\sin\alpha_{BC}\,\sin\alpha_{AD}}$   (4)
Such a cross-ratio has the same value for Left image (1) and Right image (2) in general and is independent of the viewpoint from which the curve is viewed (given that the nodal point exists in both views). This invariant property will be used to characterise tuples of more than four points with several invariant values (one for each group of four) and match them between two images. For example, if six points lying on a twisted cubic are visible in the left and right images, they can be characterised with two cross-ratio values that give the same values in the left and right image.
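As a worked illustration of eq. (3), the following sketch computes the cross-ratio of four collinear points; the function name and the sample points are illustrative only:

```python
import numpy as np

def cross_ratio(a, b, c, d):
    """Cross-ratio |AC||BD| / (|AD||BC|) of four collinear points (eq. 3)."""
    a, b, c, d = (np.asarray(p, dtype=float) for p in (a, b, c, d))
    return (np.linalg.norm(c - a) * np.linalg.norm(d - b)) / \
           (np.linalg.norm(d - a) * np.linalg.norm(c - b))

# The value is preserved by any projective transformation of the line:
print(cross_ratio([0, 0], [1, 0], [2, 0], [4, 0]))  # (2*3)/(4*1) = 1.5
```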
2. Subset of sampling cubics
Since only a subset of existing cubics has the property of being nodal, not all tuples of points can be represented with that property. So, instead of fitting possible cubics to points, nodal cubics are selected and used to sample the image for representative points lying on them. In addition to being nodal, only cubics that have some specific properties can be used. These properties define a class of curves that can be used.
The first constraint is the existence of a nodal point; such a cubic (called a crunodal cubic) is shown in Fig. 8. There are several ways to define the equation of a crunodal cubic, but only three are of interest for our method: implicit, parametric, and with a control polygon. All cubic equations will be given with the center of coordinates placed at the nodal point, so that the cubic goes through the nodal point $x_0, y_0 = [0, 0]$.
An implicit equation of such a cubic is given below, defined by four parameters (or degrees of freedom) $a, b, c, d$:

$c(ay - bx)^2 + cdx^2 = x^3$   (5)

The same cubic in parametric coordinates is given by the following equations, where $s$ is the slope of a line through the nodal point, with interval $s = [0, \infty]$:

$x(s) = c(as - b)^2 + cd$
$y(s) = s\,\big(c(as - b)^2 + cd\big)$   (6)
Another way of defining a cubic is with a triangular control polygon defined by four points $[x_0, y_0], [x_1, y_1], [x_2, y_2], [x_3, y_3]$, where the first and last points coincide and correspond to the nodal point. The equation of a cubic defined with a control polygon is given (in matrix form) by:

$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} x_0 & x_1 & x_2 & x_3 \\ y_0 & y_1 & y_2 & y_3 \end{pmatrix} \begin{pmatrix} (1-t)^3 \\ 3t(1-t)^2 \\ 3t^2(1-t) \\ t^3 \end{pmatrix}$   (7)

where the first matrix contains the coordinates of the triangular control polygon and the second the polynomials that describe the attraction of the curve point to the vertices of the polygon. Since in the current case the nodal point corresponds to the center of coordinates, the first and last points have zero coordinates, thus leading to the following form, where $t = [0, 1]$:

$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 3t(1-t)^2 \\ 3t^2(1-t) \end{pmatrix}$   (8)

The four coordinates of the second and third control points $[x_1, y_1], [x_2, y_2]$ fully define the curve shape. A crunodal cubic is thus defined by those four values or degrees of freedom. This form is selected for the current algorithm.
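A minimal sketch of evaluating eq. (8) for given control points follows; the function name and the example values are assumptions of this sketch:

```python
import numpy as np

def crunodal_cubic(x1, y1, x2, y2, t):
    """Evaluate eq. (8): crunodal cubic with its nodal point at the origin.

    [x1, y1] and [x2, y2] are the two free control-polygon vertices."""
    t = np.asarray(t, dtype=float)
    b1 = 3.0 * t * (1.0 - t) ** 2   # attraction to the second vertex
    b2 = 3.0 * t ** 2 * (1.0 - t)   # attraction to the third vertex
    return x1 * b1 + x2 * b2, y1 * b1 + y2 * b2

# t = 0 and t = 1 both give (0, 0): the curve closes on its nodal point.
xs, ys = crunodal_cubic(40.0, 10.0, 10.0, 40.0, np.linspace(0.0, 1.0, 11))
```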
The second constraint is that the nodal point preferably should not be close to the epipole. Otherwise, there is an ambiguity in matching between the left and right images, leading to erroneous information about point motion. The position of the epipole depends on the camera motion, which cannot be known in advance, so ambiguous matching cases cannot be avoided entirely, but their number can be reduced. The probability of the epipole position can be estimated relative to the possible motions, and that position avoided for the nodal point.
In Fig. 9 several camera motions are shown with their corresponding epipole positions. The definition of the first image as "left" and the second image as "right" is used for simplicity only. The most frequent motion is the pan motion shown in Fig. 9.a. The Left epipole (8) is defined as the projection of $C_1$, the Optical center of the left camera (6), onto the sensor plane of the camera with the Optical center of the right camera (7). When the camera rotates, the epipole falls within the image if the displacement is less than the sensor size.
When the camera performs a forward motion, the epipole again occurs within the image frame if the motion is less than the sensor size. Since lateral translational motions are much more frequent than vertical ones, the epipole tends to occur within a horizontal corridor on the image. To reduce the risk of ambiguous matches, the system will avoid selecting points in that corridor as nodal points.
Summarizing, motions where the epipole is in the image occur when the motion is below the size of the camera sensor, and the epipole then lies in the corridor that should be avoided. The third constraint is to provide sufficient stability of the cross-ratio. This stability depends, first, on the stability of the ray orientations and, second, on the presence of the nodal point in both the left and right images. The stability of the rays is influenced by the minimum distance $D_{min}$ between two points on the cubic that define a ray. Given that at least six points will be taken on the cubic, the loop of the crunodal cubic should have a minimum angular opening $\Delta\theta_{min}$, as shown in Fig. 10.
The nodal point must also be preserved so that matching between the tuple visible in the left image and the tuple visible in the right image remains possible. In fact, the camera displacement between the left and right images can cause the projected cubic in the right image to lose its nodal point. Since the camera motion cannot be controlled, one can enlarge the loop of the crunodal cubic used for sampling so that, even under significant motion, the chances of preserving the nodal point are high. Summarizing, the set of cubics with nodal point at $x_0, y_0 = [0, 0]$ is defined by the control points of the polygon defined above.
3. Sample image with selected cubics to find representative tuples
This section shows how the invariant property described above can be used to characterise tuples of points. This characterisation allows selecting numerous independent point tuples that are representative in the image. Representative tuples are those whose properties (geometric and chromatic invariant values) are sufficiently different from the properties of all other point tuples in their neighbourhood. When tuples are matched between two images, this representativeness ensures that the chance of mismatches is low. The algorithm for searching for such representative tuples in one image is outlined in Fig. 5.
In the first step, Find non-uniform areas (50), the areas composed of pixels not surrounded by neighbours of almost the same color are selected. The geometric stability of a point in a uniform area is low, and this step is applied to reject this instability. Selection is done by standard filtering that detects low-frequency variations, as shown in Fig. 18. Note that this step does not perform edge detection. Nor does it introduce support regions for pixels (the detection of tuples does not rely on the surrounding area).
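The patent only specifies "standard filtering"; the sketch below assumes a local-variance box filter as one plausible choice for step (50), with illustrative window size and threshold:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def non_uniform_mask(gray, size=5, var_threshold=25.0):
    """Step (50) sketch: keep pixels whose neighbourhood is not almost uniform.

    Local variance E[x^2] - E[x]^2 via box filters; this is a low-variation
    rejector, not an edge detector, and no support region is attached to the
    retained pixels."""
    g = gray.astype(np.float64)
    mean = uniform_filter(g, size)
    mean_sq = uniform_filter(g * g, size)
    return (mean_sq - mean * mean) > var_threshold
```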
In the next step, Select nodal pixel (51), each pixel $p_i = [x_i, y_i]$ in the non-uniform areas selected previously is used as a nodal point, the Central point (44), of a set of cubics. In the case of small motions comparable with the camera sensor size, the horizontal corridor in the central part of the image should be avoided, as was shown in Fig. 9.
Instead of performing the sampling with cubics in the original image, it is more efficient to do it in a polar image. The algorithm step Make polar transform (52) uses the selected pixel $p_i$ as the center of a polar transform and transforms the original image into the Polar image (45), as depicted in Fig. 11. The $[x, y]$ coordinates are replaced by $[\theta, r]$.
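A nearest-neighbour sketch of step (52) is given below; the resolutions are illustrative, and OpenCV's cv2.warpPolar would serve the same purpose:

```python
import numpy as np

def polar_transform(img, cx, cy, r_max, n_theta=360):
    """Step (52) sketch: resample img around nodal pixel (cx, cy) into
    (theta, r) axes using nearest-neighbour lookup."""
    n_r = int(r_max)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    radii = np.arange(n_r, dtype=np.float64)
    xs = (cx + radii[None, :] * np.cos(thetas)[:, None]).round().astype(int)
    ys = (cy + radii[None, :] * np.sin(thetas)[:, None]).round().astype(int)
    valid = (xs >= 0) & (xs < img.shape[1]) & (ys >= 0) & (ys < img.shape[0])
    polar = np.zeros((n_theta, n_r) + img.shape[2:], dtype=img.dtype)
    polar[valid] = img[ys[valid], xs[valid]]
    return polar  # rows indexed by theta, columns by r
```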
To obtain the form of a cubic in the polar image, the transformation is applied to the Control polygon (48) of the cubic. The four vertices $[x_i, y_i]$ are replaced by $[\theta_i, r_i]$. Since the first and last points of the control polygon coincide with the center in the original image, in the polar image their $r_i$ coordinate is zero. Also, since the first and second points lie on a ray originating from the center in the original image, their first coordinate is the same; this is also true for the third and fourth points. The final correspondence between the polygon vertex coordinates in the original and polar images is the following:

$[x_0, y_0] \mapsto [\theta_1, 0], \quad [x_1, y_1] \mapsto [\theta_1, r_1], \quad [x_2, y_2] \mapsto [\theta_2, r_2], \quad [x_3, y_3] \mapsto [\theta_2, 0]$   (9)
Thus, with the new matrix of vertex coordinates, and with the polynomials describing the attraction to those vertices remaining the same in the polar image, the curve equation becomes:

$\begin{pmatrix} \theta(t) \\ r(t) \end{pmatrix} = \begin{pmatrix} \theta_1 & \theta_1 & \theta_2 & \theta_2 \\ 0 & r_1 & r_2 & 0 \end{pmatrix} \begin{pmatrix} (1-t)^3 \\ 3t(1-t)^2 \\ 3t^2(1-t) \\ t^3 \end{pmatrix} = \begin{pmatrix} \theta_1\big((1-t)^3 + 3t(1-t)^2\big) + \theta_2\big(3(1-t)t^2 + t^3\big) \\ 3t\,r_1(1-t)^2 + 3r_2(1-t)t^2 \end{pmatrix}$   (10)
The curve is defined (as before) with four degrees of freedom $\theta_1, \theta_2, r_1, r_2$: a point on the cubic (13) is defined by the four coordinates of the polygon and the pre-defined polynomials. To sample the polar image, one has to define which cubics are acceptable for sampling and select the corresponding part of the four-parameter space. This part of the space will be used to generate cubics for sampling and corresponds to the step Define subset of cubics and precompute parameters (61) of the algorithm described in Fig. 4.
For stability reasons, points in the image are not sampled closer than $R_{min}$ pixels to the nodal point, which corresponds to the Minimum diameter (42). This value is generally 3 pixels. There is also a Maximum diameter (43) $R_{max}$, which can go beyond the limit of the image (indeed, even if one part of the cubic is outside of the image, the remaining part can be used for sampling). This value is fixed to double the image size, since beyond that limit the cubic part within the image becomes very linear and not appropriate for sampling. Therefore, the range for the radius is $r = [R_{min}, R_{max}]$, as shown in Fig. 13.a.
The angle parameter $\theta$ cannot be set simply with an interval. First, the difference between the two values $\theta_1$ and $\theta_2$ can be neither lower than $\theta_{min}$ nor more than $\theta_{max}$, as shown in Fig. 13.b. Second, a cubic is allowed to have one of its parts outside of the image. For simplicity we consider that only half of each cubic can be outside of the image, but this part can be different in general. The values that parameters $\theta_1$ and $\theta_2$ can take, and their dependence, are given in Fig. 13.c as a hatched area. In addition to $\theta_{min}$ and $\theta_{max}$ explained above, the graph contains the parameter $\theta_{image}$, which corresponds to the maximum value of the angle for the radial image.
The whole class of nodal cubics needed for sampling one image is thus given by the range $r = [R_{min}, R_{max}]$ and the ranges for $\theta_1$ and $\theta_2$ given in Fig. 13.c. This four-parameter space of cubic parameters is sampled with a fixed step between parameter values. Since these values depend only on the size of the image, they are stored to avoid recomputation for each video frame. This is part of the algorithm step Define subset of cubics and precompute parameters (61). The steps in the cubic parameters are defined so that the distance between cubics generated by consecutive steps is less than a pixel.
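A sketch of this precomputation might enumerate the four-parameter grid as follows; the step counts and angular limits are illustrative assumptions, not the pixel-accurate steps the patent prescribes:

```python
import numpy as np
from itertools import product

def precompute_cubic_set(r_min=3.0, r_max=512.0, theta_image=2.0 * np.pi,
                         d_theta_min=0.2, d_theta_max=2.0,
                         n_theta=32, n_r=16):
    """Step (61) sketch: enumerate the (theta1, theta2, r1, r2) space,
    keeping only pairs that respect the angular constraint of Fig. 13.b."""
    thetas = np.linspace(0.0, theta_image, n_theta)
    radii = np.linspace(r_min, r_max, n_r)
    cubics = [(t1, t2, r1, r2)
              for t1, t2, r1, r2 in product(thetas, thetas, radii, radii)
              if d_theta_min <= abs(t2 - t1) <= d_theta_max]
    return np.array(cubics)  # stored once per image size, reused per frame
```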
Once the set of possible cubics has been defined as described above, the image can be sampled with those curves. Taking one cubic with parameters $\theta_1, \theta_2, r_1, r_2$ from the set defined previously, one needs to define the way the image is sampled with that cubic.
First, the equation of such a cubic can be simplified as follows, where the coefficients are separated from the variables:

$\begin{pmatrix} \theta(t) \\ r(t) \end{pmatrix} = \begin{pmatrix} 2(\theta_1 - \theta_2) & 3(\theta_2 - \theta_1) & 0 & \theta_1 \\ 3(r_1 - r_2) & 3(r_2 - 2r_1) & 3r_1 & 0 \end{pmatrix} \begin{pmatrix} t^3 \\ t^2 \\ t \\ 1 \end{pmatrix}$   (11)
In this situation, the coefficients in the first matrix are stored as part of the step Define subset of cubics and precompute parameters (61), since their values will be used numerous times for each nodal point in the image.
In the algorithm for finding representative tuples (described in Fig. 5), once one cubic is selected from the set defined above in the Use one cubic (53) step, the radial image is sampled along that cubic in the step Sample points on the cubic (54). Sampling the radial image with an individual cubic is shown in Fig. 14.
The parameter that defines the sampling point position on the cubic is $t = [0, 1]$. Selecting a range of values $t_i = \{i/N,\; i = 1 \ldots N\}$ of this parameter gives the sampling step on the curve. Generally, defining a 1-pixel step between points on the curve, as shown in Fig. 14, is a good compromise. Again, the pre-defined values of that parameter allow precomputing the coordinates of points for all sampling cubics and storing them during the Define subset of cubics and precompute parameters (61) step. Doing so, for each nodal point and for each sampling cubic, the coordinates are fetched from memory and used to sample $N$ distinct points that are located on the curve and belong to non-uniform areas:

$P = \{p_i = [\theta(t_i), r(t_i)],\; i = 1 \ldots N\}$   (12)
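A sketch of step (54) using the coefficient form of eq. (11) follows; in the actual method the coefficient matrix would be fetched precomputed from step (61):

```python
import numpy as np

def sample_cubic(theta1, theta2, r1, r2, n=64):
    """Step (54) sketch: N sampling points along one cubic in the polar
    image, via the coefficient matrix of eq. (11)."""
    coeffs = np.array([
        [2.0 * (theta1 - theta2), 3.0 * (theta2 - theta1), 0.0,      theta1],
        [3.0 * (r1 - r2),         3.0 * (r2 - 2.0 * r1),   3.0 * r1, 0.0],
    ])
    t = np.arange(1, n + 1) / n                       # t_i = i/N (eq. 12)
    powers = np.vstack([t ** 3, t ** 2, t, np.ones_like(t)])
    return (coeffs @ powers).T                        # N x 2 (theta, r) points
```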
Once the set of all points on the cubic has been sampled, their combinations can be analysed. Combinations of points are selected according to two criteria. First, groups of six points are preferably used, since they are sufficient to define a transformation. Second, since occlusion by objects can hide some of the points, all combinations should be taken into account. This is achieved by having six nested loops in the algorithm in Fig. 5 and selecting all possible combinations of $p_1, p_2, p_3, p_4, p_5, p_6$.
Once the six points have been sampled in the previous step, the properties of the tuple need to be computed. Computing the geometric properties (cross-ratios) is described in Fig. 14 and the chromatic properties in Fig. 16.
The first step, Compute cross ratios (55), evaluates three geometric invariants $i_1, i_2, i_3$ that characterize a tuple of six points. The rays Bisecant (15) originating from the Nodal point (14) in the original image become vertical parallel lines, Vertical ray (46), in the radial image. The cross-ratio between the rays Bisecant (15) in the original image is equal to the cross-ratio between the parallel lines in the radial image. Measurement of that cross-ratio is, however, much simpler computationally in the radial image, since it is just the cross-ratio between the horizontal coordinates, Vertical ray coordinate (47), of those lines. For three sets of four rays, the cross-ratios between them are computed. These three combinations are shown in the lower part of Fig. 14.
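A sketch of step (55) is given below; the exact four-point subsets are shown only graphically in Fig. 14, so the three combinations chosen here are an assumption of this sketch:

```python
def tuple_cross_ratios(thetas):
    """Step (55) sketch: three cross-ratio invariants i1, i2, i3 of a
    six-point tuple.

    In the radial image the rays become vertical lines, so eq. (3) reduces
    to a cross-ratio of the horizontal (theta) coordinates."""
    def cr(a, b, c, d):
        return ((c - a) * (d - b)) / ((d - a) * (c - b))
    t1, t2, t3, t4, t5, t6 = thetas   # Vertical ray coordinates (47)
    return cr(t1, t2, t3, t4), cr(t2, t3, t4, t5), cr(t3, t4, t5, t6)
```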
Once the six points are selected, a step of Compute chromatic properties (56) is applied. A tuple of six points is characterized by six chromatic values:

$c_i = \{[h_i, s_i, v_i],\; i = 1 \ldots 6\}$   (13)
The illumination of the observed scene varies depending on the viewpoint and on the behaviour of the light sources. Under those illumination changes, the chromatic values in the observed image are transformed. This transformation can be modelled with several approximating transformations, such as scaling of each of the RGB channels, a linear transformation in RGB space, etc.
The six observed points $c_1, c_2, c_3, c_4, c_5, c_6$ are considered to be illuminated by the same or a similar source. Under such a change, the points in the chromatic space HSV will move along lines, the Chromatic trajectories (34), originating from the upper vertex of the HSV space, as shown in Fig. 16. Therefore, the transformation from one view of the scene (and of the six points) to another corresponds to a linear shift of all points in one direction.
Therefore, the absolute chromatic values in the analysed image cannot be used directly. The only reliable information that remains is the chromatic invariants that are computed from the color values of the points in the tuple and are independent of such a transformation. Chromatic invariants for the considered transform require several points for computation and are obtained as follows. The surfaces of two triangles defined by the following chromatic points are computed:

$S_{135} = S(c_1, c_3, c_5), \quad S_{246} = S(c_2, c_4, c_6)$   (14)

Under the described transformation, their ratio will be invariant:

$i_c = S_{135} / S_{246}$   (15)
Therefore, this invariant value is computed for the point tuple as shown in Fig. 16, corresponding to the step Compute chromatic properties (56) in the algorithm. It should be noted that additional values can be computed as chromatic invariants. For the current setup, three geometric invariant values and one chromatic invariant value provide the invariant description of the six-point tuple.
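A sketch of the chromatic invariant of eqs. (14)-(15) follows, assuming triangle areas computed with the cross product (one possible area formula):

```python
import numpy as np

def chromatic_invariant(colors):
    """Step (56) sketch: ratio of triangle areas S135/S246 (eq. 15).

    `colors` is a 6 x 3 array of the tuple's HSV values c1..c6 (eq. 13)."""
    c = np.asarray(colors, dtype=float)

    def area(p, q, r):
        return 0.5 * np.linalg.norm(np.cross(q - p, r - p))

    return area(c[0], c[2], c[4]) / area(c[1], c[3], c[5])
```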
In the next step, Insert tuple in the histogram (57), the obtained tuples are compared to those already stored in order to determine their representativeness. Only stable and representative point tuples, distinct from their neighbouring tuples, are interesting for matching (thus reducing the risk of mismatches). This step filters the tuples that are stable and stores them for matching.
The principle is to reduce as much as possible the number of tuples to be compared. First, tuples having the same geometric and chromatic invariants are potentially similar. To make this comparison, tuples are stored within a four-dimensional look-up table with $i_1, i_2, i_3$ and $i_c$ as indices, as schematically shown in Fig. 17. One cell of such a table will contain tuples that have the same (similar up to a predefined delta) values of those four invariants.
Insertion of the current tuple into this table occurs according to the algorithm described in Fig. 6. First, using the invariant values $i_1, i_2, i_3$ and $i_c$ of the current tuple, the look-up table is accessed and all elements having similar values are retrieved as a set $H_1$. This set thus contains all tuples having invariant values similar up to a certain threshold $\Delta_1$. A minimal sketch of such a table is given below.
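The sketch below illustrates one way such a look-up table could be organised, assuming a single quantization width for all four indices; the patent leaves the bin geometry open, and neighbouring cells would also be probed in practice:

```python
from collections import defaultdict

class TupleHistogram:
    """Sketch of the four-dimensional look-up table of Fig. 17, keyed by
    quantized invariant values (i1, i2, i3, ic)."""

    def __init__(self, delta=0.05):
        self.delta = delta          # illustrative quantization width
        self.cells = defaultdict(list)

    def _key(self, i1, i2, i3, ic):
        d = self.delta
        return (round(i1 / d), round(i2 / d), round(i3 / d), round(ic / d))

    def candidates(self, i1, i2, i3, ic):
        """The set H1: stored tuples with similar invariant values."""
        return self.cells[self._key(i1, i2, i3, ic)]

    def insert(self, i1, i2, i3, ic, tuple_record):
        self.cells[self._key(i1, i2, i3, ic)].append(tuple_record)
```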
The comparison set being reduced, a more refined comparison can be applied. One by one, the stored tuples are compared with the current tuple for close resemblance in invariant values.
If this comparison is positive, the two potentially similar tuples are compared with respect to their absolute chromatic values. The chromatic values $p_1, p_2, p_3, p_4, p_5, p_6$ are fetched from the tuple storage and the difference between the two tuples is estimated with respect to the illumination transformation. This difference is computed vector-wise and is based on a metric that takes the chromatic transformation into account. If the difference is lower than a threshold, the tuples are considered chromatically similar and processing continues with the geometric positions of the individual points.
The geometric positions of the tuples are then compared. Spatial coordinates are retrieved from storage in order to be compared with the current tuple. The geometric coordinates of the tuples in the original image are computed from the coordinates in the radial image by applying the original-image coordinates of the control polygon vertices.
Then, the obtained coordinates are compared geometrically. In fact, the same invariant values do not mean that there is a transformation that maps the points of the tuple at hand to the tuple from the histogram cell. A projective transformation is computed that best maps the points of one tuple to the points of the other. The difference is then computed as the sum of distances between mapped points. If the difference is large, and thus these two tuples of points cannot correspond to the same structure in 3D, the comparison continues with the next element in the cell. If, however, the geometric difference is small and these two tuples could be mismatched, a last comparison attempts to determine whether these two tuples occur in parts of the image remote from one another.
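A sketch of this geometric comparison, assuming OpenCV's least-squares homography as the fitted projective transformation:

```python
import cv2
import numpy as np

def geometric_difference(tuple_a, tuple_b):
    """Fit a projective transformation mapping tuple_a onto tuple_b and
    return the sum of residual distances between mapped points."""
    a = np.float32(tuple_a).reshape(-1, 1, 2)   # six (x, y) coordinates
    b = np.float32(tuple_b).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(a, b, 0)          # 0 = plain least squares
    mapped = cv2.perspectiveTransform(a, H)
    return float(np.linalg.norm((mapped - b).reshape(-1, 2), axis=1).sum())
```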
If the current tuple is geometrically close to the tuple at hand, the tuple that was found similar in the list is marked as "duplicate". Also, the difference between the tuple in the histogram and the current tuple is stored. This is done to define the neighborhood of the tuple within which other tuples should be considered similar to it.
If no tuples in the current cell of the histogram were found to fall sufficiently close chromatically and geometrically to the tuple at hand, this tuple is considered representative (until another tuple is found close to it). It will thus be added to the histogram cell at the end of the list of tuples. The spatial coordinates and chromatic values of the tuple are stored as part of the tuple description. The computation of spatial coordinates from the radial representation is shown in Fig. 15. The representative tuples are described with invariant values and stored as characteristics of the current view.
During the construction of the histogram, some additional constraints are used to obtain a more equal distribution of representative tuples over the scene. A density histogram is constructed for the left image, showing in how many tuples the current pixel has been used. The use of heavily used pixels in further tuples is reduced (they will, however, be used in neighbour tuples for comparison). Since the redundancy of the representation is important, each pixel participates in a fixed number of tuples (for example, ten). One avoids using one pixel in too many tuples in order to avoid extreme dependence on this point.
Once all possible cubics for the current pixel have been processed, the next pixel is selected in the step Select nodal pixel (51) and the processing continues. Once all pixels of the image have been processed, the main part of the processing ends and some secondary steps take place. In the step Clean histogram (58), all cells of the histogram are scanned. For each cell, the list of tuples is fetched and scanned. All tuples that are marked as "duplicate" are removed from the list. The cleaned list is stored back in the histogram. The more compact histogram is kept as a representation of the scene with representative tuples and is denoted as the representation table $RTT_1$.
4. Matching representations of two images
Once all representative tuples in two images that belong to the same scene have been identified and two representations built for the left ($RTT_1$) and right ($RTT_2$) image, one can proceed with their matching. The algorithm is presented in Fig. 7.
Every cell of the table storing representative tuples is analysed individually. All elements of two corresponding cells from $RTT_1$ and $RTT_2$ are fetched and compared one by one. When two tuples are compared, the invariant values are compared together with the chromatic values. As in the insertion step, a transformation is computed to see how closely two tuples can be matched to one another by a projective transformation. All possible matches are stored as pairs of tuples in the table $RTH_{12}$ that reflects the matching between the two images.
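A sketch of this matching loop over corresponding cells, reusing the hypothetical TupleHistogram and geometric_difference helpers from the previous sketches (the record fields and the threshold are assumptions of this sketch):

```python
def match_representations(rtt1, rtt2, max_geom_diff=5.0):
    """Step (64) sketch: compare tuples cell-by-cell between RTT1 and RTT2
    and collect candidate pairs into RTH12."""
    rth12 = []
    for key, tuples1 in rtt1.cells.items():
        for ta in tuples1:
            for tb in rtt2.cells.get(key, []):
                # Invariants already agree (same cell); refine the match with
                # the projective fit, as in the insertion step.
                if geometric_difference(ta["points"], tb["points"]) < max_geom_diff:
                    rth12.append((ta, tb))
    return rth12
```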
After the matching step has produced a structure reflecting the match between the two images, consolidated in $RTH_{12}$, one can use that information for several tasks or applications. Those applications are not part of the invention, but they benefit from the results of the invented algorithm.
Since six points are necessary for recovering the camera motion, the representation $RTH_{12}$ already gives pairs of such six-point tuples. For each pair, a transformation is estimated and the parameters of this estimated transformation are stored in a histogram $TT_{LR}$.
The density of the representation with tuples and the absence of support regions for points allow dealing with almost random occlusion, since every point is virtually related to (at least) six other points at various parts of the image.
Each point can thus contribute multiple times to a certain transformation of the scene and thus to one camera motion. If one transformation clearly stands out from the others, this means two things. First, a single scene motion is present and not many objects are occluding the scene. Second, the scene is rigid.
This independence provides a spatial redundancy of matching and thus robustness. Also, regularly textured areas are well suited for the algorithm, contrary to other methods.
It is understood that embodiments may be realised in different forms that are fully compliant with the current invention.
The algorithms of learning tuples and searching for them can be implemented as image processing software modules. Such software can run on a computer with CPU, memory and storage, on a DSP within an embedded system, or on a Graphical Processing Unit with a parallel architecture.
Cameras that can be used for realizing the current invention include a camera able to take pictures of the scene from different viewpoints, a video camera that takes a video sequence while undergoing motion or while stationary, and, finally, a synchronised multi-camera system.