US9449431B2 - Systems and methods for matching scenes using mutual relations between features - Google Patents

Systems and methods for matching scenes using mutual relations between features

Publication number
US9449431B2
Authority
US
United States
Prior art keywords
features
descriptive
subsets
feature set
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US14/736,738
Other versions
US20150279099A1 (en
Inventor
Ondrej Kotaba
Jan Lukas
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honeywell International Inc
Original Assignee
Honeywell International Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honeywell International Inc filed Critical Honeywell International Inc
Priority to US14/736,738 priority Critical patent/US9449431B2/en
Assigned to HONEYWELL INTERNATIONAL INC. reassignment HONEYWELL INTERNATIONAL INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Kotaba, Ondrej, LUKAS, JAN
Publication of US20150279099A1 publication Critical patent/US20150279099A1/en
Application granted granted Critical
Publication of US9449431B2 publication Critical patent/US9449431B2/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/28Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • G01C21/30Map- or contour-matching
    • G01C21/32Structuring or formatting of map data
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3804Creation or updating of map data
    • G06K9/00805
    • G06K9/46
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/003Navigation within 3D models or images
    • G06T7/0028
    • G06T7/0034
    • G06T7/0042
    • G06T7/0044
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/35Determination of transform parameters for the alignment of images, i.e. image registration using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle

Definitions

  • Typical approaches for matching objects from different images are descriptor based. That is, they are designed to detect things containing sufficient information to be considered interesting and unique within the scene. From any such object, features are extracted which allow calculation of a description vector that contains sufficient information to identify the object within the scene.
  • the particular description vectors are selected to be invariant to expected transformations of the scene. That is, for an expected transformation, the derived description vector should remain a valid representation for describing the object, despite the transformation.
  • One fundamental problem with the description vector approach is that to design a description vector that fulfills some desired properties, others must be given up. For example, it is not possible to produce a description vector that is generically invariant to all transformations.
  • Embodiments of the present invention provide methods and systems for matching scenes using mutual relations between features and will be understood by reading and studying the following specification.
  • a processor for implementing robust feature matching between images comprises: a first process for extracting a first feature set from a first image projection and extracting a second feature set from a second image projection; a memory for storing the first feature set and the second feature set; and a second process for feature matching using invariant mutual relations between features of the first feature set and the second feature set; wherein the second feature set is selected from the second image projection based on the identification of similar descriptive subsets between the second image projection and the first image projection.
  • FIG. 1 is a block diagram illustrating projection matching using mutual relations of one embodiment of the present invention.
  • FIG. 2 is a flow chart illustrating a process for robust feature matching using mutual relations between features of one embodiment of the present invention.
  • FIG. 3 is a block diagram illustrating one embodiment of the present invention.
  • Embodiments of the present invention provide systems and methods for matching different projections of a real world scene using a concept referred to herein as the matching of mutual relations between features rather than description vectors that describe the features.
  • Mutual relations suitable for matching may be defined between features that are mutually static over time in a chosen reference frame. For example, when a sensor captures a scene and produces a 3-dimensional projection of the scene (such as a point cloud captured by a Light Detection and Ranging (LIDAR) device), one mutual relation would be a distance between any two mutually stable points in the projection. That is, a mutual relation will represent a difference in a physical or tangible property with respect to objects in the scene.
  • Sets of mutual relations are then used to determine how objects (or features of objects) are positioned with respect to each other. This is in contrast to the description vector, which only describes how the objects look. For example, a distance between a corner of one object and a corner of another object would represent one mutual relation with respect to those two objects. Differences in temperature indicated by a thermal image of a scene would be a basis for defining a mutual relation between features in another embodiment, as could features such as reflectance and color in still other embodiments. Further, sets of mutual relations may be based on features such as, but not limited to, distances, areas, angles, solid angles, volumes, generalized areas or volumes to higher dimensions, central moments or a histogram. For example, in one embodiment a set of features includes planar features with the angles between normal vectors of the planar features as invariant relations.
  • the projection of the same scene captured twice from different places using a camera will differ. The 3D to 2D projection has an infinite number of possibilities, and some or all of the features extracted from such a projection will differ in shape and size, and need not be present in some of the projections at all due to occlusions.
  • any property chosen for matching mutual relations of features must consider the transformation applied by the particular sensor used.
  • the characteristics of the particular situation in which images are being captured must be considered. For example, if you know that your LIDAR sensor is not rotating and is always facing the same direction, you may anticipate that a property such as a difference in object altitude from the ground may form the basis for a mutual relation between objects, as well as a distance between the objects.
  • As long as the property is invariant (i.e., agnostic) to the transformation applied, then that property may be used for matching mutual relations using embodiments of the present invention.
  • embodiments of the present invention are not limited to matching 2-dimensional or 3-dimensional projections using cameras and LIDAR sensors. In fact, sensors providing transformations into 1 dimension, 4 dimensions, 5 dimensions, or greater are anticipated as within the scope of embodiments of the present invention.
  • FIG. 1 is a block diagram illustrating projection matching using mutual relations of one embodiment of the present invention shown generally at 100 .
  • An image capturing device 110 captures a first projection (P 1 ) of a real world scene 105 . Either at a subsequent point in time or from a different location, image capturing device 110 captures a second projection (P 2 ) of the real world scene at 102 . Because image capturing device 110 might have moved between capturing P 1 and P 2 , these respective projections are assumed to represent different views of real world scene 105 .
  • a feature extraction at 120 extracts features of interest that appear in the projection P 1 to form feature set A.
  • feature extraction at 122 extracts features of interest that appear in the projection P 2 to form feature set B.
  • features of interest may include objects and surfaces such as floors, walls, corners, curved or linear edges of objects, and the like. Defining characteristics of any physical object may be extracted as a feature from a projection.
  • points of a point cloud projection can be defined as features.
  • lines, curves or planes defined by points within the point cloud can also be defined as extractable features to form set A and set B.
  • Robust feature matching using mutual relations is applied to the features of set A and the features of set B at 130 .
  • robust feature matching at 130 identifies corresponding features from set A that appear in set B using mutual relations between features contained in feature sets A and B. From the features of set A and the features of set B, it is possible to determine how the image capturing device 110 moved from the time it captured P 1 to the time it captured P 2 .
  • the minimum number of features sufficient from each projection P 1 and P 2 to define the respective transformation associated with the movement of image capturing device 110 is referred to herein as the “minimal subset”.
  • each minimal subset defined for set A includes at least a number n of features from set A where n is based on the number of mutual relations that are mathematically necessary to calculate a transformation between the two projections P 1 and P 2 .
  • This number n is a function of both the particular type of projection utilized by the image capturing device, and the known physical dynamics affecting the image capturing device.
  • the number n for forming a minimal subset may depend on the type of mutual property that will be used for matching. For example, in the one embodiment where the set of features includes planar features with the angles between normal vectors of the planar features as invariant relations, the number n for forming a minimal subset would be three.
  • different image sensor technologies will apply different projection transformations to a real world scene.
  • a camera typically projects a real world scene onto a 2-dimensional planar surface.
  • a LIDAR scan projects the same real world scene as a point-cloud into a 3-dimensional space.
  • the particular image sensor technology used by image capturing device 110 at least partially determines how many features are sufficient to define a minimal subset.
  • for the 3-dimensional projection produced by a LIDAR, when 3 features are identified in projection P 1 , and the same 3 features can be identified in projection P 2 , it is possible to use knowledge of the locations of the features (in two distinct camera reference frames) to determine how the LIDAR moved. That is, a pair of 3-point subsets from set A and set B (each point being a feature, for example) is sufficient to estimate a transformation that explains the difference between P 1 and P 2 . In contrast, for an image capturing device that generates 2-dimensional projections onto a plane, a subset of at least 5 features is sufficient. That is, 5 features in projection P 1 that correspond to 5 features in projection P 2 are sufficient to estimate the transformation undergone, or a finite number of its hypotheses.
  • Physical dynamics affecting image capturing device 110 also play a factor in determining n.
  • the number of extracted features sufficient to form a minimal subset may be reduced if it is known that rotation of the image capture device 110 is not possible.
  • the number of extracted features sufficient to form a minimal subset may be reduced if it is known that the image capture device 110 produces projections that are invariant to scaling or some other effect.
  • the number n of features to include in a minimal subset for a particular application is readily determined by one of ordinary skill in the art upon reading this specification using their knowledge of the particular transformation utilized by the sensor they plan to use and the anticipated application for which the sensor will be used. Utilization of more than n features to form subsets is also contemplated as within embodiments of the present invention. In that case, the term “descriptive subset” is used herein and defined as a subset that contains at least a minimal subset of n features.
  • [c{1}, d{1}] . . . up to [c{n}, d{n}] include all minimal subsets of A and B respectively, belonging to the projections P 1 and P 2 that define a transformation T between P 1 and P 2 .
  • 50 to 60 minimal subsets are obtained coming from set A and set B, that belong to projection P 1 and projection P 2 . Together they are used to define the transformation T.
  • Matching of mutual relations between features is the technique described herein for determining which feature d{q} from d most likely corresponds with a particular feature c{p} from c.
  • a process of one embodiment of the present invention is described in FIG. 2 .
  • FIG. 2 illustrates a process for robust feature matching using mutual relations between features of one embodiment of the present invention.
  • the process begins at 210 with defining a first descriptive subset (c s1 ) from a feature set A extracted from a first image projection.
  • the descriptive subset c s1 is chosen randomly from all the extracted features available for the first image projection in feature set A.
  • descriptive subset c s1 is chosen based on a predefined pattern or a non-uniform random distribution that favors desired properties. For example, in one embodiment, a descriptive subset c s1 is selected to include three feature points that are not co-linear. This will avoid defining descriptive subsets that provide singular solutions.
  • descriptive subset c s1 is selected to avoid defining a subset having features that are close together relative to other available features in the projection, to avoid solutions sensitive to noise.
  • selection of a descriptive feature subset is achieved by a random, pseudorandom, quasi-random or any deterministic process, including a setup where a check for the set suitability is included.
  • mutual relations between features are optionally pre-processed, pre-sorted, or indexed. For example, referring back to FIG. 1 , in one embodiment, after feature sets A and B are extracted from projections P 1 and P 2 , one or both are saved as a feature table 145 in a memory 210 .
  • the feature table 145 is optionally sorted so that the mutual relations representing the greatest differential for a selected characteristic appear first in the table. In another embodiment, the feature table 145 is sorted so that sequential sets of features for set A and/or set B are not co-linear. Thus, in one embodiment, selecting the first minimal subset c s1 of features from the first image projection simply comprises sequentially selecting features starting from the top of feature table 145 . Using this approach, those minimal subsets most likely to produce accurate solutions with the least noise will be used first. In certain applications, such pre-processing of the mutual relations allows the system to converge at a quality solution more quickly than one based on a pure random selection of features based on a uniform-distribution. In some embodiments, the table of mutual relations is indexed to allow a quick lookup of similar values to speed up the search for the matching minimal subsets. In general, any indexing method used in the database environment may be used.
  • the process proceeds to 220 with selecting a second descriptive subset (d s1 ) from a feature set B extracted from a second image projection.
  • the features for second descriptive subset d s1 are selected so that mutual relations between its features are most similar to the mutual relations between the features of descriptive subset c s1 .
  • the first and second descriptive subsets are then said to define a pair of descriptive subsets.
  • similarity is a measurement of the relation between the features selected above that define the mutual relations between features (for example, distance between features, angles between features, or areas or volumes defined by features).
  • being similar in the sense used herein means that the pair of descriptive subsets c s1 and d s1 share approximately the same value for the mutual relation properties that are invariant with respect to the transformation T. More than one relation type may be used.
  • one similarity in the sense used herein is defined by traditional geometric triangle congruency. That is, three points in c s1 would each have a mutual relation between them (their distance from each other) that can be used to define three sides of a triangle.
  • similarity is determined by estimating the variances and/or co-variance of the used invariant properties and using such information to determine the feature correspondences using a statistical test.
  • selecting a corresponding minimal subset from feature set B would comprise finding a descriptive subset d s1 that defines a triangle having dimensions approximately the same as the triangle defined by c s1 .
  • the area of the triangle defined by the minimal subset could be used as a single scalar similarity parameter. That is, an alternative similarity measure for determining a descriptive subset d s1 is based on finding a triangle from the features of set B having an area similar to the area of the triangle defined by descriptive subset c s1 . Congruency based on other shapes, areas and/or volumes definable from the values held in a descriptive subset can form the basis for determining a similarity of mutual relations.
  • more than one minimal subset from set B may be found that is congruent to descriptive subset c s1 .
  • the descriptive subset that is the most congruent with descriptive subset c s1 is selected as descriptive subset d s1 .
  • pre-filtering all the possible triangles in set B by their area into a lookup table 145 decreases the number of combinations that have to be considered to find a match for descriptive subset c s1 .
  • Matching based on a mutual relation such as point distances involves more calculation and in some embodiments, it is therefore performed on the pre-filtered data set known to already fulfill the simpler—scalar—relation similarity.
  • the method proceeds to 230 with calculating a transformation T from the pair of similar descriptive subsets chosen above to form a hypothesis H.
  • the transformation T forms the basis for a hypothesis H{c s1 , d s1 }.
  • transformation T is calculated using a method derived from the specific situation.
  • the transformation derived is not necessarily a linear operation, but may also be a combination of several linear operations (rotation + translation), for example. In some cases, it may be linear but in the general case a linear operation is not necessarily expected.
  • H describes a hypothesis that describes movement of the image capturing sensor 110 between the capture of projections P 1 and P 2 , along with the two sets of data that support this transformation hypothesis, as described below.
  • c sp ′ = T(c sp )
  • the method proceeds to 240 with evaluating the quality of hypothesis H.
  • Various means are known to those of ordinary skill in the art for determining the reprojection hypothesis quality.
  • a mean (squared) reprojection error of the features in the scene is determined for different hypotheses H and compared.
  • each hypothesis H is stored into memory.
  • the method then proceeds to 250 with storing a final hypothesis transformation H B into a memory device.
  • the final transformation H B represents the best estimated hypothesis H based on the quality metric.
  • blocks 210 to 240 are iteratively repeated, each time finding at least one similar descriptive subset d sq in set B for every selected descriptive subset c sp from set A, and deriving from each resulting descriptive subset pair [c sp , d sq ] a hypothetic transformation T{c sp , d sq }, further forming the hypothesis H{c sp , d sq }, together with those features from sets C and D that support this hypothetic transformation.
  • a process for repeatedly generating and testing hypotheses using evaluation by the other set members, as described in blocks 210 - 240 , is sometimes referred to as a RanSaC estimation method.
  • the H{c sp , d sq } found from the iterations having the best quality as determined at block 240 is stored into the memory device as H B .
  • the method above produces two very useful products.
  • the first product is an output in the form of the best hypothesis H B stored into the memory device.
  • the final hypothesis H B defines a transformation T B that represents a best estimation of how an image sensor moved.
  • T B may be used to transform the features appearing in P 1 into P 2 , or match features appearing in P 1 with corresponding features in P 2 .
  • an object identified in the first image can be identified and located in the second image using H B .
  • any feature of set A will produce a projection A′ and the number of those projections A′ that line up with epipolar lines belonging to corresponding features in set B will be maximized.
  • FIG. 3 is a block diagram illustrating one embodiment of the present invention.
  • a self navigating vehicle 300 includes an image capturing device 310 and a processor 320 implementing a robust feature matching using mutual relations process 325 , such as described above with respect to FIG. 2 .
  • image capturing device 310 captures a first image of a real-world scene 305 .
  • vehicle 300 has at least approximate knowledge of its own coordinates with respect to the reference frame of the real-world scene 305 as it captures the first image.
  • Vehicle 300 subsequently travels to a second location and captures a second image of the real-world scene 305 .
  • processor 320 calculates and stores into memory 315 a final hypothesis H B .
  • vehicle 300 determines coordinates for its new position by applying the transformation T B belonging to the best hypothesis H B to its coordinates in the navigation frame.
  • the difference in coordinates can also be used to determine vehicle parameters such as, but not limited to, vehicle 300 's velocity (when time between image captures is known), heading, and orientation (i.e., yaw, pitch and roll).
  • vehicle 300 applies the transformation T B belonging to final hypothesis H B to known obstacles previously identified in the first image to estimate the relative position of those objects at its new location, even when those particular objects do not appear in the second image.
  • Images taken minutes, hours, days or years apart can be processed against current images as long as the images are at least overlapping images of a relatively static scene 305 . Further, it is not necessary for the images to be captured by the same image capturing device 310 . Images captured from multiple image capturing devices can be used as long as they implement the same transformation when capturing the scene into a projection. Also, the two projections for which the feature matching is desired do not have to be captured by the same capturing device at two times, but equivalently by two or more devices at the same time, or some may be generated from a priori known data.
  • image capturing device 310 captures an image of a real-world scene 305 .
  • the processor 320 calculates a match between features from the image and features in the map.
  • the final hypothesis H B then defines the position of the vehicle 300 in the navigation reference frame aligned with the map.
  • the solution of H B is readily applied to the field of automated map building using vehicle 300 or to obtain the matching feature pairs (or larger sets) for triangulation or reprojection purposes, such as for the 3D stereoscopic reprojection.
  • static features identified in one image projection can be correlated to similar features identified in a second image projection in order to combine the two images into a third image that preserves information regarding the relative position of objects in the two images.
  • a map is developed and saved into memory 315 that can serve various purposes. It might be used, for example, to identify pathways that can be traversed without hindrance from obstacles, or it might serve for navigation of other vehicles, etc.
  • the processor 320 applies robust feature matching using mutual relations process 325 to create a mosaic image in memory 315 from separate captured images, by overlapping correlating features from feature set A onto feature set B.
  • the second useful product stored into the memory is the set of paired features from respective feature sets A and B that explain the hypothesis H B . That is, for each feature of set A that was extracted from a first projection, the process 325 above identifies a corresponding member in feature set B which is linked through a transformation defined by the final hypothesis H B . These pairs are also stored in memory 315 . To the degree hypothesis H B is correct, features of set A, and those features of transformed hypothesis H B , represent the same features of the same physical objects from real world scene 305 . These corresponding members define member pairs which can readily be used to take a particular feature from the first projection and identify where that feature exists in the second projection.
  • these corresponding members can be used to confirm that a physical arrangement of objects in a current image corresponds to a physical arrangement of objects obtained in a previous image using a least squares fit (or similar evaluation) based on the member pairs.
  • a common solution for the transformation T can be enumerated by a least-squares method or other over-determined system parameter estimator.
  • a processor 320 is implemented within vehicle 300 by an FPGA.
  • that FPGA is programmed to carry out robust feature matching using mutual relations process 325 by executing an algorithm that implements the method of FIG. 2 .
  • Computer readable storage media devices include any form of a physical computer memory device. Examples of such a physical computer memory device include, but are not limited to, punch cards, magnetic disks or tapes, optical data storage systems, flash read only memory (ROM), non-volatile ROM, programmable ROM (PROM), erasable-programmable ROM (E-PROM), random access memory (RAM), or any other form of permanent, semi-permanent, or temporary memory storage system or device.
  • Program instructions include, but are not limited to computer-executable instructions executed by computer system processors and hardware description languages such as Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL).
  • VHSIC Very High Speed Integrated Circuit
  • VHDL Hardware Description Language

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Automation & Control Theory (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention provide improved systems and methods for matching scenes. In one embodiment, a processor for implementing robust feature matching between images comprises: a first process for extracting a first feature set from a first image projection and extracting a second feature set from a second image projection; a memory for storing the first feature set and the second feature set; and a second process for feature matching using invariant mutual relations between features of the first feature set and the second feature set; wherein the second feature set is selected from the second image projection based on the identification of similar descriptive subsets between the second image projection and the first image projection.

Description

CROSS REFERENCES TO RELATED APPLICATIONS
This application is a divisional of pending U.S. patent application Ser. No. 12/644,559 entitled “SYSTEMS AND METHODS FOR MATCHING SCENES USING MUTUAL RELATIONS BETWEEN FEATURES”, filed on Dec. 22, 2009, which is incorporated herein by reference.
BACKGROUND
Approaches for matching objects from different images are typically descriptor based. That is, they are designed to detect things containing sufficient information to be considered interesting and unique within the scene. From any such object, features are extracted which allow calculation of a description vector that contains sufficient information to identify the object within the scene. The particular description vectors are selected to be invariant to expected transformations of the scene. That is, for an expected transformation, the derived description vector should remain a valid representation for describing the object, despite the transformation. One fundamental problem with the description vector approach is that to design a description vector that fulfills some desired properties, others must be given up. For example, it is not possible to produce a description vector that is generically invariant to all transformations. Instead, a particular set of transformations must be chosen to which the description vector is to be invariant. The more generic the description vector becomes, the greater the number of objects in the scene that will appear similar. This weakens the ability of a description vector to uniquely describe a single object. The more generic the description vector, the less descriptive it becomes. The design of the description vector must therefore be tuned to a specific situation, rendering its usefulness very limited in other situations. There are cases, though, when descriptor based methods cannot be used. Other approaches using relational matching techniques as opposed to description vectors have been proposed. To date, however, approaches using relational matching techniques have been limited in their ability to reliably handle three-dimensional real world scenes.
For the reasons stated above and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the specification, there is a need in the art for improved systems and methods for matching scenes.
SUMMARY
Embodiments of the present invention provide methods and systems for matching scenes using mutual relations between features and will be understood by reading and studying the following specification.
Embodiments of the present invention provide improved systems and methods for matching scenes. In one embodiment, a processor for implementing robust feature matching between images comprises: a first process for extracting a first feature set from a first image projection and extracting a second feature set from a second image projection; a memory for storing the first feature set and the second feature set; and a second process for feature matching using invariant mutual relations between features of the first feature set and the second feature set; wherein the second feature set is selected from the second image projection based on the identification of similar descriptive subsets between the second image projection and the first image projection.
DRAWINGS
Embodiments of the present invention can be more easily understood and further advantages and uses thereof more readily apparent, when considered in view of the description of the preferred embodiments and the following figures in which:
FIG. 1 is a block diagram illustrating projection matching using mutual relations of one embodiment of the present invention;
FIG. 2 is a flow chart illustrating a process for robust feature matching using mutual relations between features of one embodiment of the present invention; and
FIG. 3 is a block diagram illustrating one embodiment of the present invention.
In accordance with common practice, the various described features are not drawn to scale but are drawn to emphasize features relevant to the present invention. Reference characters denote like elements throughout figures and text.
DETAILED DESCRIPTION
In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical and electrical changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense.
Embodiments of the present invention provide systems and methods for matching different projections of a real world scene using a concept referred to herein as the matching of mutual relations between features rather than description vectors that describe the features. Mutual relations suitable for matching may be defined between features that are mutually static over time in a chosen reference frame. For example, when a sensor captures a scene and produces a 3-dimensional projection of the scene (such as a point cloud captured by a Light Detection and Ranging (LIDAR) device), one mutual relation would be a distance between any two mutually stable points in the projection. That is, a mutual relation will represent a difference in a physical or tangible property with respect to objects in the scene. Sets of mutual relations are then used to determine how objects (or features of objects) are positioned with respect to each other. This is in contrast to the description vector, which only describes how the objects look. For example, a distance between a corner of one object and a corner of another object would represent one mutual relation with respect to those two objects. Differences in temperature indicated by a thermal image of a scene would be a basis for defining a mutual relation between features in another embodiment, as could be features such as reflectance and color in still other embodiments. Further, sets of mutual relations may be based on features such as, but not limited to, distances, areas, angles, solid angles, volumes, generalized areas or volumes to higher dimensions, central moments or a histogram. For example, in one embodiment a set of features includes planar features with the angles between normal vectors of the planar features as invariant relations.
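By way of illustration only, the following Python sketch shows why a distance between two features can serve as a mutual relation for 3-dimensional projections. It is not taken from the specification: the helper names pairwise_distances and rigid_transform_z, the sample coordinates, and the choice of a rotation about the z axis are all assumptions made for the example. The sketch computes every pairwise distance in a small point set before and after a rotation plus translation, and the two lists agree to within floating-point rounding.

```python
import math
from itertools import combinations

def pairwise_distances(points):
    """Return the distance for every unordered pair of points (the mutual relations)."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

def rigid_transform_z(point, angle_rad, translation):
    """Rotate a 3-D point about the z axis, then translate it (illustrative transformation only)."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty, tz = translation
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)

# Hypothetical features seen in a first projection.
scene = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 1.0)]
# The same features as they would appear after the sensor frame changes.
moved = [rigid_transform_z(p, math.radians(30.0), (2.0, -1.0, 0.5)) for p in scene]

before = pairwise_distances(scene)
after = pairwise_distances(moved)
```

A property that changed under the anticipated transformation (for example, raw x coordinates in this case) would not qualify as a mutual relation for matching.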
Which mutual relations to use in a given application is situation specific and depends on the anticipated transformation that will be applied to capture images and generate the projection. For example, when a LIDAR sensor scans a scene, it generates a 3-dimensional point cloud projection of that scene. When two projections are obtained of a single scene from two different places using LIDAR, the resulting projections will differ in rotation and translation, but not scale. A camera type image capturing device, in comparison, produces a 2-dimensional image. The 2-dimensional image represents a projection of the scene onto a plane. The projection of the same scene captured twice from different places using a camera will differ: the 3D to 2D projection has an infinite number of possibilities, and some or all of the features extracted from such a projection will differ in shape and size and need not be present in some of the projections at all due to occlusions.
Accordingly, the nature of any property chosen for matching mutual relations of features must consider the transformation applied by the particular sensor used. In addition, the characteristics of the particular situation in which images are being captured must be considered. For example, if you know that your LIDAR sensor is not rotating and is always facing the same direction, you may anticipate that a property such as a difference in object altitude from the ground may form the basis for a mutual relation between objects, as well as a distance between the objects. As long as the property is invariant (i.e., agnostic) to the transformation applied, then that property may be used for matching mutual relations using embodiments of the present invention. Accordingly, embodiments of the present invention are not limited to matching 2-dimensional or 3-dimensional projections using cameras and LIDAR sensors. In fact, sensors providing transformations into 1 dimension, 4 dimensions, 5 dimensions, or greater are anticipated as within the scope of embodiments of the present invention.
FIG. 1 is a block diagram illustrating projection matching using mutual relations of one embodiment of the present invention shown generally at 100. An image capturing device 110 captures a first projection (P1) of a real world scene 105. Either at a subsequent point in time or from a different location, image capturing device 110 captures a second projection (P2) of the real world scene at 102. Because image capturing device 110 might have moved between capturing P1 and P2, these respective projections are assumed to represent different views of real world scene 105.
For projection P1, a feature extraction at 120 extracts features of interest that appear in the projection P1 to form feature set A. Similarly, feature extraction at 122 extracts features of interest that appear in the projection P2 to form feature set B. For example, for 2-dimensional photographic image projections, features of interest may include objects and surfaces such as floors, walls, corners, curved or linear edges of objects, and the like. Defining characteristics of any physical object may be extracted as a feature from a projection. For 3-dimensional LIDAR produced projections, points of a point cloud projection can be defined as features. Alternatively, lines, curves or planes defined by points within the point cloud can also be defined as extractable features to form set A and set B.
Robust feature matching using mutual relations is applied to the features of set A and the features of set B at 130. Unlike matching available in the prior art, robust feature matching at 130 identifies corresponding features from set A that appear in set B using mutual relations between features contained in feature sets A and B. From the features of set A and the features of set B, it is possible to determine how the image capturing device 110 moved from the time it captured P1 to the time it captured P2. The minimum number of features sufficient from each projection P1 and P2 to define the respective transformation associated with the movement of image capturing device 110 is referred to herein as the “minimal subset”. Accordingly, each minimal subset defined for set A includes at least a number n of features from set A where n is based on the number of mutual relations that are mathematically necessary to calculate a transformation between the two projections P1 and P2. This number n is a function of both the particular type of projection utilized by the image capturing device, and the known physical dynamics affecting the image capturing device. In addition, the number n for forming a minimal subset may depend on the type of mutual property that will be used for matching. For example, in the one embodiment where the set of features includes planar features with the angles between normal vectors of the planar features as invariant relations, the number n for forming a minimal subset would be three.
As would be appreciated by one of ordinary skill in the art upon studying this specification, different image sensor technologies will apply different projection transformations to a real world scene. For example, a camera typically projects a real world scene onto a 2-dimensional planar surface. A LIDAR scan projects the same real world scene as a point-cloud into a 3-dimensional space. The particular image sensor technology used by image capturing device 110 at least partially determines how many features are sufficient to define a minimal subset. For example, in the case of the 3-dimensional projection produced by a LIDAR, when 3 features are identified in a projection P1, and the same 3 features can be identified in projection P2, it is possible to use knowledge of the locations of the features (in two distinct camera reference frames) to determine how the LIDAR moved. That is, a pair of 3-point subsets from set A and set B (each point being a feature, for example) is sufficient to estimate a transformation that explains the difference between P1 and P2. In contrast, for an image capturing device that generates 2-dimensional projections onto a plane, a subset of at least 5 features is sufficient. That is, 5 features in the projection P1 that correspond to 5 features in projection P2 are sufficient to estimate the transformation undergone, or a finite number of its hypotheses.
Physical dynamics affecting image capturing device 110 also play a factor in determining n. For example, in one embodiment the number of extracted features sufficient to form a minimal subset may be reduced if it is known that rotation of the image capture device 110 is not possible. Alternatively, the number of extracted features sufficient to form a minimal subset may be reduced if it is known that the image capture device 110 produces projections that are invariant to scaling or some other effect. The number n of features to include in a minimal subset for a particular application is readily determined by one of ordinary skill in the art upon reading this specification using their knowledge of the particular transformation utilized by the sensor they plan to use and the anticipated application for which the sensor will be used. Utilization of more than n features to form subsets is also contemplated as within embodiments of the present invention. In that case, the term “descriptive subset” is used herein and defined as a subset that contains at least a minimal subset of n features.
In operation, robust feature matching between projections P1 and P2, using mutual relations between features is achieved as follows: Given the set A of features derived from projection 1 (P1), and the set B of features derived from projection 2 (P2), a transformation T will exist between the projections P1 and P2 that is defined, often in an over-determined system, by feature set A and set B. A family of minimal subsets of features c belonging to P1, and a family of minimal subsets of features d belonging to P2 form what is referred to herein as a family of feature pairs ( [c{1},d{1}], [c{2},d{2}], [c{3},d{3}] . . . [c{n},d{n}]). In other words, [c{1}, d{1}] . . . up to [c{n}, d{n}] include all minimal subsets of A and B respectively, belonging to the projections P1 and P2 that define a transformation T between P1 and P2. For example, in one embodiment 50 to 60 minimal subsets are obtained coming from set A and set B, that belong to projection P1 and projection P2. Together they are used to define the transformation T. Mutual relations between features is the technology described herein that determines which feature d{q} from d most likely corresponds with a particular feature c{p} from c. A process of one embodiment of the present invention is described in FIG. 2.
FIG. 2 illustrates a process for robust feature matching using mutual relations between features of one embodiment of the present invention. The process begins at 210 with defining a first descriptive subset (cs1) from a feature set A extracted from a first image projection. In one embodiment, the descriptive subset cs1 is chosen randomly from all the extracted features available for the first image projection in feature set A. In another embodiment, descriptive subset cs1 is chosen based on a predefined pattern or a non-uniform random distribution that favors desired properties. For example, in one embodiment, a descriptive subset cs1 is selected to include three feature points that are not co-linear. This will avoid defining descriptive subsets that provide singular solutions. Further, in one embodiment descriptive subset cs1 is selected to avoid defining a subset having features that are close together relative to other available features in the projection, to avoid solutions sensitive to noise. In one embodiment, selection of a descriptive feature subset is achieved by a random, pseudorandom, quasi-random or any deterministic process, including a setup where a check for the set suitability is included.
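The selection at block 210 with a suitability check can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: features are taken to be 3-dimensional points, the subset size is three, and the names pick_descriptive_subset, is_suitable, min_area, min_side and max_tries are hypothetical. Co-linearity is tested through the triangle area, and closeness through the shortest side.

```python
import math
import random

def triangle_area(p, q, r):
    """Area of the triangle spanned by three 3-D points (half the cross-product norm)."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def is_suitable(subset, min_area, min_side):
    """Reject near co-linear triples and triples whose points are too close together."""
    p, q, r = subset
    shortest = min(math.dist(p, q), math.dist(q, r), math.dist(p, r))
    return shortest >= min_side and triangle_area(p, q, r) >= min_area

def pick_descriptive_subset(features, min_area, min_side, rng, max_tries=1000):
    """Draw random 3-feature subsets until one passes the suitability check."""
    for _ in range(max_tries):
        subset = rng.sample(features, 3)
        if is_suitable(subset, min_area, min_side):
            return subset
    return None  # no suitable subset found within the try budget
```

The thresholds would in practice be chosen relative to the sensor noise and the extent of the scene; the values are left to the caller here because the specification does not fix them.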
In another embodiment, to aid in the selection of features for minimal subset cs1, mutual relations between features are optionally pre-processed, pre-sorted, or indexed. For example, referring back to FIG. 1, in one embodiment, after feature sets A and B are extracted from projections P1 and P2, one or both are saved as a feature table 145 in a memory 210.
In one embodiment, the feature table 145 is optionally sorted so that the mutual relations representing the greatest differential for a selected characteristic appear first in the table. In another embodiment, the feature table 145 is sorted so that sequential sets of features for set A and/or set B are not co-linear. Thus, in one embodiment, selecting the first minimal subset cs1 of features from the first image projection simply comprises sequentially selecting features starting from the top of feature table 145. Using this approach, those minimal subsets most likely to produce accurate solutions with the least noise will be used first. In certain applications, such pre-processing of the mutual relations allows the system to converge at a quality solution more quickly than one based on a pure random selection of features based on a uniform-distribution. In some embodiments, the table of mutual relations is indexed to allow a quick lookup of similar values to speed up the search for the matching minimal subsets. In general, any indexing method used in the database environment may be used.
The process proceeds to 220 with selecting a second descriptive subset (ds1) from a feature set B extracted from a second image projection. The features for second descriptive subset ds1 are selected so that mutual relations between its features are most similar to the mutual relations between the features of descriptive subset cs1. The first and second descriptive subsets are then said to define a pair of descriptive subsets. As used herein, similarity is a measurement of the relation between the features selected above that define the mutual relations between features (for example, distance between features, angles between features, or areas or volumes defined by features). In other words, being similar in the sense used herein means that the pair of descriptive subsets cs1 and ds1 share approximately the same value for the mutual relation properties that are invariant with respect to the transformation T. More than one relation type may be used. For example, in one embodiment where 3-dimensional point-clouds are being matched, one similarity in the sense used herein is defined by traditional geometric triangle congruency. That is, three points in cs1 would each have a mutual relation between them (their distance from each other) that can be used to define three sides of a triangle. In another embodiment, similarity is determined by estimating the variances and/or co-variance of the used invariant properties and using such information to determine the feature correspondences using a statistical test.
Using this information, selecting a corresponding minimal subset from feature set B, in one embodiment, would comprise finding a descriptive subset ds1 that defines a triangle having dimensions approximately the same as the triangle defined by cs1. In another embodiment, the area of the triangle defined by the minimal subset could be used as a single scalar similarity parameter. That is, an alternative similarity measure for determining a descriptive subset ds1 is based on finding a triangle from the features of set B having an area similar to the area of the triangle defined by descriptive subset cs1. Congruency based on other shapes, areas and/or volumes definable from the values held in a descriptive subset can form the basis for determining a similarity of mutual relations.
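The two similarity measures just described, triangle congruency and triangle area, can be compared in a short Python sketch. The function names and tolerance parameters below are hypothetical and the features are assumed to be 3-dimensional points. The sketch is meant only to show that the scalar area test is weaker than the side-length test: two triangles can have equal area without being congruent.

```python
import math

def side_lengths(triple):
    """Sorted side lengths of the triangle defined by three point features."""
    p, q, r = triple
    return sorted((math.dist(p, q), math.dist(q, r), math.dist(p, r)))

def heron_area(triple):
    """Triangle area from the side lengths (Heron's formula)."""
    a, b, c = side_lengths(triple)
    s = 0.5 * (a + b + c)
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))

def is_congruent(cs, ds, tol):
    """True when all three side lengths agree within tol."""
    return all(abs(a - b) <= tol for a, b in zip(side_lengths(cs), side_lengths(ds)))

def has_similar_area(cs, ds, tol):
    """True when the single scalar similarity parameter (area) agrees within tol."""
    return abs(heron_area(cs) - heron_area(ds)) <= tol

cs1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]      # 3-4-5 triangle
ds_same = [(1.0, 5.0, 2.0), (1.0, 1.0, 2.0), (4.0, 1.0, 2.0)]  # same triangle, moved and reordered
ds_other = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 2.0, 0.0)] # equal area, different shape
```

Here ds_same passes both tests, while ds_other passes only the area test, which is why the area is better suited to pre-filtering than to the final decision.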
As would be appreciated by one of ordinary skill in the art upon reading this specification, it is not necessary that an exact match between cs1 and ds1 be found. In practice, that will seldom happen. Instead, any descriptive subsets csx and dsy whose parameters are similar enough to be considered congruent are treated as a match. Multiple means for determining such geometric congruency are known and may be readily applied by one of ordinary skill in the art reading this specification. For that reason, they are not repeated here.
In some circumstances, more than one minimal subset from set B may be found that is congruent to descriptive subset cs1. In one embodiment, the descriptive subset that is the most congruent with descriptive subset cs1 is selected as descriptive subset ds1. However, when multiple candidates for ds1 meet a given congruency criterion, it is best to consider all of the candidates, each paired with descriptive subset cs1 in the process described in FIG. 2. As explained above, pre-filtering all the possible triangles in set B by their area into a lookup table 145 decreases the number of combinations that have to be considered to find a match for descriptive subset cs1. Matching based on a mutual relation such as point distances involves more calculation and, in some embodiments, is therefore performed on the pre-filtered data set known to already fulfill the simpler scalar relation similarity.
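One possible arrangement of this two-stage search is sketched below in Python. It assumes point features and 3-feature subsets, and it stands in for the lookup table with a list sorted by area that is searched with the standard bisect module. The names build_area_index and find_candidates and both tolerances are hypothetical. Every candidate that survives both stages is returned, in keeping with the suggestion to consider all candidates.

```python
import bisect
import math
from itertools import combinations

def side_lengths(triple):
    """Sorted side lengths of the triangle defined by three point features."""
    p, q, r = triple
    return sorted((math.dist(p, q), math.dist(q, r), math.dist(p, r)))

def area(triple):
    """Triangle area from the side lengths (Heron's formula)."""
    a, b, c = side_lengths(triple)
    s = 0.5 * (a + b + c)
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))

def build_area_index(features_b):
    """Pre-compute every 3-feature subset of set B, sorted by triangle area."""
    entries = [(area(t), t) for t in combinations(features_b, 3)]
    entries.sort(key=lambda entry: entry[0])
    areas = [entry[0] for entry in entries]
    triples = [entry[1] for entry in entries]
    return areas, triples

def find_candidates(index, cs, area_tol, side_tol):
    """Stage 1: cheap scalar area window. Stage 2: side-length congruency."""
    areas, triples = index
    target = area(cs)
    lo = bisect.bisect_left(areas, target - area_tol)
    hi = bisect.bisect_right(areas, target + area_tol)
    ref = side_lengths(cs)
    return [t for t in triples[lo:hi]
            if all(abs(a - b) <= side_tol for a, b in zip(ref, side_lengths(t)))]
```

Building the index costs time proportional to the number of triples in set B, so it pays off only when many subsets from set A are looked up against the same set B.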
The method proceeds to 230 with calculating a transformation T from the pair of similar descriptive subsets chosen above to form a hypothesis H. For example, for the descriptive subset pair cs1 and ds1 determined above, the transformation T forms the basis for a hypothesis H{cs1, ds1}. In one embodiment, using the properties used to determine similarity, transformation T is calculated using a method derived from the specific situation. As would be appreciated by one of ordinary skill in the art upon reading this specification, the transformation derived is not necessarily a linear operation, but may also be a combination of several linear operations (rotation + translation), for example. In some cases, it may be linear but in the general case a linear operation is not necessarily expected. As would be appreciated by one of ordinary skill in the art upon studying this specification, there are several methods known for calculating a transformation T from the pair of similar minimal subsets chosen above to form the hypothesis H. References that describe such calculations include: D. Nistér, An efficient solution to the five-point relative pose problem, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 26(6):756-770, Jun. 2004; Hartley, R.I., In defence of the 8-point algorithm, Proceedings, Fifth International Conference on Computer Vision, 1064-1070, Jun. 1995; and Horn, Berthold K. P., Closed-form solution of absolute orientation using unit quaternions, Journal of the Optical Society of America A, Vol. 4, page 629, Apr. 1987, which are each incorporated herein by reference.
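For the 3-dimensional point case, a hypothesis can be computed from a single pair of 3-point subsets. The Python sketch below does so by attaching an orthonormal frame to each triangle and aligning the two frames. This frame-alignment construction is a simple substitute chosen for brevity; it is not the quaternion method of Horn or any of the other cited methods. It further assumes that the points are already in corresponding order (cs[i] matches ds[i]) and that the triangles are congruent and not co-linear. All function names are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def frame(p0, p1, p2):
    """Right-handed orthonormal axes attached to three non-co-linear points."""
    e1 = unit(sub(p1, p0))
    e3 = unit(cross(e1, sub(p2, p0)))
    e2 = cross(e3, e1)
    return (e1, e2, e3)

def estimate_rigid_transform(cs, ds):
    """Return (R, t) such that ds[i] is approximately R * cs[i] + t."""
    fa, fb = frame(*cs), frame(*ds)
    # R maps each axis of the frame on cs onto the matching axis of the frame on ds.
    R = [[sum(fb[k][i] * fa[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = tuple(ds[0][i] - sum(R[i][j] * cs[0][j] for j in range(3)) for i in range(3))
    return R, t

def apply_transform(R, t, p):
    """Apply the hypothetic transformation to one point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
```

Because the estimate rests on exactly three correspondences, any noise in those three features passes directly into R and t, which is one reason the resulting hypothesis is subsequently evaluated against the remaining features.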
Solving for transformation T forms the basis for a first hypothesis H. That is, H describes a hypothesis that describes movement of the image capturing sensor 110 between the capture of projections P1 and P2, along with the two sets of data that support this transformation hypothesis, as described below. When an estimate csp′ is calculated using the hypothetic transformation (for example, csp′=T(csp)) the more accurately transformation T reflects the actual movement of the image capturing sensor, the more closely csp′ will equal dsq.
Thus, the method proceeds to 240 with evaluating the quality of hypothesis H. Various means are known to those of ordinary skill in the art for determining the reprojection hypothesis quality. In one embodiment, a mean (squared) reprojection error of the features in the scene is determined for different hypotheses H and compared. In another embodiment, the quality of hypothesis H is determined from the relative number of inlier points that appear when the hypothetic transformation T is applied to either part of, or the entire feature set A. That is, in one embodiment, an A′ is calculated using A′=T(A) and the similarity between A′ and B is determined. The closer A′ resembles B, the better the quality of H. The choice of a particular metric used to determine the quality of H for embodiments of the present invention is selected based on the specific projection type, type of features in the set A, and/or the transformation undergone; in general it allows the relative quality of different H's to be compared against each other. In one embodiment, each hypothesis H is stored into memory. The method then proceeds to 250 with storing a final hypothesis transformation HB into a memory device. The final transformation HB represents the best estimated hypothesis H based on the quality metric.
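Both quality metrics named above can be illustrated with a short Python sketch. It assumes point features, a transformation given as a 3x3 rotation matrix R and a translation t, and a nearest-neighbour association between A′ and B; the function names and the tolerance tol are hypothetical and the nearest-neighbour association is a simplification chosen for the example.

```python
import math

def apply_transform(R, t, p):
    """Compute one element of A' = T(A) for a rotation matrix R and translation t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def nearest_distance(p, set_b):
    """Distance from p to the closest feature of set B."""
    return min(math.dist(p, q) for q in set_b)

def inlier_ratio(R, t, set_a, set_b, tol):
    """Relative number of features of A that land within tol of a feature of B."""
    if not set_a or not set_b:
        return 0.0
    hits = sum(1 for p in set_a
               if nearest_distance(apply_transform(R, t, p), set_b) <= tol)
    return hits / len(set_a)

def mean_squared_error(R, t, set_a, set_b):
    """Mean squared distance between each transformed A feature and its nearest B feature."""
    errors = [nearest_distance(apply_transform(R, t, p), set_b) ** 2 for p in set_a]
    return sum(errors) / len(errors)
```

The inlier ratio tolerates features that have no counterpart in the other projection, whereas the mean squared error is dominated by them, so the two metrics can rank the same hypotheses differently when the projections only partly overlap.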
As shown at 260, blocks 210 to 240 are iteratively repeated, each time finding at least one similar descriptive subset dsq in set B for every selected descriptive subset csp from set A, and deriving from each resulting descriptive subset pair [csp, dsq] a hypothetic transformation T{csp, dsq}, further forming the hypothesis H{csp, dsq}, together with those features from sets C and D that support this hypothetic transformation. As would be appreciated by one of ordinary skill in the art, generally speaking, a process for repeatedly generating and testing hypotheses using evaluation by the other set members, as described in blocks 210-240, is sometimes referred to as a RANSAC estimation method. The H{csp, dsq} found from the iterations having the best quality as determined at block 240 is stored into the memory device as HB.
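The iteration of blocks 210 to 240 can be sketched end to end in Python for 3-dimensional point features. The sketch makes several simplifying assumptions that are not part of the specification: subsets contain exactly three points, similarity is the ordered agreement of the three pairwise distances, set B is searched exhaustively rather than through an index, the transformation comes from a simple frame alignment rather than one of the cited methods, and quality is the inlier count. All names, including match_scenes, are hypothetical.

```python
import math
import random
from itertools import combinations, permutations

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def frame(p0, p1, p2):
    """Right-handed orthonormal axes attached to three non-co-linear points."""
    d = sub(p1, p0)
    e1 = tuple(x / norm(d) for x in d)
    n = cross(e1, sub(p2, p0))
    e3 = tuple(x / norm(n) for x in n)
    return (e1, cross(e3, e1), e3)

def estimate_transform(cs, ds):
    """Block 230: (R, t) with ds[i] approximately R * cs[i] + t."""
    fa, fb = frame(*cs), frame(*ds)
    R = [[sum(fb[k][i] * fa[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    t = tuple(ds[0][i] - sum(R[i][j] * cs[0][j] for j in range(3)) for i in range(3))
    return R, t

def apply_transform(R, t, p):
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def relations(triple):
    """Ordered mutual relations (pairwise distances) of a 3-point subset."""
    p0, p1, p2 = triple
    return (math.dist(p0, p1), math.dist(p1, p2), math.dist(p0, p2))

def count_inliers(R, t, set_a, set_b, tol):
    """Block 240: number of A features that land within tol of some B feature."""
    return sum(1 for p in set_a
               if min(math.dist(apply_transform(R, t, p), q) for q in set_b) <= tol)

def match_scenes(set_a, set_b, iterations, tol, rng, min_area=1e-6):
    """Repeat blocks 210-240 and keep the best hypothesis found."""
    best_score, best_transform = 0, None
    for _ in range(iterations):
        cs = rng.sample(set_a, 3)                      # block 210
        if 0.5 * norm(cross(sub(cs[1], cs[0]), sub(cs[2], cs[0]))) < min_area:
            continue                                   # skip co-linear subsets
        ref = relations(cs)
        for combo in combinations(set_b, 3):           # block 220
            for ds in permutations(combo):
                if all(abs(a - b) <= tol for a, b in zip(ref, relations(ds))):
                    R, t = estimate_transform(cs, ds)
                    score = count_inliers(R, t, set_a, set_b, tol)
                    if score > best_score:
                        best_score, best_transform = score, (R, t)
    return best_score, best_transform
```

The exhaustive inner search grows with the cube of the size of set B and is used here only to keep the example short; a sorted or indexed table of relations would replace it in any realistic setting.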
The method above produces two very useful products. The first product is an output in the form of the best hypothesis HB stored into the memory device. The final hypothesis HB defines a transformation TB that represents a best estimation of how an image sensor moved. As such, TB may be used to transform the features appearing in P1 into P2, or match features appearing in P1 with corresponding features in P2. In other words, an object identified in the first image can be identified and located in the second image using HB. As an example, in case of 3D to 2D perspective projection (matching of camera images), by applying the best hypothetic transformation TB on A, any feature of set A will produce a projection A′ and the number of those projections A′ that line up with the epipolar lines belonging to corresponding features in set B will be maximized.
In one embodiment, the solution for HB is applied to the practical field of self navigating vehicles. FIG. 3 is a block diagram illustrating one embodiment of the present invention. In this embodiment, a self navigating vehicle 300 includes an image capturing device 310 and a processor 320 implementing a robust feature matching using mutual relations process 325, such as described above with respect to FIG. 2.
In one embodiment, in operation, image capturing device 310 captures a first image of a real-world scene 305. In one embodiment, vehicle 300 has at least approximate knowledge of its own coordinates with respect to the reference frame of the real-world scene 305 as it captures the first image. Vehicle 300 subsequently travels to a second location and captures a second image of the real-world scene 305. From the first and second image, processor 320 calculates and stores into memory 315 a final hypothesis HB. In one embodiment, vehicle 300 then determines coordinates for its new position by applying the transformation TB belonging to the best hypothesis HB to its coordinates in the navigation frame. The difference in coordinates can also be used to determine vehicle parameters such as, but not limited to, vehicle 300's velocity (when time between image captures is known), heading, and orientation (i.e., yaw, pitch and roll). In another embodiment, vehicle 300 applies the transformation TB belonging to final hypothesis HB to known obstacles previously identified in the first image to estimate the relative position of those objects at its new location, even when those particular objects do not appear in the second image. As this suggests, it is not necessary for the two images used for determining hypothesis HB to be completely overlapping images of a scene, as long as common features can be extracted from the two images. It is also not necessary for the two images used for determining hypothesis HB to be sequentially taken. Images taken minutes, hours, days or years apart can be processed against current images as long as the images are at least overlapping images of a relatively static scene 305. Further, it is not necessary for the images to be captured by the same image capturing device 310. Images captured from multiple image capturing devices can be used as long as they implement the same transformation when capturing the scene into a projection.
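A minimal Python sketch of this use of TB follows. It assumes, for illustration only, that TB is available as a 3x3 rotation matrix R and a translation t expressed so that applying them to the vehicle's previous navigation-frame coordinates yields its new coordinates, and that yaw is read out under the common Z-Y-X (yaw, pitch, roll) rotation convention. The function names are hypothetical.

```python
import math

def apply_transform(R, t, p):
    """Apply rotation R and translation t to a 3-D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def update_position(R, t, previous_position):
    """New navigation-frame coordinates under the assumption stated above."""
    return apply_transform(R, t, previous_position)

def average_speed(previous_position, new_position, elapsed_seconds):
    """Speed over the interval between the two image captures."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time between captures must be positive")
    return math.dist(previous_position, new_position) / elapsed_seconds

def yaw_change(R):
    """Yaw angle of R in radians, assuming R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
    return math.atan2(R[1][0], R[0][0])
```

If the hypothesis instead describes how scene features move in the sensor frame, the vehicle motion is the inverse of that transformation, so the direction convention has to be settled before coordinates are updated.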
Also, the two projections for which the feature matching is desired do not have to be captured by the same capturing device at two times; equivalently, they may be captured by two or more devices at the same time, or some may be generated from a priori known data.
In another embodiment, in operation, image capturing device 310 captures an image of a real-world scene 305. Using an a priori given map of features, the processor 320 calculates a match between features from the image and features in the map. The final hypothesis HB then defines the position of the vehicle 300 in the navigation reference frame aligned with the map.
In another embodiment, the solution of HB is readily applied to the field of automated map building using vehicle 300 or to obtain the matching feature pairs (or larger sets) for triangulation or reprojection purposes, such as for the 3D stereoscopic reprojection. For example, with alternate embodiments of the present invention, static features identified in one image projection can be correlated to similar features identified in a second image projection in order to combine the two images into a third image that preserves information regarding the relative position of objects in the two images. By repeating this process as vehicle 300 travels, a map is developed and saved into memory 315 that can serve various purposes. It might be used, for example, to identify pathways that can be traversed without hindrance from obstacles, or it might serve for navigation of other vehicles. Similarly, in other embodiments, the processor 320 applies robust feature matching using mutual relations process 325 to create a mosaic image in memory 315 from separate captured images, by overlapping correlating features from feature set A onto feature set B.
The second useful product stored into the memory is the set of paired features from respective feature sets A and B that explain the hypothesis HB. That is, for each feature of set A that was extracted from a first projection, the process 325 above identifies a corresponding member in feature set B which is linked through a transformation defined by the final hypothesis HB. These pairs are also stored in memory 315. To the degree hypothesis HB is correct, the features of set A and their counterparts under the transformation of hypothesis HB represent the same features of the same physical objects from real-world scene 305. These corresponding members define member pairs which can readily be used to take a particular feature from the first projection and identify where that feature exists in the second projection. Similarly, these corresponding members can be used to confirm that a physical arrangement of objects in a current image corresponds to a physical arrangement of objects obtained in a previous image using a least squares fit (or similar evaluation) based on the member pairs. For those members of the two feature sets that mutually supported the best hypothesis, a common solution for the transformation T can be computed by a least-squares method or other over-determined system parameter estimator.
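As one possible illustration of the least-squares step mentioned above, the following sketch fits a single transformation to all member pairs at once. It is not the estimator of the specification: it assumes 2D point features and a rigid transformation (rotation plus translation), for which a well-known closed-form least-squares solution exists. The function names and the sample data are hypothetical.

```python
import math

def fit_rigid_2d(pairs):
    """Least-squares rigid fit: find theta, (tx, ty) minimizing sum |R a + t - b|^2.

    pairs is a list of ((ax, ay), (bx, by)) member pairs from sets A and B.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two member pairs are needed")
    # Centroids of both point sets.
    cax = sum(a[0] for a, _ in pairs) / n
    cay = sum(a[1] for a, _ in pairs) / n
    cbx = sum(b[0] for _, b in pairs) / n
    cby = sum(b[1] for _, b in pairs) / n
    # Accumulate dot and cross products of the centered coordinates.
    s_dot = s_cross = 0.0
    for (ax, ay), (bx, by) in pairs:
        ux, uy = ax - cax, ay - cay
        vx, vy = bx - cbx, by - cby
        s_dot += ux * vx + uy * vy
        s_cross += ux * vy - uy * vx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation maps the rotated centroid of A onto the centroid of B.
    tx = cbx - (c * cax - s * cay)
    ty = cby - (s * cax + c * cay)
    return theta, (tx, ty)

def rms_residual(pairs, theta, t):
    """Root-mean-square distance between transformed A features and B features."""
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for (ax, ay), (bx, by) in pairs:
        dx = c * ax - s * ay + t[0] - bx
        dy = s * ax + c * ay + t[1] - by
        total += dx * dx + dy * dy
    return math.sqrt(total / len(pairs))

# Hypothetical member pairs generated from a known 30 degree rotation and shift.
true_theta, true_t = math.radians(30.0), (1.0, -2.0)
ct, st = math.cos(true_theta), math.sin(true_theta)
set_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
pairs = [((x, y), (ct * x - st * y + true_t[0], st * x + ct * y + true_t[1]))
         for x, y in set_a]

theta, t = fit_rigid_2d(pairs)
residual = rms_residual(pairs, theta, t)
```

A small residual over the member pairs is one way to express the confirmation check described above, namely that the arrangement of objects in the current image corresponds to the arrangement obtained earlier.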
Several means are available to implement the systems and methods of the current invention as discussed in this specification. These means include, but are not limited to, digital computer systems, microprocessors, general purpose computers, programmable controllers and field programmable gate arrays (FPGAs). For example, in one embodiment, a processor 320 is implemented within vehicle 300 by an FPGA. In one embodiment, that FPGA is programmed to carry out robust feature matching using mutual relations process 325 by executing an algorithm that implements the method of FIG. 2.
Therefore, other embodiments of the present invention are program instructions resident on computer readable storage media devices which, when implemented by such means, enable them to implement embodiments of the present invention. Computer readable storage media devices include any form of a physical computer memory device. Examples of such a physical computer memory device include, but are not limited to, punch cards, magnetic disks or tapes, optical data storage systems, flash read only memory (ROM), non-volatile ROM, programmable ROM (PROM), erasable-programmable ROM (E-PROM), random access memory (RAM), or any other form of permanent, semi-permanent, or temporary memory storage system or device. Program instructions include, but are not limited to, computer-executable instructions executed by computer system processors and hardware description languages such as Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL).
Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that any arrangement, which is calculated to achieve the same purpose, may be substituted for the specific embodiment shown. This application is intended to cover any adaptations or variations of the present invention. Therefore, it is manifestly intended that this invention be limited only by the claims and the equivalents thereof.

Claims (9)

What is claimed is:
1. A navigation system for a vehicle, the navigation system comprising:
an image capturing device that produces projections of surrounding scenes;
a processor coupled to the image capturing device, the processor having a process for robust feature matching using invariant mutual relations between features; and
a memory coupled to the processor, wherein the processor:
defines a first descriptive subset of features, including at least one minimal subset, from a feature set A extracted by at least one processor from a first image projection,
selects at least two second descriptive subsets from a feature set B extracted by the at least one processor from a second image projection, the at least two second descriptive subsets correlated to at least two minimal subsets, which meet a congruency criteria with the first descriptive subset of features from the feature set A, wherein the at least two second descriptive subsets are selected based on a similarity of invariant mutual relations between features of one of the at least one minimal subset in the first descriptive subset and invariant mutual relations between features of the at least two minimal subsets in the at least two second descriptive subsets, wherein the first descriptive subset and the at least two second descriptive subsets define at least two pairs of similar descriptive subsets,
calculates at the at least one processor a transformation T from at least one of the at least two pairs of similar descriptive subsets to form a hypothesis H,
evaluates at the at least one processor a quality of the hypothesis H,
stores the hypothesis H into a memory device, the hypothesis H including transformation T and a plurality of paired features that support the hypothesis H,
selects a final hypothesis HB having a high quality as determined by a quality metric, the method further comprising at least one of:
determining coordinates for a new position by applying the transformation T belonging to the selected final hypothesis HB to coordinates in a navigation frame; and
generating a map in the memory device for navigating a vehicle, the map based on matching features from the first image projection with the second image projection.
2. The navigation system of claim 1, wherein the processor obtains at least one of the first projection and the second projection from projections captured by the image capturing device.
3. The navigation system of claim 1, wherein the processor: defines a plurality of descriptive subsets of features from the feature set A;
selects, for each of the plurality of descriptive subsets from the feature set A, at least two descriptive subsets from the feature set B based on a similarity of invariant mutual relations between features to form a plurality of similar descriptive subset pairs;
calculates a hypothetic transformation T for each of the plurality of similar descriptive subset pairs to form at least one hypothesis H; and
stores into the memory a plurality of paired features that support the final hypothesis HB, where the plurality of paired features include a feature from the feature set A and a feature from the feature set B.
4. The navigation system of claim 1, wherein the processor determines a distance to an object based on a best transformation TB and plurality of paired features that support the final hypothesis HB.
5. The navigation system of claim 1, wherein the processor
determines a number n of features to include in the first descriptive subset based on:
a type of projection utilized in the image capturing device to generate the first image projection from a real-world scene; and
an anticipated transformation undergone between the first image projection and the second image projection.
6. The navigation system of claim 1, wherein the similarity of invariant mutual relations between features is based on a geometric congruency.
7. A processor for implementing robust feature matching between images, the processor comprising:
a first process for extracting a feature set A from a first image projection and extracting a feature set B from a second image projection;
a memory for storing the feature set A and the feature set B;
a second process for feature matching, the second process including:
defines a first descriptive subset of features, including at least one minimal subset, from the feature set A extracted,
selects at least two second descriptive subsets from the feature set B, the at least two second descriptive subsets correlated to at least two minimal subsets, which meet a congruency criteria with the first descriptive subset of features from the feature set A, wherein the at least two second descriptive subsets are selected based on a similarity of invariant mutual relations between features of one of the at least one minimal subset in the first descriptive subset and invariant mutual relations between features of the at least two minimal subsets in the at least two second descriptive subsets, wherein the first descriptive subset and the at least two second descriptive subsets define at least two pairs of similar descriptive subsets,
calculates at the at least one processor a transformation T from at least one of the at least two pairs of similar descriptive subsets to form a hypothesis H,
evaluates at the at least one processor a quality of the hypothesis H,
stores the hypothesis H into a memory device, the hypothesis H including transformation T and a plurality of paired features that support the hypothesis H,
selects a final hypothesis HB having a high quality as determined by a quality metric, the method further comprising at least one of:
determining coordinates for a new position by applying the transformation T belonging to the selected final hypothesis HB to coordinates in a navigation frame; and
generating a map in the memory device for navigating a vehicle, the map based on matching features from the first image projection with the second image projection.
8. The processor of claim 7, wherein the second process pre-processes one or both of the feature set A and the feature set B into a feature table sorted such that invariant mutual relations representing a greatest differential for a selected characteristic appear first in the feature table.
9. The processor of claim 7, wherein the feature set A and the feature set B include planar features with invariant mutual relations including angles between normal vectors of the planar features; and
wherein the similar descriptive subsets are based on minimal subsets of three features.
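
The invariant mutual relation recited in claim 9, the angle between normal vectors of planar features, can be illustrated with a short sketch. It is offered only as an example under stated assumptions and is not the claimed process: each planar feature is reduced to a nonzero normal vector, a minimal subset is a triple of such normals, and two subsets are treated as candidates for pairing when their sorted pairwise angles agree within a tolerance. The function names, the tolerance and the sample normals are hypothetical.

```python
import math
from itertools import combinations

def angle_between(n1, n2):
    """Angle in radians between two nonzero normal vectors."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def mutual_angles(normals):
    """Sorted pairwise angles of a subset; unchanged by rotating the whole subset."""
    return sorted(angle_between(p, q) for p, q in combinations(normals, 2))

def is_congruent(subset_a, subset_b, tol=math.radians(2.0)):
    """True when both subsets have matching pairwise angles within tol."""
    angles_a, angles_b = mutual_angles(subset_a), mutual_angles(subset_b)
    return len(angles_a) == len(angles_b) and all(
        abs(x - y) <= tol for x, y in zip(angles_a, angles_b))

# Hypothetical minimal subset of three planar features from feature set A.
subset_a = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# The same three planes seen after a 90 degree rotation about the z axis.
subset_b = [(0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
# A different arrangement: two of the planes are only 45 degrees apart.
subset_c = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

match_ab = is_congruent(subset_a, subset_b)
match_ac = is_congruent(subset_a, subset_c)
```

Comparing sorted angles is only a necessary condition, since it discards which feature corresponds to which. A subset pair that passes this test would still need a transformation to be calculated and evaluated as a hypothesis.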
US14/736,738 2009-12-22 2015-06-11 Systems and methods for matching scenes using mutual relations between features Active US9449431B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/736,738 US9449431B2 (en) 2009-12-22 2015-06-11 Systems and methods for matching scenes using mutual relations between features

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/644,559 US9091553B2 (en) 2009-12-22 2009-12-22 Systems and methods for matching scenes using mutual relations between features
US14/736,738 US9449431B2 (en) 2009-12-22 2015-06-11 Systems and methods for matching scenes using mutual relations between features

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/644,559 Division US9091553B2 (en) 2009-12-22 2009-12-22 Systems and methods for matching scenes using mutual relations between features

Publications (2)

Publication Number Publication Date
US20150279099A1 US20150279099A1 (en) 2015-10-01
US9449431B2 true US9449431B2 (en) 2016-09-20

Family

ID=43797929

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/644,559 Active 2032-11-01 US9091553B2 (en) 2009-12-22 2009-12-22 Systems and methods for matching scenes using mutual relations between features
US14/736,738 Active US9449431B2 (en) 2009-12-22 2015-06-11 Systems and methods for matching scenes using mutual relations between features

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US12/644,559 Active 2032-11-01 US9091553B2 (en) 2009-12-22 2009-12-22 Systems and methods for matching scenes using mutual relations between features

Country Status (2)

Country Link
US (2) US9091553B2 (en)
EP (1) EP2339294B1 (en)

Also Published As

Publication number Publication date
US20150279099A1 (en) 2015-10-01
EP2339294A1 (en) 2011-06-29
EP2339294B1 (en) 2018-10-31
US9091553B2 (en) 2015-07-28
US20110153206A1 (en) 2011-06-23

Similar Documents

Publication Publication Date Title
US9449431B2 (en) Systems and methods for matching scenes using mutual relations between features
US8238612B2 (en) Method and apparatus for vision based motion determination
Weber et al. Automatic registration of unordered point clouds acquired by Kinect sensors using an overlap heuristic
US9525862B2 (en) Method for estimating a camera motion and for determining a three-dimensional model of a real environment
US9761008B2 (en) Methods, systems, and computer readable media for visual odometry using rigid structures identified by antipodal transform
Bethmann et al. Object-based multi-image semi-global matching–concept and first results
Lane et al. Tutorial: Overview of stereo matching research
Bhowmick et al. Mobiscan3D: A low cost framework for real time dense 3D reconstruction on mobile devices
Harms et al. Accuracy analysis of surface normal reconstruction in stereo vision
Budge et al. Automatic registration of fused lidar/digital imagery (texel images) for three-dimensional image creation
Ward et al. A model-based approach to recovering the structure of a plant from images
Ahuja et al. Three‐dimensional Scan Registration using Curvelet Features in Planetary Environments
Sun et al. Scale‐aware camera localization in 3D LiDAR maps with a monocular visual odometry
Nakayama et al. Marker-less augmented reality framework using on-site 3D line-segment-based model generation
Yun et al. 3D scene reconstruction system with hand-held stereo cameras
Fox et al. An approach to stereo-point cloud registration using image homographies
Nel et al. Markerless monocular vision-based localisation for autonomous inspection drones
Yoon et al. Stereo Vision SLAM with SuperPoint and SuperGlue
Bethmann et al. Multi-image semi-global matching in object space
Bond et al. Efficient Scene Reconstruction for Unmanned Aerial Vehicles
Yamaguchi et al. Point Cloud Reconstruction of Textureless Regions with Topology Constraints between Corresponding Points
Sethuraja Systematic workflow for low-cost near real-time 3d reconstruction of disaster zones in mixed reality
Zhan et al. A Scaled Monocular 3D Reconstruction Based on Structure from Motion and Multi-View Stereo.
Salvi et al. Overview of surface registration techniques including loop minimization for three-dimensional modeling and visual inspection
Aguilar Calzadillas Sparse Stereo Visual Odometry with Local Non-Linear Least-Squares Optimization for Navigation of Autonomous Vehicles

Legal Events

Date Code Title Description
AS Assignment

Owner name: HONEYWELL INTERNATIONAL INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KOTABA, ONDREJ;LUKAS, JAN;REEL/FRAME:035823/0478

Effective date: 20091222

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8