US9396396B2 - Feature value extraction apparatus and place estimation apparatus - Google Patents

Feature value extraction apparatus and place estimation apparatus

Info

Publication number
US9396396B2
US9396396B2 (application US14/440,768; US201314440768A)
Authority
US
United States
Prior art keywords
feature value
invariant
feature values
invariant feature
values
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US14/440,768
Other languages
English (en)
Other versions
US20150294157A1 (en)
Inventor
Osamu Hasegawa
Gangchen Hua
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Soinn Inc
Original Assignee
Soinn Holdings LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Soinn Holdings LLC filed Critical Soinn Holdings LLC
Assigned to TOKYO INSTITUTE OF TECHNOLOGY reassignment TOKYO INSTITUTE OF TECHNOLOGY ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HASEGAWA, OSAMU, HUA, Gangchen
Publication of US20150294157A1 publication Critical patent/US20150294157A1/en
Assigned to SOINN HOLDINGS LLC reassignment SOINN HOLDINGS LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: TOKYO INSTITUTE OF TECHNOLOGY
Application granted granted Critical
Publication of US9396396B2 publication Critical patent/US9396396B2/en
Assigned to SOINN INC. reassignment SOINN INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SOINN HOLDINGS LLC
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • G06K9/00691
    • G06K9/00704
    • G06K9/4671
    • G06K9/6211
    • G06T7/2033
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/35Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/36Indoor scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/35Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/38Outdoor scenes
    • G06V20/39Urban scenes
    • G06K2209/29
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/11Technique with transformation invariance effect

Definitions

  • the present invention relates to a feature value extraction apparatus, a method, and a program for extracting local feature values from an input image, and a place estimation apparatus, a method, and a program using them.
  • Estimating/specifying one's own position (place) in an environment is an ability indispensable for a person or a machine. It is always important for a robot or a computer vision system to recognize where it is located. In particular, in the case of a movable robot, recognizing where the robot itself is located is a fundamental requirement of its navigation system.
  • the present invention has been made to solve the above-described problem and an object thereof is to provide a feature value extraction apparatus, a method, and a program capable of extracting local feature values whose positions are unchanged, and a place estimation apparatus, a method, and a program equipped with them.
  • a place estimation apparatus includes: feature value extraction means for extracting a position-invariant feature value from an input image; matching means for referring to a database and obtaining matching between the input image and a registered place, the database storing each registered place and its associated position-invariant feature value; similarity-level calculation means for calculating a similarity level in which a registered place near a selected registered place is included in the calculation when the matching is equal to or higher than a predetermined threshold; and place recognition means for recognizing that the input image is the registered place when the similarity level is equal to or higher than a predetermined threshold, in which the feature value extraction means includes: local feature value extraction means for extracting a local feature value from each of input images formed from successively-shot successive images; feature value matching means for obtaining matching between successive input images for the local feature value extracted by the local feature value extraction means; corresponding feature value selection means for selecting a feature value, for which the matching is obtained between the successive images by the feature value matching means, as a corresponding feature value; and position-invariant feature value extraction means for obtaining a position-invariant feature value based on the corresponding feature value.
  • a place estimation method includes: a feature value extraction step for extracting an invariant feature value from an input image; a matching step for referring to a database and obtaining matching between the input image and a registered place, the database storing each registered place and an invariant feature value while associating them with each other; a similarity-level calculation step for calculating a similarity level in which a registered place near a selected registered place is included in the calculation when the matching is equal to or higher than a predetermined threshold; and a place recognition step for recognizing that the input image is the registered place when the similarity level is equal to or higher than a predetermined threshold, in which the feature value extraction step includes: a local feature value extraction step for extracting a local feature value from each of input images formed from successively-shot successive images; a feature value matching step for obtaining matching between successive input images for the local feature value extracted in the local feature value extraction step; a corresponding feature value selection step for selecting a feature value, for which the matching is obtained between the successive images in the feature value matching step, as a corresponding feature value; and a position-invariant feature value extraction step for obtaining a position-invariant feature value based on the corresponding feature value.
  • a feature value extraction apparatus includes: local feature value extraction means for extracting a local feature value from each of input images formed from successively-shot successive images; feature value matching means for obtaining matching between successive input images for the local feature value extracted by the local feature value extraction means; corresponding feature value selection means for selecting a feature value, for which the matching is obtained between the successive images by the feature value matching means, as a corresponding feature value; and position-invariant feature value extraction means for obtaining a position-invariant feature value based on the corresponding feature value, and the position-invariant feature value extraction means extracts, from among the corresponding feature values, a corresponding feature value whose position change is equal to or less than a predetermined threshold as the position-invariant feature value.
  • a feature value extraction method includes: a local feature value extraction step for extracting a local feature value from each of input images formed from successively-shot successive images; a feature value matching step for obtaining matching between successive input images for the local feature value extracted in the local feature value extraction step; a corresponding feature value selection step for selecting a feature value, for which the matching is obtained between the successive images in the feature value matching step, as a corresponding feature value; and position-invariant feature value extraction step for obtaining a position-invariant feature value based on the corresponding feature value, and in the position-invariant feature value extraction step, a corresponding feature value whose position change is equal to or less than a predetermined threshold is extracted from among the corresponding feature values as the position-invariant feature value.
  • a program according to the present invention is a program for causing a computer to execute the above-described place estimation method or the feature value extraction method.
  • According to the present invention, it is possible to provide a feature value extraction apparatus capable of extracting local feature values whose positions are unchanged as robust feature values.
  • According to the present invention, it is also possible to provide a place estimation apparatus capable of extracting local feature values whose positions are unchanged as robust feature values.
  • FIG. 1 is a block diagram showing a place estimation apparatus according to an exemplary embodiment of the present invention
  • FIG. 2 is a flowchart showing a place estimation method according to an exemplary embodiment of the present invention
  • FIG. 3 is a flowchart showing a position-invariant feature value extraction process
  • FIG. 4 is a list showing a position-invariant feature value extraction process
  • FIG. 5 shows an ICGM in a one-way approach
  • FIG. 6 shows an ICGM in a both-way approach
  • FIG. 7 is a graph showing a comparison between a one-way approach and a both-way approach
  • FIG. 8 is a graph showing a comparison between a one-way approach and a both-way approach
  • FIG. 9 shows a feature value extraction experiment result obtained by the ICGM
  • FIG. 10 shows a place recognition experiment in Shibuya train station
  • FIG. 11 is a graph showing a comparison between a one-way approach and a both-way approach
  • FIG. 12 shows a Minamidai outdoor experiment
  • FIG. 13 shows a result of the Minamidai outdoor experiment
  • FIG. 14 shows a result of the Minamidai outdoor experiment
  • FIG. 15 shows a corresponding feature value extraction process
  • FIG. 16 shows a corresponding feature value extraction process
  • FIG. 17 shows a position-invariant feature value extraction process
  • FIG. 18 is a list of a position-invariant feature value extraction process.
  • FIG. 19 is a list of a position-invariant feature value extraction process.
  • a technique is disclosed in which feature values whose positions are unchanged over a long period in an environment, i.e., position-invariant feature values, are extracted and used for place estimation.
  • these are static local feature values, i.e., feature values whose positions are unchanged over a long period in an environment.
  • since the positions of feature values of pedestrians and the like usually change in a short time, such feature values are not regarded as static feature values.
  • in contrast, the positions of feature values related to elements such as walls and signboards do not change over a long period. It is desirable to use such position-invariant feature values for place estimation.
  • the present invention is applied to a place estimation apparatus for estimating a place that is incorporated into a moving-type robot apparatus or the like.
  • FIG. 1 is a block diagram showing a place estimation apparatus according to an exemplary embodiment of the present invention.
  • the place estimation apparatus 10 includes a feature value extraction unit 11 that extracts position-invariant feature values from input images consisting of successively-shot successive images, a common dictionary 12 , a matching unit 13 , a similarity-level calculation unit 14 , and a place recognition unit 15 .
  • the feature value extraction unit 11 includes a local feature value extraction unit 21 , a feature value matching unit 22 , a corresponding feature value selection unit 23 , and a position-invariant feature value extraction unit 24 .
  • the local feature value extraction unit 21 extracts local feature values from each of the input images.
  • the feature value matching unit 22 obtains matching between successive input images for the local feature values extracted by the local feature value extraction unit 21 .
  • the corresponding feature value selection unit 23 extracts feature values for which matching between the successive images has been obtained by the feature value matching unit as corresponding feature values. It is assumed in this exemplary embodiment that the feature value matching unit 22 and the corresponding feature value selection unit 23 obtain corresponding feature values by using two successive images. Examples of the technique for extracting corresponding feature values include the SIFT (Scale Invariant Feature Transform) and the SURF (Speeded Up Robust Features).
  • the position-invariant feature value extraction unit 24, which is a processing unit that carries out a characteristic process of the present invention, extracts, from among the corresponding feature values extracted by the corresponding feature value selection unit 23, only the feature values whose positions are unchanged (position-invariant feature values). This extraction process is referred to as the ICGM (Intelligent Center of Gravity Matching).
  • the matching unit 13 refers to a database in which places and their position-invariant feature values are registered in a state where the places are associated with their respective position-invariant feature values, performs matching between an input image and a registered place, and calculates a matching score.
  • the similarity-level calculation unit 14 calculates a similarity level in which a registered place(s) near the selected registered place is included in the calculation when the matching score is equal to or higher than a predetermined threshold.
  • the place recognition unit 15 recognizes that the input image is an image of the registered place when the similarity level is equal to or higher than a predetermined threshold.
  • FIG. 2 is a flowchart showing a place estimation method according to this exemplary embodiment.
  • two successively-shot images I t and I t ⁇ 1 are input to the local feature value extraction unit 21 .
  • successive images required in the ICGM are, for example, images that are successively shot at a predetermined frame rate (e.g., two frames per second).
  • images captured from video images are successive images. Therefore, video images are preferably used as input images in the ICGM.
  • the local feature value extraction unit 21 extracts local feature values by using an existing local feature value extraction method (step S 1 ).
  • the local feature value extraction unit 21 can use a feature value extraction method such as the SIFT (Scale Invariant Feature Transform) or the SURF (Speeded Up Robust Features).
  • local feature values other than the SIFT and the SURF can also be used.
  • other local feature values that are robust against scaling, rotation variations, noise, or the like are preferably used.
  • properties of existing feature values are taken over as they are, thus making it possible to extract/describe features that are robust against illumination changes and the like.
  • the SURF is used in this exemplary embodiment.
  • 2,000 to 3,000 or more feature values are extracted as local feature values.
  • the calculation amount is small.
  • the feature value matching unit 22 uses an image I t acquired at the current time t and an image I t−1 acquired at the immediately-preceding time t−1, and performs matching between these successive images for local feature values.
  • the matching can be carried out by using various publicly-known techniques used in, for example, the SIFT (Scale Invariant Feature Transform) or the SURF (Speeded Up Robust Features); a minimal matching sketch in code is given below.
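  • As an illustration only (not text from the patent), the following Python sketch shows one way such frame-to-frame matching of local feature values could be implemented with OpenCV. SIFT is used as a stand-in because SURF is not included in default OpenCV builds; the function name matched_feature_pairs and the ratio-test threshold are assumptions, not part of the specification.

        import cv2

        def matched_feature_pairs(img_prev, img_curr, ratio=0.75):
            """Extract local feature values from two successive frames and keep only the
            pairs that match between them (the corresponding feature values)."""
            detector = cv2.SIFT_create()                  # SURF is named in the text; SIFT is a stand-in
            kp1, des1 = detector.detectAndCompute(img_prev, None)
            kp2, des2 = detector.detectAndCompute(img_curr, None)

            matcher = cv2.BFMatcher(cv2.NORM_L2)
            knn = matcher.knnMatch(des1, des2, k=2)

            pairs = []
            for candidates in knn:
                if len(candidates) < 2:
                    continue
                m, n = candidates
                if m.distance < ratio * n.distance:       # Lowe's ratio test
                    pairs.append((kp1[m.queryIdx].pt, kp2[m.trainIdx].pt))
            return pairs                                  # [((x, y) in I_{t-1}, (x, y) in I_t), ...]
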
  • the position-invariant feature value extraction unit 24 extracts position-invariant feature values in the image I t at the current time t by using the sets p and p′ of the corresponding feature values (step S 2 ).
  • the algorithm of this position-invariant feature value extraction process is shown in the flowchart of FIG. 3 and the listing of FIG. 4. This algorithm is explained hereinafter with reference to the flowchart shown in FIG. 3; a sketch of the algorithm in code is also given after the step list below.
  • Step 1 Two pairs of corresponding local feature values are selected from two successive images. That is, two local feature values p 0 and p 1 are selected from the set p of the corresponding feature values in the image I t . Further, local feature values p′ 0 and p′ 1 are selected from the set p′ of the corresponding feature values in the image I t ⁇ 1 . Note that each of the feature values p 0 and p 1 and the feature values p′ 0 and p′ 1 is a pair of feature values that are determined to be matched with each other by the feature value matching unit 22 .
  • Step 3 The vectors CGV 0 and CGV 1 are compared to each other. Then, if they are not similar to each other, the process returns to the step 1.
  • the fact that the two vectors are similar to each other means that the geometrical positional relations between the local feature values p 0 and p 1 and between p′ 0 and p′ 1 are substantially unchanged between the two images. That is, it means that the positions of the feature points p 0 and p 1 can be considered to be unchanged.
  • Step 6 These vectors are compared to each other. Then, if they are similar to each other, the selected local feature values are recognized as position-invariant feature values. That is, if the difference between the two vectors is equal to or smaller than the threshold Thr, i.e., If ⁇ CGV 0 ⁇ CGV 1 ⁇ Thr, the two vectors are similar to each other. Therefore, the positions of the local feature values p 2 and p′ 2 are unchanged. Note that the fact that the two vectors are similar to each other means that the geometrical positional relations between the center of gravity CG 0 and the local feature value p 2 , and between the center of gravity CG 1 and the local feature value p′ 2 are substantially unchanged between the two images. That is, this fact means that the position of the feature point p 2 can be considered to be unchanged.
  • Step 7 The feature value p 2 extracted from the image I t is removed from the set p and stored in the variable P R .
  • the feature value p′ 2 extracted from the image I t ⁇ 1 is removed from the set p′ and stored in the variable P′ R .
  • the center of gravity between the center of gravity CG 0 and the feature value p 2 and that between the center of gravity CG 1 and the feature value p′ 2 are calculated in the respective images, and the calculated centers of gravity are used as new centers of gravity CG 0 and CG 1 .
  • Step 8 On the other hand, if ⁇ CGV 0 ⁇ CGV 1 ⁇ >Thr, it means that the positions of the feature values p 2 and p′ 2 are changed. Therefore, the feature values p 2 and p′ 2 should be excluded from the feature values to be extracted. Accordingly, the feature values p 2 and p′ 2 are removed from the sets p and p′, respectively.
  • Step 9 When the tests for all the local feature values included in the sets p and p′ have been finished, that is, when the sets p and p′ become empty sets, the process is finished.
  • the local feature values included in the variable P R at this point are position-invariant feature values. Then, the position-invariant feature value extraction process is finished.
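  • The following Python sketch gives one possible reading of the above procedure. Steps 2, 4 and 5 are not reproduced in this text, so the concrete definitions of CGV 0 /CGV 1 (difference vectors between the selected points) and of the centers of gravity CG 0 /CG 1 (running midpoints) used below are assumptions made for illustration; only the comparisons and the bookkeeping of Steps 1, 3 and 6 to 9 follow the description directly.

        import numpy as np

        def icgm_position_invariant(pairs, thr=5.0):
            """pairs: list of (point in I_t, point in I_{t-1}) correspondences.
            Returns the points of I_t judged position-invariant (the set P_R)."""
            p  = [np.asarray(a, dtype=float) for a, _ in pairs]
            pd = [np.asarray(b, dtype=float) for _, b in pairs]
            P_R = []

            # Steps 1-3: look for an initial pair of correspondences whose geometry is preserved.
            # Assumption: the compared vectors are p1 - p0 and p1' - p0'.
            seed = None
            for i in range(len(p)):
                for j in range(i + 1, len(p)):
                    if np.linalg.norm((p[j] - p[i]) - (pd[j] - pd[i])) <= thr:
                        seed = (i, j)
                        break
                if seed:
                    break
            if seed is None:
                return P_R

            i0, j0 = seed
            CG0 = (p[i0] + p[j0]) / 2.0        # assumed: center of gravity of the seed pair in I_t
            CG1 = (pd[i0] + pd[j0]) / 2.0      # ... and in I_{t-1}
            P_R.extend([p[i0], p[j0]])         # assumed: the seed pair itself is kept

            for k in range(len(p)):
                if k in (i0, j0):
                    continue
                CGV0 = p[k] - CG0              # assumed: vector from CG0 to the candidate point
                CGV1 = pd[k] - CG1
                if np.linalg.norm(CGV0 - CGV1) <= thr:   # Step 6: geometry preserved
                    P_R.append(p[k])                     # Step 7: keep as position-invariant
                    CG0 = (CG0 + p[k]) / 2.0             # update the centers of gravity
                    CG1 = (CG1 + pd[k]) / 2.0
                # Step 8: otherwise the pair is simply discarded
            return P_R                                   # Step 9: all pairs have been tested
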
  • the matching unit 13 obtains matching scores s m by referring to the common dictionary 12 (step S 3 ).
  • the common dictionary 12 holds models m, m+1, m+2, . . . , which are sets of feature values of respective places L m , L m+1 , L m+2 , . . . that are successively located in an environment.
  • a matching score s m between an image I t and a model m for a place L m is obtained by Expression (2).
  • s m = n m − num_appear (2)
  • s m represents a matching score between the model m, which is a set of feature values of the place L m , and a set P R of the position-invariant feature values in the image I t .
  • the similarity-level calculation unit 14 obtains a second state score (first estimated value) b m by taking account of adjacent places (step S 4 ).
  • a feature that appears in the place L m also appears in adjacent places L m ⁇ 2 , L m ⁇ 1 , L m+1 and L m+2 . That is, it is predicted that the matching score of each of these adjacent places is roughly equal to or slightly lower than the matching score s m . That is, for example, when a matching score s m ⁇ 1 or s m+1 is zero even though the matching score s m is high, it means that the value of the matching score s m is incorrect i.e., the place estimation has not been correctly performed.
  • a second state score b m that is weighted by a Gaussian function p t (m, i) is obtained by the below-shown Expression (3).
  • w represents the number of adjacent places that are taken into account. For example, assuming that the frame rate is constant, when the speed is high, the value of w may be set to, for example, one, whereas when the speed is low, the value of w may be set to two.
  • the recognition ratio is further improved by normalizing this second state score b m .
  • a normalized score (second estimated value) b_norm m can be obtained by Expression (4) (step S 5 ).
  • n is a value that changes according to the moving speed of the place estimation apparatus and can be set to the maximum extraction number of position-invariant feature values obtained by the ICGM.
  • the similarity-level calculation unit 14 obtains this normalized score b_norm m .
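  • A minimal sketch of the score calculations of Expressions (2) to (4), under the assumption that the Gaussian weight p t (m, i) is an unnormalized Gaussian of the place-index distance and that the normalization of Expression (4) is a simple division by n; num_appear is not defined in this excerpt and is treated as a value supplied by the caller.

        import math

        def matching_score(n_m, num_appear):
            # Expression (2): s_m = n_m - num_appear
            return n_m - num_appear

        def second_state_score(scores, m, w=2, sigma=1.0):
            # Expression (3): Gaussian-weighted sum over place m and its w neighbours on each side
            b_m = 0.0
            for i in range(m - w, m + w + 1):
                if 0 <= i < len(scores):
                    b_m += math.exp(-((m - i) ** 2) / (2.0 * sigma ** 2)) * scores[i]
            return b_m

        def normalized_score(b_m, n):
            # Expression (4): normalization by n, e.g. the maximum number of
            # position-invariant feature values obtained by the ICGM
            return b_m / float(n) if n else 0.0
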
  • the place recognition unit 15 recognizes that the image I t matches the model m, that is, the image I t is an image of a known place when this normalized score b_norm m is higher than a predetermined threshold (steps S 6 and S 7 ).
  • the feature values of the model m can be updated by adding a position-invariant feature value(s) that is not included in the original model m into the model m. Further, when the feature values of each place are stored by using indexes as in the case of Patent Literature 3, only the indexes need to be increased. That is, it is possible to minimize the increase in the necessary memory capacity. Further, by employing a first-in first-out method, for example, for the feature values of the model m, there is no need to increase the memory capacity.
  • the place recognition unit 15 recognizes the image I t as a new place (step S 8 ) and, for example, registers a place where the image I t is shot and the position-invariant feature values extracted from the image I t into the common dictionary 12 .
  • the feature value extraction unit 11 extracts feature values that are successively present in the temporal direction and remain in roughly the same positions as robust feature values. As a result, it is possible to separate feature values that move over time and hence effectively extract feature values that are effective for place recognition.
  • the center of gravity of the robust feature values is successively updated when the feature value extraction unit 11 extracts position-invariant feature values, and the robustness of other feature values is determined based on this updated center of gravity.
  • the center of gravity includes therein information of the positional relation between feature values. Therefore, by using the center of gravity, the robustness can be tested while taking the position information into account. Further, the center of gravity can be easily calculated, thus enabling high-speed processing.
  • the center of gravity used in the robustness test for feature points is the center of gravity of all the feature points that have been determined to be robust up to that moment. In other words, there is no need to refer to all the position information on an enormous number of other feature values. That is, the stability of the position of a feature point can be evaluated just by evaluating the relation with only one center of gravity, thus making it possible to compress (or reduce) the data amount and the calculation amount.
  • a method using images I t and I t ⁇ 1 at times t and t ⁇ 1, respectively, is explained as a technique for extracting position-invariant feature values (ICGM).
  • This technique is referred to as “one-way approach”.
  • a technique capable of extracting position-invariant feature values more effectively is explained. This technique is hereinafter referred to as “both-way approach”.
  • FIG. 5 shows a one-way approach ICGM in which position-invariant feature values are extracted from images I t and I t ⁇ 1 .
  • the one-way approach is an approach in which position-invariant feature values are extracted by comparing a current image with an image in the past.
  • the position-invariant feature values extracted in this manner are much more robust than those extracted from the image I t alone (by the SIFT, the SURF, or the like).
  • In the one-way approach, there are cases where considerable losses of position-invariant feature values occur. Specific cases where losses could occur are explained later.
  • FIG. 6 shows a concept of a both-way approach ICGM.
  • position-invariant feature values A are extracted by comparing a current image I t with a past image I t ⁇ 1 .
  • position-invariant feature values B are extracted by comparing the current image I t with an image I t+1 in the future.
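  • A minimal sketch of the both-way approach, reusing the matched_feature_pairs and icgm_position_invariant sketches above; combining the two sets as a simple union (with duplicates removed by coordinate rounding) is an assumption, since the exact combination rule is not stated here.

        import numpy as np

        def both_way_invariant_features(img_prev, img_curr, img_next, thr=5.0):
            # Set A: features that persisted from the past (I_{t-1} -> I_t).
            pairs_past = matched_feature_pairs(img_prev, img_curr)      # (pt in I_{t-1}, pt in I_t)
            A = icgm_position_invariant([(c, p) for p, c in pairs_past], thr)

            # Set B: features that persist into the future (I_t -> I_{t+1}).
            pairs_future = matched_feature_pairs(img_next, img_curr)    # (pt in I_{t+1}, pt in I_t)
            B = icgm_position_invariant([(c, n) for n, c in pairs_future], thr)

            # Assumed combination rule: keep every position-invariant feature found in either direction.
            seen = {tuple(np.round(pt, 3)) for pt in A}
            return A + [pt for pt in B if tuple(np.round(pt, 3)) not in seen]
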
  • the inventor has found that when images are acquired by using an existing single-lens reflex camera or the like, the use of the both-way approach can extract position-invariant feature values more effectively. Specifically, the speed, the movement, and the like of the camera could affect the extractable position-invariant feature values.
  • the inventor has examined two situations that could occur when an existing single-lens reflex camera is used. The two situations are a situation where the camera rotates at a constant speed, and a situation where the camera moves toward or away from an object that is located an infinite distance away from the camera. Further, the inventor has found that the both-way approach is superior to the one-way approach in both of these two typical situations.
  • η = T Duration /T Disappear (7)
  • FIG. 7 shows this. It can be seen from FIG. 7 that the both-way approach can extract more position-invariant feature values than the one-way approach does. Specifically, if η ≥ 1/2, the both-way approach can extract all the position-invariant feature values.
  • the time T Duration is also used in the same way as the above-described case.
  • the vertical and horizontal viewing field angles are represented by ⁇ and ⁇ , respectively. It is assumed that position-invariant feature values are uniformly distributed within the viewing field.
  • η = T Duration ·θ/d (11)
  • FIG. 8 shows a comparison between the one-way approach and the both-way approach.
  • the both-way approach can extract position-invariant feature values from a kinetic environment more effectively than the one-way approach does.
  • the one-way approach extracts only feature values that have been present in the environment from the past as position-invariant feature values.
  • the both-way approach extracts, in addition to the aforementioned feature values, feature values that are present in the environment from the current time to the future as position-invariant feature values.
  • since the both-way approach uses two information sets, i.e., a set of past and current information and a set of current and future information, it can solve various problems present in the one-way approach.
  • the both-way approach is effective in both of the two typical situations related to the camera movements. Since general movements of a camera can be decomposed into a combination of such simple situations, it can be said that in general, the both-way approach can extract robust feature values more effectively than the one-way approach does.
  • the one-way approach ICGM and the both-way approach ICGM are advantageous compared with the PIRF.
  • the PIRF is also a technique that is used to extract robust feature values from successive images.
  • if the threshold Thr is raised to infinity (Thr→∞), the feature values extracted by the one-way approach ICGM approach those extracted by the PIRF.
  • This experiment is an experiment for examining the accuracy of feature values extracted by the ICGM.
  • Feature values are extracted from a plurality of images by using the ICGM and the SURF, and whether matching of the feature values is obtained between the plurality of images is compared between the two methods.
  • Datasets (two images shown in FIG. 9( a ) ) used for this experiment were both captured in an indoor environment (were shot indoors). Further, this environment includes several moving objects. In this figure, a spray bottle encircled by an ellipse has been moved between the two images. Further, the shooting range of the camera has also moved in the horizontal direction between the two images.
  • FIG. 9( b ) shows a state where feature points are extracted from the two images and matching between corresponding feature points is performed by the SURF.
  • corresponding feature points are connected to each other by bright lines. If the matching is correctly made, all the bright lines have to be horizontal. However, it can be seen in this figure that a lot of bright lines are inclined. That is, in this example, the matching includes a lot of errors. In addition, matching is also made for the moved object.
  • FIG. 9( c ) shows a state where position-invariant feature values are extracted from two images and matching between corresponding feature points is performed by the ICGM.
  • most of the bright lines are horizontal, indicating that the matching is correctly performed. Further, the moved object is not regarded as an object to be matched, and thus is ignored.
  • Note that this experiment is not a SLAM experiment. However, it is suitable for testing the accuracy of ICGM place recognition.
  • a dataset used in this experiment is images that were shot at a rate of 0.5 frames per second by using a handheld camera (the resolution was resized to 480*320). When the images were taken, Shibuya train station was crowded with a lot of people. The length of the route along which shooting was performed to acquire learning data was about 80 meters, and the learning time was five minutes ( FIG. 10 ).
  • FIG. 11 shows a comparison between when the both-way approach is used and when the one-way approach is used in the case where the ICGM is used. It can be understood that the both-way approach can extract more position-invariant feature values than the one-way approach does.
  • the accuracy of the place recognition using the PIRF was 82.65%. Meanwhile, the accuracy of the place recognition using the ICGM was 98.56%.
  • a dataset used in this experiment is images that were shot at a rate of 0.5 frames per second by using a handheld camera (the resolution was resized to 480*320).
  • the resolution was resized to 480*320.
  • the length of the route along which shooting was performed to acquire learning data was about 170 meters, and the learning time was 9.5 minutes.
  • FIG. 13 shows an experiment result.
  • the solid lines indicate the route along which places were learned.
  • the dots indicate coordinates at which places were successfully recognized. It can be seen that places that were learned in the first lap along the route were correctly recognized in the second lap.
  • FIG. 14 shows the accuracy of this experiment.
  • the accuracy of the place recognition using the ICGM is better than those of the PIRF-nav2.0 (technique disclosed in Patent Literature 3 and Non-patent Literature 1) and the publicly-known FAB-MAP.
  • Proposed method (real-time)
  • Proposed method (non-real-time)
  • the both-way approach was used in the place estimation phase.
  • the number of extracted feature values in the Proposed method (non-real-time) is larger than that in the Proposed method (real-time), indicating that the accuracy of the Proposed method (non-real-time) is improved.
  • the FAB-MAP is the fastest because this technique is a batch processing technique.
  • when the both-way approach is used to extract feature values of an image I t , an image I t+1 is also necessary; in other words, information (an image) of a future event is necessary. Since feature values of the image I t can be extracted only after the image I t+1 is acquired, the feature values cannot be extracted at the time t in real time, and some time lag is required. Therefore, in a real-time system such as a robot, the both-way approach cannot be used in the place recognition phase, which requires a real-time characteristic. In such cases, it is necessary to use the one-way approach. However, even in a real-time system, the both-way approach can be used in the dictionary creating phase, which does not require a real-time characteristic. Further, for a pedestrian navigation system and the like, a strict-sense real-time characteristic is not substantially required. Therefore, it is possible to improve system performance by using the both-way approach for both the dictionary creating phase and the place recognition phase.
  • a place can be identified from an image and a dictionary can be updated on-line according to the present invention. Therefore, for example, when the present invention is combined with a moving picture shooting function of a portable device, the following applications can be provided.
  • When a person gets lost in a department store, a shopping mall, or the like, the person shoots a scene around him/her by swinging the portable device around and sends the shot image to a server.
  • the server analyzes the image, and thereby can reply where the person is located, or additionally what kinds of facilities and shops are present around the person.
  • GPSs cannot be used indoors. In contrast, in this exemplary embodiment, a search moving picture sent from a user can also be used as data for updating a dictionary and a map. Therefore, the dictionary and the map can always be kept up to date. Note that in principle, the map data of conventional car navigation systems cannot be updated, or the updating takes considerable time and costs.
  • each base station may possess and update a map of the range which that base station is in charge of. That is, there is no need to prepare an enormous dictionary, thus making it possible to considerably save memory and calculation time.
  • wearable vision devices such as glasses will appear in the future. Such glasses will be able to always identify the place and provide useful information.
  • The following describes a technique for extracting position-invariant feature values that are robust even to distortions of images, rotation, shearing, translation, scaling, and so on, and thereby carrying out place estimation with higher accuracy.
  • OpenCV is known as a technique for correcting an image distortion.
  • a distortion can be corrected by acquiring internal parameters (f x , f y , c x , c y ), coefficients (k 1 , k 2 ) indicating a radial distortion, and coefficients (p 1 , p 2 ) indicating a tangential distortion by calibrating the camera, and then using the acquired internal parameters and distortion coefficients.
  • the aforementioned internal parameters and the distortion coefficients are intrinsic values of the camera.
  • the local feature value extraction unit 21 preferably performs the above-described distortion correction process before extracting feature values from the images.
  • the corresponding feature value selection unit 23 and the position-invariant feature value extraction unit 24 can extract corresponding feature values and position-invariant feature values with higher accuracy.
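  • As an illustration of this correction step (not text from the patent), the OpenCV undistortion could look as follows; the numeric parameter values are placeholders that would normally come from a camera calibration such as cv2.calibrateCamera.

        import cv2
        import numpy as np

        # Internal parameters (f_x, f_y, c_x, c_y) and distortion coefficients (k_1, k_2, p_1, p_2)
        # are intrinsic to the camera; the numbers below are placeholders only.
        camera_matrix = np.array([[700.0,   0.0, 320.0],
                                  [  0.0, 700.0, 240.0],
                                  [  0.0,   0.0,   1.0]])
        dist_coeffs = np.array([0.1, -0.05, 0.001, 0.001])   # k1, k2, p1, p2

        def undistort(image):
            """Correct radial and tangential distortion before local feature extraction."""
            return cv2.undistort(image, camera_matrix, dist_coeffs)
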
  • each of “a and a′”, “b and b′”, “c and c′”, and “e and e′” is a correctly-matched pair of feature values.
  • “d and d′” are an incorrectly-matched pair of feature values.
  • To determine whether given points i and i′ are incorrectly matched feature points, the corresponding feature value selection unit 23 first obtains relative distance vectors D i and D i ′ of the points i and i′.
  • the corresponding feature value selection unit 23 obtains an index “offset” by using the vectors D i and D i ′ for the points i and i′.
  • FIG. 16 shows a method for calculating “offset”.
  • the corresponding feature value selection unit 23 obtains an index “diff(D i , D i ′)” by using the “offset”.
  • the diff(D i , D i ′) is defined by Expression (12).
  • the diff(D i , D i ′) is not an affine-invariant quantity and is not sensitive to the noise ratio. Therefore, diff normal that is obtained by normalizing the diff(D i , D i ′) is examined.
  • the diff normal can be calculated by Expression (13) by using an average ⁇ diff and a standard deviation ⁇ diff .
  • the corresponding feature value selection unit 23 calculates diff normal for a given pair of feature values i and i′. Then, when diff normal >T OC , the corresponding feature value selection unit 23 determines that the pair of feature values i and i′ should be eliminated from the set of corresponding feature values, i.e., determines that they are incorrectly matched. Note that T OC is an arbitrary threshold.
  • d and d′ may be excluded based on an appropriately-defined threshold T OC , as in the sketch below.
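  • A sketch of this elimination step. Expression (12) and the "offset" of FIG. 16 are not reproduced in this text, so the per-point difference used below (the mean absolute difference of the relative distance vectors) is a stand-in for them; the normalization of Expression (13) and the threshold T OC are applied as described.

        import numpy as np

        def eliminate_mismatches(points, points_dash, T_OC=2.0):
            """Remove incorrectly matched pairs using the relative distance vectors D_i and D_i'."""
            P  = np.asarray(points, dtype=float)       # matched points in one image
            Pd = np.asarray(points_dash, dtype=float)  # corresponding points in the other image

            diffs = []
            for i in range(len(P)):
                D_i  = np.linalg.norm(P  - P[i],  axis=1)   # relative distance vector of point i
                D_id = np.linalg.norm(Pd - Pd[i], axis=1)   # relative distance vector of point i'
                diffs.append(np.mean(np.abs(D_i - D_id)))   # stand-in for diff(D_i, D_i')
            diffs = np.asarray(diffs)

            # Expression (13): normalize with the average and standard deviation of diff.
            diff_normal = (diffs - diffs.mean()) / (diffs.std() + 1e-9)

            keep = diff_normal <= T_OC                  # pairs with diff_normal > T_OC are eliminated
            return P[keep], Pd[keep]
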
  • FIG. 17 shows an example of an affine transformation.
  • rotation and contraction are performed between two images.
  • W(a, b, c, d) and W′(a′, b′, c′, d′) are sets of corresponding feature values for the two images.
  • the symbol “o” represents the center of gravity of at least one point included in W
  • “o′” represents the center of gravity of at least one point included in W′.
  • the coordinates of the feature points a, b, c and d included in the set W are significantly different from those of the feature points a′, b′, c′ and d′ included in the set W′. Meanwhile, the proportionality among the areas S (or sizes S) of a plurality of figures that are formed by using these feature points as their vertices is not changed.
  • the position-invariant feature value extraction unit 24 calculates the center of gravity o of the feature points included in the set W by Expression (15),
  • the position-invariant feature value extraction unit 24 calculates a deviation of an area ratio (or size ratio) of figures that are formed by using a given feature point i by Expression (16). Note that o represents the center of gravity and j represents an arbitrary feature point other than the feature point i.
  • An algorithm 2 shown in FIG. 18 is for a process for excluding feature points that are not affine-transformed from sets of corresponding feature values.
  • the position-invariant feature value extraction unit 24 can extract position-invariant feature values. Further, the reliability of the centers of gravity o and o′ gradually improves through the calculation.
  • the process related to this algorithm 2 is as follows.
  • each of the sets W and W′ is preferably a set of feature values extracted by the above-described order restriction.
  • the sets W and W′ are used as initial values of sets W tmp and W′ tmp of feature values. Centers of gravity o and o′ and total areas S ⁇ and S′ ⁇ are calculated for these W tmp and W′ tmp , respectively, by Expressions (14) and (15). Further, the size of the set W tmp , i.e., the number of feature values included in the set W tmp is stored as “SizePrevious”.
  • AveDev i (hereinafter expressed as “AveDev”) is calculated by Expression (17).
  • when AveDev>T AC , it is recognized that the pair of corresponding feature values i and i′ is not affine-transformed. Therefore, the feature values i and i′ are removed from the sets W tmp and W′ tmp , respectively.
  • this AveDev determination process is performed for every corresponding feature value included in the sets W tmp and W′ tmp .
  • the size of the set W tmp is then compared with the SizePrevious stored in Step 1.
  • if both sizes are equal, the process is finished.
  • if both sizes are different from each other, the removal of the corresponding feature values is still in progress. Therefore, the process returns to Step 1 and continues from there.
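  • A sketch of algorithm 2 under stated assumptions: Expressions (14) to (17) are not reproduced in this text, so the "figures" are taken to be triangles formed by the center of gravity and two feature points, the total area is the sum of those triangle areas, AveDev is the mean absolute difference of the normalized areas between the two images, and the threshold value is a placeholder.

        import numpy as np
        from itertools import combinations

        def tri_area(a, b, c):
            # Area of the triangle (a, b, c); triangles are the assumed figures of Expression (16).
            return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))

        def area_restriction(W, W_dash, T_AC=0.01):
            """Iteratively remove corresponding feature values that do not behave as if
            affine-transformed; returns the surviving pairs and their indices."""
            W, W_dash = np.asarray(W, dtype=float), np.asarray(W_dash, dtype=float)
            keep = list(range(len(W)))

            while True:
                size_previous = len(keep)                               # Step 1
                o  = W[keep].mean(axis=0)                               # centers of gravity
                od = W_dash[keep].mean(axis=0)
                S  = sum(tri_area(o,  W[i],      W[j])      for i, j in combinations(keep, 2)) + 1e-9
                Sd = sum(tri_area(od, W_dash[i], W_dash[j]) for i, j in combinations(keep, 2)) + 1e-9

                survivors = []
                for i in keep:
                    devs = [abs(tri_area(o, W[i], W[j]) / S
                                - tri_area(od, W_dash[i], W_dash[j]) / Sd)
                            for j in keep if j != i]
                    ave_dev = float(np.mean(devs)) if devs else 0.0
                    if ave_dev <= T_AC:                                 # AveDev > T_AC: removed
                        survivors.append(i)
                keep = survivors
                if len(keep) == size_previous:                          # sizes equal: converged
                    return W[keep], W_dash[keep], keep
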
  • An algorithm 3 shown in FIG. 19 is for a process for correcting the calculation result of the algorithm 2.
  • the position-invariant feature value extraction unit 24 re-inspects, according to the algorithm 3, the feature values excluded by the algorithm 2 by using the centers of gravity o and o′ at the time when the process according to the algorithm 2 has been finished. In this way, it is possible to relieve (or revive) all the feature values that should be regarded as position-invariant feature values but have been mistakenly excluded by the algorithm 2 at the earlier stage of the calculation according to the algorithm 2, i.e., at the time when the reliability of the centers of gravity o and o′ was still low.
  • the process related to this algorithm 3 is as follows.
  • sets W and W′ of corresponding feature values, and sets W tmp and W′ tmp are input.
  • the sets W and W′ are the same sets of feature values as the sets W and W′ that were input in the algorithm 2.
  • the sets W tmp and W′ tmp are the output of the algorithm 2.
  • AveDev is calculated by Expression (17).
  • when AveDev is equal to or smaller than the threshold T AC , it is determined that the pair of corresponding feature values i and i′ has been affine-transformed by using the reliable centers of gravity o and o′. Therefore, the feature values i and i′ are included into the sets W tmp and W′ tmp .
  • the above-described determination process is performed for every corresponding feature value included in the sets W and W′.
  • the size of the set W tmp is then compared with the SizePrevious stored in Step 1.
  • if both sizes are equal, the process is finished.
  • in that case, the contents of the sets W tmp and W′ tmp are output as sets W AC and W′ AC .
  • if both sizes are different, the relief (or revival) of the position-invariant feature values is still in progress. Therefore, the process returns to Step 1 and continues from there.
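  • A sketch of algorithm 3, continuing the algorithm 2 sketch above (it reuses tri_area, combinations and numpy from there, and the same stand-in AveDev): the pairs excluded earlier are re-inspected against the final, more reliable centers of gravity and revived when they now pass the test.

        def revive_features(W, W_dash, keep, T_AC=0.01):
            """Re-inspect the pairs excluded by algorithm 2 and output W_AC, W'_AC."""
            W, W_dash = np.asarray(W, dtype=float), np.asarray(W_dash, dtype=float)
            while True:
                size_previous = len(keep)                               # Step 1
                o, od = W[keep].mean(axis=0), W_dash[keep].mean(axis=0)
                S  = sum(tri_area(o,  W[i],      W[j])      for i, j in combinations(keep, 2)) + 1e-9
                Sd = sum(tri_area(od, W_dash[i], W_dash[j]) for i, j in combinations(keep, 2)) + 1e-9

                for i in range(len(W)):
                    if i in keep:
                        continue
                    devs = [abs(tri_area(o, W[i], W[j]) / S
                                - tri_area(od, W_dash[i], W_dash[j]) / Sd)
                            for j in keep]
                    if devs and float(np.mean(devs)) <= T_AC:
                        keep.append(i)                                  # revive the excluded pair
                if len(keep) == size_previous:                          # no further revivals
                    return W[keep], W_dash[keep]                        # W_AC and W'_AC
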
  • a set W AC of accurate position-invariant feature values is obtained through the above-described series of processes.
  • the use of this set W AC makes it possible to perform place estimation with higher accuracy.
  • N pair represents the size of a set of corresponding feature values obtained from images of two places by the corresponding feature value selection unit 23. It is assumed that this set of corresponding feature values is a set for which the above-described correction according to the distance restriction has not been performed yet. That is, S Affine indicates the matching level between the feature values for which the series of processes according to the distance restriction and the area restriction has not been performed yet and those for which the series of processes has already been performed. Note that S Affine is no less than zero and no greater than one (0≤S Affine ≤1).
  • S Dispersion is an index for evaluating the similarity level for two images including affine-invariant feature values more precisely.
  • S Dispersion has such an effect that the similarity level becomes smaller as the difference between the two images in the average distance from all the feature points included in the set of corresponding feature values to their center of gravity o becomes larger.
  • S Dispersion is greater than zero and less than one (0 ⁇ S Dispersion ⁇ 1).
  • N zt and N zc represent the total numbers of local feature values acquired in places z t and z c , respectively.
  • This technique makes it possible to perform similarity-level calculation with higher accuracy because the similarity level is calculated by using position-invariant feature values containing fewer (or smaller) noises extracted by using geometrical restrictions such as the distance restriction and the area restriction.
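  • The exact formulas for S Affine and S Dispersion are not reproduced in this excerpt; the following sketch only illustrates the stated properties (S Affine compares the corrected set against the N pair original correspondences, S Dispersion shrinks as the spreads around the centers of gravity diverge), and both forms are assumptions.

        import numpy as np

        def s_affine(n_kept, n_pair):
            # Assumed: the fraction of the N_pair correspondences that survive the
            # distance and area restrictions, so that 0 <= S_Affine <= 1.
            return n_kept / float(n_pair) if n_pair else 0.0

        def s_dispersion(points_t, points_c):
            # Assumed: ratio of the average distances to the centers of gravity in the two
            # images; close to 1 for similar spreads and smaller as the difference grows.
            P, Q = np.asarray(points_t, dtype=float), np.asarray(points_c, dtype=float)
            d_t = np.linalg.norm(P - P.mean(axis=0), axis=1).mean()
            d_c = np.linalg.norm(Q - Q.mean(axis=0), axis=1).mean()
            return min(d_t, d_c) / max(d_t, d_c) if max(d_t, d_c) > 0 else 1.0
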
  • in the above explanation, feature values are two-dimensional. However, feature values may have three dimensions or more, provided that the feature values can be acquired from an environment.
  • for example, when a Kinect (registered trademark) is used, depth information in addition to the two-dimensional image information can be acquired, thus enabling the extraction of three-dimensional feature values.
  • position-invariant feature values can be extracted by the algorithms shown in Figs. X and Y irrespective of the number of dimensions of the feature points. That is, if topology can be defined for feature points, these algorithms can be applied. For example, they can be applied to a similarity level determination of a gene arrangement or the like.
  • the present invention when the present invention is combined with a visual-odometry technique in which a locus of a camera movement is detected from camera images, navigation that uses only camera images can be provided.
  • a current position (place) can be estimated only from camera images without using an existing current position (place) detection technique such as the GPS.
  • the present invention can be applied to navigation in a robot or a smartphone equipped with a camera that moves or is located indoors or in a place where a GPS signal cannot be substantially received.
  • Although the present invention is described as a hardware configuration in the above-described exemplary embodiments, the present invention is not limited to the hardware configurations. That is, arbitrary processes can also be implemented by causing a CPU (Central Processing Unit) to execute a computer program. In such cases, the computer program can be stored in various types of non-transitory computer readable media and thereby supplied to computers.
  • the non-transitory computer readable media include various types of tangible storage media.
  • Examples of the non-transitory computer readable media include a magnetic recording medium (such as a flexible disk, a magnetic tape, and a hard disk drive), a magneto-optic recording medium (such as a magneto-optic disk), a CD-ROM (Read Only Memory), a CD-R, a CD-R/W, and a semiconductor memory (such as a mask ROM, a PROM (Programmable ROM), an EPROM (Erasable PROM), a flash ROM, and a RAM (Random Access Memory)).
  • the program can be supplied to computers by using various types of transitory computer readable media. Examples of the transitory computer readable media include an electrical signal, an optical signal, and an electromagnetic wave.
  • the transitory computer readable media can be used to supply programs to computers through a wired communication path such as an electric wire or an optical fiber, or through a wireless communication path.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
US14/440,768 2012-11-06 2013-11-06 Feature value extraction apparatus and place estimation apparatus Expired - Fee Related US9396396B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2012-244540 2012-11-06
JP2012244540 2012-11-06
PCT/JP2013/006550 WO2014073204A1 (ja) 2012-11-06 2013-11-06 特徴量抽出装置及び場所推定装置

Publications (2)

Publication Number Publication Date
US20150294157A1 US20150294157A1 (en) 2015-10-15
US9396396B2 true US9396396B2 (en) 2016-07-19

Family

ID=50684329

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/440,768 Expired - Fee Related US9396396B2 (en) 2012-11-06 2013-11-06 Feature value extraction apparatus and place estimation apparatus

Country Status (4)

Country Link
US (1) US9396396B2 (ja)
EP (1) EP2922022B1 (ja)
JP (1) JP6265499B2 (ja)
WO (1) WO2014073204A1 (ja)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9898677B1 (en) 2015-10-13 2018-02-20 MotionDSP, Inc. Object-level grouping and identification for tracking objects in a video

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102465332B1 (ko) * 2015-12-29 2022-11-11 에스케이플래닛 주식회사 사용자 장치, 그의 제어 방법 및 컴퓨터 프로그램이 기록된 기록매체
US10366305B2 (en) 2016-02-24 2019-07-30 Soinn Inc. Feature value extraction method and feature value extraction apparatus
CN107515006A (zh) * 2016-06-15 2017-12-26 华为终端(东莞)有限公司 一种地图更新方法和车载终端
JP7046506B2 (ja) * 2017-06-12 2022-04-04 キヤノン株式会社 情報処理装置、情報処理方法及びプログラム
CN108256463B (zh) * 2018-01-10 2022-01-04 南开大学 基于esn神经网络的移动机器人场景识别方法
CN110471407B (zh) * 2019-07-02 2022-09-06 无锡真源科技有限公司 一种模组自动调节的自适应定位系统及方法

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010055063A1 (en) * 2000-05-26 2001-12-27 Honda Giken Kogyo Kabushiki Kaisha Position detection apparatus, position detection method and position detection program
US20100114374A1 (en) * 2008-11-03 2010-05-06 Samsung Electronics Co., Ltd. Apparatus and method for extracting feature information of object and apparatus and method for creating feature map
JP2010238008A (ja) 2009-03-31 2010-10-21 Fujitsu Ltd 映像特徴抽出装置、及びプログラム
JP2011053823A (ja) 2009-08-31 2011-03-17 Tokyo Institute Of Technology 特徴量抽出装置及び方法、並びに位置推定装置及び方法
JP2011215716A (ja) 2010-03-31 2011-10-27 Toyota Motor Corp 位置推定装置、位置推定方法及びプログラム
WO2011145239A1 (ja) 2010-05-19 2011-11-24 国立大学法人東京工業大学 位置推定装置、位置推定方法及びプログラム
US20150161147A1 (en) * 2010-04-29 2015-06-11 Google Inc. Associating still images and videos

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010033447A (ja) * 2008-07-30 2010-02-12 Toshiba Corp 画像処理装置および画像処理方法
JP2010115307A (ja) 2008-11-12 2010-05-27 Sozosha:Kk モイスチャ処理装置

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010055063A1 (en) * 2000-05-26 2001-12-27 Honda Giken Kogyo Kabushiki Kaisha Position detection apparatus, position detection method and position detection program
US20100114374A1 (en) * 2008-11-03 2010-05-06 Samsung Electronics Co., Ltd. Apparatus and method for extracting feature information of object and apparatus and method for creating feature map
JP2010238008A (ja) 2009-03-31 2010-10-21 Fujitsu Ltd 映像特徴抽出装置、及びプログラム
JP2011053823A (ja) 2009-08-31 2011-03-17 Tokyo Institute Of Technology 特徴量抽出装置及び方法、並びに位置推定装置及び方法
JP2011215716A (ja) 2010-03-31 2011-10-27 Toyota Motor Corp 位置推定装置、位置推定方法及びプログラム
US20150161147A1 (en) * 2010-04-29 2015-06-11 Google Inc. Associating still images and videos
WO2011145239A1 (ja) 2010-05-19 2011-11-24 国立大学法人東京工業大学 位置推定装置、位置推定方法及びプログラム
US20130108172A1 (en) 2010-05-19 2013-05-02 Tokyo Institute Of Technology Position Estimation Device, Position Estimation Method, And Program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tongprasit, N., et al.: "PIRF-Nav 2: Speeded-Up Online and Incremental Appearance-Based SLAM in an Indoor Environment," IEEE Workshop on Applications of Computer Vision (WACV), 2011.

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9898677B1 (en) 2015-10-13 2018-02-20 MotionDSP, Inc. Object-level grouping and identification for tracking objects in a video

Also Published As

Publication number Publication date
JPWO2014073204A1 (ja) 2016-09-08
US20150294157A1 (en) 2015-10-15
EP2922022A4 (en) 2016-10-12
JP6265499B2 (ja) 2018-01-24
WO2014073204A1 (ja) 2014-05-15
EP2922022B1 (en) 2020-01-01
EP2922022A1 (en) 2015-09-23

Similar Documents

Publication Publication Date Title
US9396396B2 (en) Feature value extraction apparatus and place estimation apparatus
CN108960211B (zh) 一种多目标人体姿态检测方法以及系统
CN108717531B (zh) 基于Faster R-CNN的人体姿态估计方法
Portmann et al. People detection and tracking from aerial thermal views
Nedevschi et al. Stereo-based pedestrian detection for collision-avoidance applications
US8755630B2 (en) Object pose recognition apparatus and object pose recognition method using the same
CN107862300A (zh) 一种基于卷积神经网络的监控场景下行人属性识别方法
US10366305B2 (en) Feature value extraction method and feature value extraction apparatus
US20140185924A1 (en) Face Alignment by Explicit Shape Regression
CN110363047A (zh) 人脸识别的方法、装置、电子设备和存储介质
US9098744B2 (en) Position estimation device, position estimation method, and program
CN111623765B (zh) 基于多模态数据的室内定位方法及系统
US9158963B2 (en) Fitting contours to features
US9202138B2 (en) Adjusting a contour by a shape model
JP5557189B2 (ja) 位置推定装置、位置推定方法及びプログラム
CN112200056A (zh) 人脸活体检测方法、装置、电子设备及存储介质
CN113177892A (zh) 生成图像修复模型的方法、设备、介质及程序产品
CN113610967B (zh) 三维点检测的方法、装置、电子设备及存储介质
CN111339973A (zh) 一种对象识别方法、装置、设备及存储介质
CN110956098B (zh) 图像处理方法及相关设备
Garcia et al. Automatic detection of heads in colored images
Wu et al. Real-time robust algorithm for circle object detection
JP2002109539A (ja) 画像撮影手段の位置姿勢検出装置、3次元物体認識装置および地図特徴データ作成方法
CN112927291B (zh) 三维物体的位姿确定方法、装置及电子设备和存储介质
Sadeghi et al. Ocrapose ii: An ocr-based indoor positioning system using mobile phone images

Legal Events

Date Code Title Description
AS Assignment

Owner name: TOKYO INSTITUTE OF TECHNOLOGY, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HASEGAWA, OSAMU;HUA, GANGCHEN;REEL/FRAME:035568/0077

Effective date: 20150424

AS Assignment

Owner name: SOINN HOLDINGS LLC, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TOKYO INSTITUTE OF TECHNOLOGY;REEL/FRAME:038895/0489

Effective date: 20160608

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: SOINN INC., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SOINN HOLDINGS LLC;REEL/FRAME:050090/0053

Effective date: 20190730

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20240719