KR101741761B1 - A classification method of feature points required for multi-frame based building recognition


Info

Publication number
KR101741761B1
Authority
KR
South Korea
Prior art keywords
feature points
minutiae
feature
descriptor
feature point
Prior art date
Application number
KR1020150172509A
Other languages
Korean (ko)
Inventor
Park Si-young (박시영)
Yoo Ji-sang (유지상)
Original Assignee
Kwangwoon University Industry-Academic Collaboration Foundation (광운대학교 산학협력단)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kwangwoon University Industry-Academic Collaboration Foundation
Priority to KR1020150172509A
Application granted
Publication of KR101741761B1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06K9/4671
    • G06K9/6201
    • G06K9/6202
    • G06K9/64

Abstract

A method for classifying the feature points required for multi-frame based building recognition, which classifies feature points using one or more multi-frame images, comprising the steps of: (a) extracting feature points from each image of the multi-frame image; (b) matching the extracted feature points against each other and acquiring pairs of matched feature points; (c) obtaining a homography matrix from a plurality of feature-point pairs and classifying the feature points using the obtained matrix; and (d) repeating step (c) with the feature points that remain unclassified.
By using the feature point classification method described above, the accuracy of feature point matching can be significantly improved: RANSAC is used to search for associations between feature points, repeatedly locating and classifying them, exploiting the fact that background feature points appear only in a limited number of images.


Description

[0001] The present invention relates to a multi-frame based building recognition method.

The present invention relates to a feature point classification method for multi-frame based building recognition, which classifies the feature points required for building recognition using one or more multi-frame images and improves the performance of existing building recognition methods in the recognition and matching steps.

Particularly, the present invention relates to a feature point classification method for multi-frame based building recognition that first extracts feature points through SIFT (scale invariant feature transform), removes mismatched feature points, and applies RANSAC (random sample consensus) for feature point classification.

In general, object recognition is a technology applicable to various vision applications. For example, recognizing pedestrians in images captured from a vehicle can reduce the incidence of accidents. As mobile phone use has widened in recent years, various programs have been developed that photograph nearby objects with the built-in camera and identify the objects in the acquired images. One such service recognizes a building in the acquired image and automatically searches for it on the web, providing the user with information about the stores in that building.

The type of image used as input for building recognition is largely divided into aerial images and city building images [Non-Patent Document 1]. A city building image is obtained with a camera built into a smartphone and usually contains surrounding objects such as people, trees, and the sky; except for buildings with a unique form, such as landmarks, most buildings have similar shapes. Therefore, to recognize a building it is necessary to separate the specific building from its surroundings, and a method for quickly processing a large amount of data is required. In addition, research is moving toward increasing accuracy by using information such as GPS rather than image information alone [Non-Patent Document 1].

A building recognition system can be divided into three steps: feature point extraction, feature point matching, and feature point classification. Feature point extraction methods include those using global features such as the scale invariant feature transform (SIFT) [Non-Patent Document 2], and content-based image retrieval methods that extract features from the geometric structure, color, or texture of a building [Non-Patent Document 3].

When matching with global feature points, 128- or 64-dimensional descriptors are generally used, and such high-dimensional information makes processing time long. To address this, dimensionality-reduction methods have been studied, including principal component analysis (PCA) [Non-Patent Document 4], linear discriminant analysis (LDA) [Non-Patent Document 5], locality preserving projections (LPP) [Non-Patent Document 6], supervised LPP (SLPP) [Non-Patent Document 7], and semi-supervised discriminant analysis (SDA) [Non-Patent Document 8].

In addition, feature-based matching has the disadvantage that few feature points are extracted, so matching methods using content-based retrieval are being studied. Content-based retrieval finds similar images using values such as the color, texture, and shape of an image, instead of the text used in document-based retrieval [Non-Patent Document 9]. Since buildings contain many linear structures, their feature points can be extracted using the arrangement of windows and straight-line structures [Non-Patent Document 10]. Feature point matching then finds correspondences using the distance (Euclidean or Mahalanobis) between the feature points extracted from the reference image and the query image.

Finally, a common method of classifying buildings based on feature points is to collect the feature points extracted from all buildings, perform a learning process with a support vector machine (SVM) [Non-Patent Document 11], set a hypothesis function, and recognize buildings with it. The building recognition methods described above mainly target a single building; [Non-Patent Document 12] introduces a method for recognizing multiple buildings.

Most building recognition methods extract feature points from a single image and convert them into high-dimensional information such as descriptors. In particular, in the case of SIFT [Non-Patent Document 2], the extracted feature points include not only building information but also unimportant background information, so the accuracy of feature point matching is significantly reduced.

This is especially true when feature points are classified by applying SIFT [Non-Patent Document 2] to images (multi-frames) obtained from different viewpoints of the building to be recognized. Conventional methods use a single image as input, which captures the entire building from one view. The multi-frame method, on the other hand, uses multiple view images obtained from different angles, so each input image contains the building as seen from a different angle.

Therefore, in the case of the multi-frame method, it is necessary to classify the feature points and improve the accuracy of the feature point matching process.

[Non-Patent Document 1] J. Li, W. Huang, L. Shao, and N. Allinson, "Building recognition in urban environments: A survey of state-of-the-art and future challenges", Information Sciences, vol. 277, no. 1, pp. 406-420, Sep. 2014.
[Non-Patent Document 2] D. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004.
[Non-Patent Document 3] Y. Li and L. G. Shapiro, "Consistent line clusters for building recognition in CBIR", Proc. 16th International Conference on Pattern Recognition, vol. 3, pp. 952-956, Aug. 2002.
[Non-Patent Document 4] I. T. Jolliffe, Principal Component Analysis, 2nd ed., Springer, 2002.
[Non-Patent Document 5] G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, Wiley-Interscience, New York, 1992.
[Non-Patent Document 6] X. He and P. Niyogi, "Locality preserving projections", Proc. Conf. Advances in Neural Information Processing Systems, 2003.
[Non-Patent Document 7] D. Cai, X. He and J. Han, "Using graph model for face analysis", Department of Computer Science, University of Illinois at Urbana-Champaign, 2005.
[Non-Patent Document 8] D. Cai, X. He and J. Han, "Semi-supervised discriminant analysis", Proc. IEEE 11th Int. Conf. Computer Vision, pp. 1-7, Oct. 2007.
[Non-Patent Document 9] J. H. Heo and M. C. Lee, "Building recognition using image segmentation and color features", Journal of Korea Robotics Society, vol. 8, no. 2, pp. 82-91, June 2013.
[Non-Patent Document 10] W. Zhang and J. Kosecka, "Localization based on building recognition", IEEE Computer Society Conference, June 2005.
[Non-Patent Document 11] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995.
[Non-Patent Document 12] H. Trinh, D. N. Kim and K. H. Jo, "Facet-based multiple building analysis for robot intelligence", Applied Mathematics and Computation, vol. 205, no. 2, pp. 537-549, Nov. 2008.
[Non-Patent Document 13] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography", Communications of the ACM, vol. 24, no. 6, pp. 381-395, June 1981.
[Non-Patent Document 14] H. Bay, A. Ess, T. Tuytelaars and L. V. Gool, "Speeded-up robust features (SURF)", Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359, June 2008.
[Non-Patent Document 15] S. M. Smith and J. M. Brady, "SUSAN: a new approach to low level image processing", International Journal of Computer Vision, vol. 23, no. 1, pp. 45-78, May 1997.
[Non-Patent Document 16] E. Rosten and T. Drummond, "Machine learning for high-speed corner detection", European Conference on Computer Vision, Graz, Austria, pp. 430-443, May 2006.
[Non-Patent Document 17] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink and M. A. Viergever, "Generalized intensity transformations and differential invariants", Journal of Mathematical Imaging and Vision, vol. 4, no. 2, pp. 171-187, May 1994.
[Non-Patent Document 18] E. Dubrofsky, Homography Estimation, University of British Columbia, March 2009.
[Non-Patent Document 19] M. M. Hossain, H. J. Lee and J. S. Lee, "Fast image stitching for video stabilization using SIFT feature points", The Korean Institute of Communications and Information Sciences, vol. 39, no. 10, pp. 957-966, Oct. 2014.
[Non-Patent Document 20] B. W. Chung, K. Y. Park and S. Y. Hwang, "A fast and efficient Haar-like feature selection algorithm for object detection", The Korean Institute of Communications and Information Sciences, vol. 38, no. 6, pp. 486-497, June 2013.
[Non-Patent Document 21] J. H. Hong, B. C. Ko and J. Y. Nam, "Human action recognition in still image using weighted bag-of-features and ensemble decision trees", The Korean Institute of Communications and Information Sciences, vol. 38, no. 1, pp. 1-9, Jan. 2013.

SUMMARY OF THE INVENTION: The object of the present invention is to solve the above-mentioned problems by providing a feature point classification method for multi-frame based building recognition that extracts feature points through SIFT (scale invariant feature transform), classifies them by applying RANSAC (random sample consensus), and integrates the multiple descriptors belonging to one feature point. That is, since the classified feature points are obtained through matching, one feature point carries a plurality of descriptors, and a process for integrating them is required.

In particular, since the input images of a multi-frame sequence differ in viewpoint, it is an object of the present invention to provide a feature point classification method for multi-frame based building recognition that repeatedly searches for associations between feature points using RANSAC, finding and classifying the feature points that appear only in a limited set of images.

According to an aspect of the present invention, there is provided a feature point classification method for multi-frame based building recognition, comprising: (a) extracting feature points from each image of the multi-frame image; (b) matching the extracted feature points against each other and acquiring pairs of matched feature points; (c) obtaining a homography matrix from a plurality of feature-point pairs and classifying the feature points using the obtained matrix; and (d) repeating step (c) with the feature points remaining after classification.

According to another aspect of the present invention, there is provided a feature point classification method for multi-frame based building recognition, wherein, in step (a), feature points are extracted using the difference of Gaussian (DoG) of the scale invariant feature transform (SIFT).

According to another aspect of the present invention, there is provided a feature point classification method for multi-frame based building recognition, wherein, in step (b), the gradient with respect to the pixels around each feature point is obtained, a descriptor is defined from the gradient information, and the feature points whose descriptors have the smallest Euclidean distance are matched with each other.

According to another aspect of the present invention, there is provided a feature point classification method for multi-frame based building recognition, wherein, in step (c), for each pair of matched feature points, one feature point (hereinafter the first feature point) is transformed by the homography matrix, the Euclidean distance between the transformed feature point and the other feature point of the pair (hereinafter the second feature point) is computed, and the pair is classified as correctly matched if the distance is smaller than a predetermined threshold value.

According to another aspect of the present invention, there is provided a feature point classification method for multi-frame based building recognition, wherein, in step (c), a first homography matrix is obtained using the feature-point pairs and used to classify the feature points a first time, a second homography matrix is then obtained from the first-classified pairs, and all feature points are classified a second time using the second homography matrix.

According to another aspect of the present invention, there is provided a feature point classification method for multi-frame based building recognition, wherein, in step (c), four randomly selected pairs of matched feature points are used to compute a homography matrix H, the Euclidean distances between the feature points transformed by H and their matched feature points are summed over all pairs, and the matrix H that minimizes this sum is selected as the first homography matrix.

According to another aspect of the present invention, there is provided a feature point classification method for multi-frame based building recognition, wherein, in step (c), a region is set from the second-classified feature points, and a feature point matched to a point outside the corresponding region is removed as an outlier.

According to another aspect of the present invention, there is provided a feature point classification method for multi-frame based building recognition, wherein, in step (d), the classification step is repeated on the remaining feature points, excluding the feature points classified and the outliers removed in step (c).

According to another aspect of the present invention, there is provided a method of classifying feature points for multi-frame based building recognition, wherein the repeating process is stopped when the number of remaining feature points falls below a predetermined minimum number in step (d).

According to another aspect of the present invention, there is provided a feature point classification method for multi-frame based building recognition, further comprising: (e) integrating the descriptors from each image into one descriptor for each obtained feature point.

According to another aspect of the present invention, there is provided a feature point classification method for multi-frame based building recognition, wherein, in step (e), when the k-th descriptor x^{n+1}_{dk} of the (n+1)-th reference image feature point satisfies [Equation 1] below, it is integrated according to [Equation 2] below.

[Equation 1]

Figure 112015119028258-pat00001

[Equation 2]

Figure 112015119028258-pat00002

Here, x^n_{dk} is the integrated value of the k-th descriptor dimension of the n-th reference image feature point, d denotes the descriptor, k in d_k is the dimension index of the descriptor, n and n+1 index the multi-frame reference images, and t^n_{dk} is the threshold value of the k-th descriptor dimension of the n-th reference image feature point.

In the feature point classification method for multi-frame based building recognition according to the present invention, when the k-th descriptor x^{n+1}_{dk} of the (n+1)-th reference image feature point does not satisfy [Equation 1], both the current integrated value x^n_{dk} of the descriptor and the incoming value are stored; when the k-th descriptor of the next reference image feature point is later integrated while two values are stored, the information with the smallest difference among the three values (the two stored values and the current descriptor value) is kept.

The present invention also relates to a computer-readable recording medium on which a program for performing a feature point classification method for multi-frame based building recognition is recorded.

As described above, according to the feature point classification method for multi-frame based building recognition of the present invention, by finding associations between feature points with RANSAC and repeatedly locating and classifying the feature points that appear across frames, the accuracy of feature point matching can be significantly increased.

That is, feature points extracted from occluding objects such as trees and people, and from backgrounds such as the sky and mountains, are meaningless feature points that degrade the performance of matching and recognition methods. By classifying the feature points required for building recognition using one or more multi-frames, the performance of the building recognition method can be improved in both the recognition and matching steps.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram showing the configuration of the overall system for carrying out the present invention.
FIG. 2 is a flowchart illustrating a feature point classification method for multi-frame based building recognition according to an embodiment of the present invention.
FIG. 3 is a detailed flowchart illustrating a feature point classification method for multi-frame based building recognition according to an embodiment of the present invention.
FIG. 4 illustrates an example of the SIFT descriptor block layout according to an embodiment of the present invention.
FIG. 5 is a graph illustrating the histogram conversion of gradient direction and magnitude according to an embodiment of the present invention.
FIG. 6 illustrates an example result of feature point matching using the SIFT method according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating the step of classifying feature points using homography according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating an example result of feature point classification using a homography matrix according to an embodiment of the present invention.
FIG. 9 shows the rotation of two planes about the z-axis according to an embodiment of the present invention: (a) before rotation and (b) after rotation.
FIG. 10 is a table showing the change of the x-coordinate values under the rotation according to an embodiment of the present invention.
FIG. 11 is an example image of a region setting containing accurately matched feature points according to an embodiment of the present invention.
FIG. 12 is an example image of a region setting containing feature points matched by a homography matrix according to an embodiment of the present invention, where (a) shows the initially set region and (b) the reset region.
FIG. 13 is an example image showing the removal of mismatched feature points according to an embodiment of the present invention.
FIG. 14 is an example image showing feature points repeatedly reclassified according to an embodiment of the present invention.
FIG. 15 shows (a) all feature point matching pairs and (b) the matching pairs accurately matched through the proposed method, according to an embodiment of the present invention.
FIG. 16 shows the reference images of the Myeongdong building A used in the experiments of the present invention.
FIG. 17 shows the query images of the Myeongdong building A used in the experiments of the present invention.
FIG. 18 is an image showing the classified feature points according to the experiments of the present invention.
FIG. 19 is a table showing the recall (%) for building A feature points according to the experiments of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, the present invention will be described in detail with reference to the drawings.

In the description of the present invention, the same parts are denoted by the same reference numerals, and repetitive description thereof will be omitted.

First, an example configuration of the entire system for carrying out the present invention will be described with reference to Fig. 1.

As shown in Fig. 1, the feature point classification method for multi-frame based building recognition according to the present invention receives a multi-frame image (or video) 10 and may be implemented as a program system on a computer terminal 20. That is, the feature point classification method may be implemented as a program, installed in the computer terminal 20, and executed there. The installed program can operate as a single program system 30.

Meanwhile, as another embodiment, the feature point classification method for multi-frame based building recognition may be implemented as a single electronic circuit such as an ASIC (application-specific integrated circuit) instead of running on a general-purpose computer, or as a dedicated terminal that only classifies feature points in an image; this will be referred to as a feature point classifier 40. Other forms are also possible.

Meanwhile, the multi-frame image 10 means a set of images acquired at predetermined intervals around one building; that is, there are multiple (N) reference images for one building.

In addition, the video 10 is composed of frames (images) consecutive in time, one image per frame, and may also consist of a single frame (i.e., a single image).

Next, a feature point classification method for multi-frame based building recognition according to an embodiment of the present invention will be described with reference to FIG. 2, which shows the general flowchart of the method. A detailed flowchart of the feature point classification method using multi-frames is shown in FIG. 3.

As shown in FIG. 2 or FIG. 3, the feature point classification method for multi-frame based building recognition according to the present invention includes extracting feature points (S10), matching the feature points extracted from the multiple frames (S20), obtaining a homography matrix from a plurality of feature-point pairs and classifying the feature points required for the building (S30), and repeating the classification with the remaining feature-point pairs (S40). The method further includes organizing the descriptors of the obtained feature points (S50).

In particular, the feature point classification method consists of three stages. In the first, feature points are extracted by applying the scale invariant feature transform (SIFT) [Non-Patent Document 2]. In the second, the feature points extracted from the multiple frames are matched. In the final stage, a homography matrix is estimated from the matched feature-point pairs using random sample consensus (RANSAC) [Non-Patent Document 13], and the feature points required for the building are classified using that matrix.

Hereinafter, each step will be described in more detail.

First, feature points are extracted from the input image (S10).

Methods of extracting feature points from a single image include the difference of Gaussian (DoG) of the scale invariant feature transform (SIFT) [Non-Patent Document 2], the Haar wavelet of speeded up robust features (SURF) [Non-Patent Document 14], SUSAN (smallest univalue segment assimilating nucleus) [Non-Patent Document 15], and FAST (features from accelerated segment test) [Non-Patent Document 16].

Among them, the DoG of SIFT [Non-Patent Document 17] uses local brightness differences and has the advantages of robustness to changes in image scale, rotation, and noise, and of high repeatability. Therefore, in the present invention, feature point extraction and matching are performed using the SIFT method.

First, the difference of Gaussian (DoG) method is applied to the entire region of the image to extract feature points [Non-Patent Document 2]. That is, feature points are extracted using the DoG of SIFT, and matching between the extracted feature points is then performed.
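As a concrete illustration, this extraction step might look like the following minimal sketch in OpenCV, which the experiments below also use. It is written against the modern OpenCV API; in the OpenCV 2.4.x environment of the experiments, the equivalent class lives in the nonfree module. It is a sketch of the general technique, not the patent's own implementation.

```cpp
// Minimal sketch of step S10 with OpenCV's SIFT (DoG detector plus
// 128-D descriptors). Uses the modern API (OpenCV >= 4.4).
#include <opencv2/opencv.hpp>
#include <vector>

void extractSiftFeatures(const cv::Mat& image,
                         std::vector<cv::KeyPoint>& keypoints,
                         cv::Mat& descriptors)
{
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
    // DoG extrema detection and descriptor computation over the
    // whole image region (no mask).
    sift->detectAndCompute(image, cv::noArray(), keypoints, descriptors);
}
```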

Next, matching is performed between the extracted feature points (S20). That is, feature points are matched between the two images, and the matched feature-point pairs are acquired.

In order to match feature points extracted from images with different viewpoints, a reference value must first be defined for each feature point. A block of 4×4 size (16 pixels) is defined, and a total of 16 neighboring blocks are set around one feature point, as shown in FIG. 4 [Non-Patent Document 2].

Here, the feature point is located at the first pixel of the 11th block, shown in blue in FIG. 4. For each pixel in each 4×4 block of 16 pixels, the horizontal and vertical gradients (differences between neighboring pixel values) are computed, giving a gradient direction and a gradient magnitude per pixel. As shown in FIG. 5, the gradient direction of each pixel is quantized into a total of eight directions at 45-degree intervals, and during quantization the gradient magnitude is distributed with weights according to the original direction. Equation (1) shows how the gradient magnitude of a pixel whose direction lies between 0 and 45 degrees is divided into 0-degree and 45-degree components in the quantization process [Non-Patent Document 2].

[Equation 1]

$$H_0 = \frac{45 - x}{45}\, m, \qquad H_{45} = \frac{x}{45}\, m$$

where x is the gradient direction of the pixel (between 0 and 45 degrees), m is its gradient magnitude, H_0 is the component assigned to the 0-degree direction, and H_45 is the component assigned to the 45-degree direction.

As a result, as shown in FIG. 5, eight gradient values per block are obtained by accumulating the weighted magnitudes of the 16 pixels into the eight direction bins. One feature point therefore has gradient information in eight directions for each of its 16 neighboring blocks, which together define a 128-dimensional (8×16) descriptor.

The feature points whose descriptors (gradient information) have the smallest Euclidean distance are matched. FIG. 6 shows the result of feature point matching using SIFT [Non-Patent Document 2]. Most feature points are well matched, but a few are matched incorrectly. In FIG. 6, feature points connected by blue lines are correctly matched, and those connected by red lines are mismatched.
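A matching step of this kind can be sketched as follows, assuming the descriptors produced above; the crossCheck option, which keeps only mutually nearest pairs, is an implementation choice and not part of the patent.

```cpp
// Sketch of step S20: brute-force matching of 128-D SIFT descriptors
// by smallest Euclidean (L2) distance. desc1 and desc2 come from two
// frames of the multi-frame sequence.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::DMatch> matchDescriptors(const cv::Mat& desc1,
                                         const cv::Mat& desc2)
{
    // crossCheck = true keeps a pair only if each descriptor is the
    // other's nearest neighbour (an assumed refinement, not from the text).
    cv::BFMatcher matcher(cv::NORM_L2, true);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);
    return matches;
}
```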

Next, the feature points are classified using homography (S30). That is, a homography matrix is obtained from part of the previously acquired feature-point pairs, and the feature points are classified using it; in other words, the correctly matched feature points are identified.

To do this, the feature points are classified using a homography method. Knowing the geometric relationship between the two images through the homography makes it possible to match the feature points accurately and to remove the feature points mismatched in the SIFT matching process.

As shown in FIG. 7, the step of classifying the feature points using homography (S30) includes obtaining a first homography matrix from the feature-point pairs (S31), classifying the feature points a first time using the first homography matrix (S32), obtaining a second homography matrix from the classified feature-point pairs (S33), and classifying the feature points a second time using the second homography matrix (S34). The method further includes removing outliers using the region of the classified feature points (S35).

First, four pairs of feature points matched in two images having different viewpoints are selected, and a homography matrix H is obtained using the selected feature points (S31) [Non-Patent Document 18]. The homography matrix obtained at this time is referred to as a first homography matrix.

If the ratio of mismatched feature points is less than 50%, a correct homography matrix can be obtained using RANSAC [Non-Patent Document 13]. Four randomly chosen pairs of matched feature points are selected and a homography matrix H is computed from them [Non-Patent Document 18]. Multiplying each feature point of one image by H predicts the position of the corresponding feature point in the other image. The Euclidean distances between the feature points predicted by the homography and those obtained by SIFT matching are computed and summed. This process is repeated with other random samples, and the homography matrix H that minimizes the sum of distances is taken as the final homography matrix between the two views.
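In practice this RANSAC estimation corresponds to OpenCV's findHomography; the sketch below assumes the matched coordinates have already been collected, and the 3-pixel reprojection threshold is an assumed parameter, not a value from the patent.

```cpp
// Sketch of S31: estimate the first homography matrix H with RANSAC
// from the matched feature-point coordinates of two frames.
#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat estimateFirstHomography(const std::vector<cv::Point2f>& pts1,
                                const std::vector<cv::Point2f>& pts2,
                                cv::Mat& inlierMask)
{
    // RANSAC internally samples 4-point subsets, builds candidate H
    // matrices, and keeps the one best supported by the matches.
    return cv::findHomography(pts1, pts2, cv::RANSAC, 3.0, inlierMask);
}
```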

Next, the feature points are classified using the homography matrix (the first homography matrix) obtained above. A mismatched feature point, when transformed, lands far from the position predicted by the homography matrix; this property of the homography matrix is used for classification.

To classify the correctly matched feature points with the homography matrix H obtained by RANSAC, the coordinates of the matched feature points are transformed through the homography transformation of Equation (2).

[Equation 2]

$$\begin{bmatrix} X_2 \\ Y_2 \\ 1 \end{bmatrix} = H \begin{bmatrix} X_1 \\ Y_1 \\ 1 \end{bmatrix}$$

Here, [X_1, Y_1, 1] is a coordinate of one image, and [X_2, Y_2, 1] is the coordinate of the matched feature point of the other image as predicted by the matrix H. The Euclidean distance between the transformed coordinate and the coordinate matched in the other image is then evaluated through Equation (3).

[Equation 3]

$$\sqrt{(X_2 - X_3)^2 + (Y_2 - Y_3)^2} < t$$

Here, X_3 and Y_3 are the coordinates of the feature point in the other image matched through SIFT, and t is a threshold value; the smaller the threshold, the smaller the allowed distance between the two coordinates. FIG. 8 shows the feature points (red dots, green lines) classified through Equation (3) after computing the homography matrix H from the matched feature points of FIG. 6. These can be regarded as well-matched feature points in both images.
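The test of Equations (2) and (3) amounts to only a few lines; here is a hedged sketch (the function name is illustrative, and H is assumed to be a 3×3 double-precision matrix):

```cpp
// Sketch of the classification test of Eqs. (2)-(3): project a point
// from image 1 through H (with dehomogenization) and accept the match
// if it lands within t pixels of its SIFT-matched point in image 2.
#include <opencv2/opencv.hpp>
#include <cmath>

bool isCorrectlyMatched(const cv::Mat& H,        // 3x3, CV_64F
                        const cv::Point2f& p1,   // point in image 1
                        const cv::Point2f& p3,   // SIFT match in image 2
                        double t)                // threshold of Eq. (3)
{
    cv::Mat x = (cv::Mat_<double>(3, 1) << p1.x, p1.y, 1.0);
    cv::Mat y = H * x;                            // Eq. (2)
    double X2 = y.at<double>(0) / y.at<double>(2);
    double Y2 = y.at<double>(1) / y.at<double>(2);
    return std::hypot(X2 - p3.x, Y2 - p3.y) < t;  // Eq. (3)
}
```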

As FIG. 8 shows, the classified feature points appear accurately matched to the naked eye. However, the feature points shown in FIG. 8 do not include all of the correctly matched feature points. This is because one homography matrix can find matches only among the feature points lying in the same plane of the image.

FIG. 9 (a) shows the x-axis values in two-dimensional space when points spaced 4 apart are placed on two non-parallel planes in three-dimensional space that meet at an angle of 120 degrees. FIG. 9 (b) shows the same configuration after a 60-degree rotation about the z-axis at x-coordinate 20: viewed in two-dimensional space, the positions of the points on the planes change, and the points on the two planes can be seen to take different x-coordinate values.

The table of FIG. 10 shows the x-coordinate values that change when the two planes of FIG. 9 (a) are rotated 60 degrees about the z-axis from the x-coordinate value 20. The first row of the table gives the x-coordinates of the red points in FIG. 9 (a), and the second row gives the x-coordinates of the corresponding points after the rotation of FIG. 9 (b). The third row is the difference of the x-coordinates caused by the rotation, obtained with Equation (4), and the fourth row is the second-order variation, obtained with Equation (5).

[Equation 4]

$$X_{dn} = X_{An} - X_{Bn}$$

[Equation 5]

$$X_{ddn} = X_{d(n+1)} - X_{dn}$$

Here, X_Bn and X_An represent the n-th x-coordinate before and after rotation, respectively; X_dn is the primary (first-order) variation of the x-coordinate at the n-th point, and X_ddn is the n-th secondary (second-order) variation.

According to the table of FIG. 10, when non-parallel planes are rotated in three-dimensional space and the coordinates of their points are analyzed in two-dimensional space, the two planes show different secondary variations along the x-axis. Using this result, the secondary variation of the feature points can be computed to check whether a non-parallel plane is present.
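Computing the variations of Equations (4) and (5) over an ordered set of x-coordinates might look like this short sketch:

```cpp
// Sketch of Eqs. (4)-(5): primary and secondary variation of the
// x-coordinates before (XB) and after (XA) the transform. A sudden
// change in the secondary variation indicates a non-parallel plane.
#include <vector>
#include <cstddef>

std::vector<double> secondaryVariation(const std::vector<double>& XB,
                                       const std::vector<double>& XA)
{
    // XB and XA are assumed to have the same length.
    std::vector<double> Xd(XB.size());
    for (std::size_t n = 0; n < XB.size(); ++n)
        Xd[n] = XA[n] - XB[n];                  // Eq. (4): X_dn

    std::vector<double> Xdd;
    for (std::size_t n = 0; n + 1 < Xd.size(); ++n)
        Xdd.push_back(Xd[n + 1] - Xd[n]);       // Eq. (5): X_ddn
    return Xdd;
}
```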

In the present invention, the single homography matrix H obtained with the SIFT and RANSAC methods is applied, and the coordinates of one image are transformed as in the table of FIG. 10 to check whether there is a region where the secondary variation changes suddenly. Equations (6) and (7) show the coordinates of image points transformed through the homography matrix H.

[Equation 6]

$$X_2 = \frac{aX_1 + bY_1 + c}{gX_1 + hY_1 + i}$$

[Equation 7]

$$Y_2 = \frac{dX_1 + eY_1 + f}{gX_1 + hY_1 + i}$$

Here, a through i are the elements [a, b, c, d, e, f, g, h, i] of the initial homography matrix H obtained through RANSAC, [X_1, Y_1, 1] is a coordinate of one image, and [X_2, Y_2, 1] is the coordinate transformed by H. Equations (8) to (15) derive the secondary variation along the x-axis in terms of the elements of H in Equation (6).

[Equation 8]

Figure 112015119028258-pat00010

[Equation 9]

Figure 112015119028258-pat00011

[Equation 10]

Figure 112015119028258-pat00012

[Equation 11]

Figure 112015119028258-pat00013

[Equation 12]

Figure 112015119028258-pat00014

[Equation 13]

Figure 112015119028258-pat00015

[Equation 14]

Figure 112015119028258-pat00016

[Equation 15]

Figure 112015119028258-pat00017

Here, X_2n is the n-th transformed x-coordinate, X_dn is the primary variation of X_2n, and X_ddn is its secondary variation. Assuming the initial coordinate is (0, 0), so that X_11 and Y_11 are both 0, and that the rotation is about the z-axis, the secondary variation along the x-axis is finally expressed as in Equation (15). Since the elements of H in Equation (15) are constants, the secondary variation can only increase or decrease monotonically.

With this method, a homography matrix can be obtained for each of the multiple non-parallel planes in the image, and the matched feature points can be classified accordingly.

When other planes are present, the corresponding homography matrices are found in a similar way: RANSAC [Non-Patent Document 13] is applied to the remaining feature points, excluding those already matched accurately by the homography matrix of the first plane.

There is a point to consider here. As mentioned earlier, RANSAC can obtain a homography matrix effectively only when the ratio of mismatched feature points is below 50%. The feature points matched by the conventional SIFT method include both correct and incorrect matches, and since the correctly matched feature points are removed in the first pass, the proportion of mismatched feature points among the remaining ones increases. Therefore, before classifying the feature points on other planes, mismatched feature points must be removed to keep the ratio below 50%.

Next, a second homography matrix is obtained from the classified feature-point pairs (S33), and the feature points are classified again using it (S34). Since the second homography matrix is computed from more precisely matched feature points, classifying with it yields more accurate matches than the first homography matrix.

First, the region containing the correctly matched feature points is set. FIG. 11 shows a region (black rectangle) enclosing the well-matched feature points of FIG. 8. The region is the rectangle spanned by the minimum and maximum x and y values of the classified feature points.

To make the set region similar in size to the plane containing it, the feature points accurately matched by the initial (first) homography matrix are used. Since these feature points lie in the same plane, a homography matrix (the second homography matrix) more accurate than the initial one can be obtained from them. Based on this matrix, the feature points correctly matched according to Equations (2) and (3) are re-classified and the region is reset. FIG. 12 (a) shows the initially set region, and FIG. 12 (b) shows the reset region (black rectangle).

After the region is reset, feature points missed earlier are found. Compared with the initial region of FIG. 12 (a), a relatively large number of mismatched feature points can be found in the extended region of FIG. 12 (b). FIG. 13 shows the feature points removed after resetting the region: green lines mark correctly matched feature points and red lines mark mismatched ones.

The second classification by the second homography matrix is performed over all feature points present.

Next, a region is set from the second-classified feature points and used to remove outliers (S35). The region is again the rectangle spanned by the minimum and maximum x and y values of the classified feature points. The larger the region, the more effectively outliers can be removed; since the second classification is performed over all feature points, its region is larger than the region set from the first-classified feature points.

Specifically, the set region is used to remove outliers: a feature point inside the region of one image must be matched with a feature point inside the region of the other image, and any match to a feature point outside that region is removed as an outlier.
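A hedged sketch of this region test follows; all names are illustrative, and cv::boundingRect computes exactly the min/max rectangle described above.

```cpp
// Sketch of S35: build the min/max rectangle of the classified points
// in each image, then drop any match whose two endpoints do not agree
// on being inside their respective regions.
#include <opencv2/opencv.hpp>
#include <vector>

static bool inRect(const cv::Rect& r, const cv::Point2f& p)
{
    return r.contains(cv::Point(cvRound(p.x), cvRound(p.y)));
}

std::vector<cv::DMatch> removeOutliers(const std::vector<cv::DMatch>& matches,
                                       const std::vector<cv::KeyPoint>& kp1,
                                       const std::vector<cv::KeyPoint>& kp2,
                                       const std::vector<cv::Point2f>& classified1,
                                       const std::vector<cv::Point2f>& classified2)
{
    cv::Rect r1 = cv::boundingRect(classified1);  // region in image 1
    cv::Rect r2 = cv::boundingRect(classified2);  // region in image 2
    std::vector<cv::DMatch> kept;
    for (const cv::DMatch& m : matches) {
        bool in1 = inRect(r1, kp1[m.queryIdx].pt);
        bool in2 = inRect(r2, kp2[m.trainIdx].pt);
        if (in1 == in2)            // inside-to-outside match is an outlier
            kept.push_back(m);
    }
    return kept;
}
```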

Next, the feature point classification step S30 is repeated on the feature points that remain unclassified (S40). The remaining feature points are all feature points except those classified in the previous pass; preferably, both the classified feature points and the removed outliers are excluded. The repetition continues until the number of remaining feature points falls below a minimum threshold.

In other words, excluding the matched feature points found through the homography matrix and the mismatched feature points found via the region setting, RANSAC is applied again to find the matching feature points lying on other planes. FIG. 14 shows the additional matched feature points found in this way.

By repeating this procedure sufficiently, the correct matches on the various planes present in the image can all be classified. As the number of repetitions increases, the number of remaining feature points decreases, which undermines the basic RANSAC condition (the ratio of mismatched feature points); therefore, the iterative process is stopped once the number of remaining feature points falls below M. FIG. 15 (a) shows all matched feature points, and FIG. 15 (b) shows the feature points (red dots, green lines) finally classified as correctly matched by the method according to the present invention.
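The overall iteration can be summarized in a short control-flow sketch; Match, PlaneResult, and classifyOnePlane are hypothetical stand-ins for steps S31 through S35, and M is the stopping count described above.

```cpp
// Sketch of the outer loop (S40): peel off one plane's correctly
// matched points per pass and stop once fewer than M matches remain.
#include <vector>
#include <cstddef>

struct Match { int idx1, idx2; };        // a matched feature-point pair
struct PlaneResult {
    std::vector<Match> inliers;          // matches on the plane just found
    std::vector<Match> leftovers;        // matches still unclassified
};

// Steps S31-S35 for a single plane (declaration only in this sketch).
PlaneResult classifyOnePlane(const std::vector<Match>& matches);

std::vector<PlaneResult> classifyAllPlanes(std::vector<Match> remaining,
                                           std::size_t M)
{
    std::vector<PlaneResult> planes;
    while (remaining.size() >= M) {
        PlaneResult plane = classifyOnePlane(remaining);
        if (plane.inliers.empty()) break;   // no further plane found
        planes.push_back(plane);
        remaining = plane.leftovers;        // inliers and outliers removed
    }
    return planes;
}
```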

Next, the descriptor information of the classified feature points is organized (S50).

In the method according to the present invention, the feature point classification process described above is performed on all reference images of a building. The multi-frame image 10 is a set of images acquired at regular intervals around one building, so there are multiple (N) reference images per building.

A classified feature point has as many descriptors as the number of reference images in which it appears, which is at least two. The descriptor information is therefore redundant, and the cost of matching against the query image feature points grows with the number of 128-dimensional descriptors per feature point. This is solved by merging the descriptors from all the reference images containing a feature point into a single descriptor.

The following five-step process merges multiple descriptors into one.

Step 1: The initial descriptor value of the feature point is stored as the descriptor information of the feature point.

Step 2: Once a descriptor is stored for the feature point, a threshold for each of its 128 dimensions is set from the stored descriptor information according to Equation (16).

[Equation 16]

Figure 112015119028258-pat00018

Here, x_d is the value of the d-th dimension of the descriptor, and t_d is the threshold for the d-th dimension.

One feature point has a 128-dimensional vector per image in which it was matched. Since a classified feature point is obtained through matching, it can appear in up to N images, where N is the number of multi-view images in the multi-frame sequence; it therefore carries up to N different 128-dimensional descriptor vectors. The index n runs over 1, 2, ..., N, and d runs over the 128 dimensions of the descriptor. x^n_d denotes the n-th descriptor of the feature point, and t^n_d denotes the threshold set by Equation (16).

Step 3: If the next input descriptor satisfies Equation (17), the integrated value of the corresponding dimension is computed and stored according to Equation (18).

[Equation 17]

Figure 112015119028258-pat00019

[Equation 18]

Figure 112015119028258-pat00020

Here, x^n_{dk} is the (integrated) value of the k-th dimension of the descriptor after the n-th reference image; d denotes the descriptor and k its dimension index, running over 1, 2, ..., 128. A matched feature point has up to N 128-dimensional descriptors, indexed 1, 2, ..., n, ..., N; x^{n+1}_{dk} is the incoming value after the first n descriptors have been integrated.

Step 4: If Equation (17) is not satisfied, both the current descriptor value and the incoming value are stored for that dimension.

That is, when the condition of Equation (17) fails, the incoming value x^{n+1}_{dk} is too far from the stored value x^n_{dk} (relative to the threshold t^n_{dk}) to be merged into a single value, so both x^n_{dk} and x^{n+1}_{dk} are stored. When the k-th value of the (n+2)-th descriptor arrives, the stored k-th dimension already holds two values, so the integration is performed over three values: the incoming k-th value and the two stored ones.

Step 5: After steps 1 to 4 have been applied to all 128 dimensions for every reference image containing the feature point, any dimension still holding two values retains only the information with the smaller difference.
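Since the exact forms of Equations (16) to (18) are given only as images, the sketch below fills them with assumptions (a per-dimension threshold test and simple averaging as the merge rule) purely to illustrate the control flow of steps 1 to 5; only the branching follows the text.

```cpp
// Heavily hedged sketch of the five-step descriptor merge for one
// dimension k. The threshold t (Eq. (16)) and the averaging rule
// (Eq. (18)) are assumptions, not the patent's formulas.
#include <vector>
#include <cmath>
#include <algorithm>

void mergeDimension(std::vector<double>& stored,  // 1 or 2 values for dim k
                    double incoming,              // x^{n+1}_{dk}
                    double t)                     // threshold t^{n}_{dk}
{
    for (double& v : stored) {
        if (std::abs(incoming - v) <= t) {   // Eq. (17) satisfied
            v = 0.5 * (v + incoming);        // Eq. (18): merge (assumed average)
            return;
        }
    }
    stored.push_back(incoming);              // step 4: keep both values
    if (stored.size() > 2) {                 // steps 4-5: three candidates
        std::sort(stored.begin(), stored.end());
        // keep the two values with the smallest mutual difference
        if (stored[1] - stored[0] <= stored[2] - stored[1])
            stored.pop_back();
        else
            stored.erase(stored.begin());
    }
}
```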

Next, the effects of the present invention will be explained through experiments.

The experimental images are based on the Myeongdong DB provided by ETRI and consist of reference images and query images with resolutions of 427×640 and 640×427. The experimental environment uses Microsoft Visual Studio C++ 2010 with the OpenCV 2.4.8 library on an Intel i5 quad-core processor at 3.40 GHz. To evaluate matching performance, the recall of the proposed method is compared against that of the existing SIFT method. Recall, defined in Equation (19), is the proportion of correct matches among all matched pairs.

[Equation 19]

$$\text{Recall} = \frac{N_{\text{correct matches}}}{N_{\text{correspondences}}}$$

Here, N_correspondences is the total number of matched pairs, and N_correct matches is the number of matches judged correct by visual inspection.

FIG. 16 shows the reference images of building A: 18 images in total, taken from different viewpoints at regular intervals. FIG. 17 shows the 26 query images of building A, which differ from the reference images in illumination and contain occluded regions.

FIG. 18 shows the feature points classified by the proposed method in a reference image of building A, together with the set region.

The table of FIG. 19 compares the recall of the method according to the present invention with that of SIFT for the reference and query images of building A. The SIFT recall is the average over the matches between each query image and the descriptor information of every reference image containing corresponding feature points.

In the table of FIG. 19, recall is generally low because the brightness difference between the reference and query images is large. Query images A(1) and A(12), where recall is high, were acquired from the front of the building and have a similar reference image. For query images A(9), A(10), A(13), and A(14), the recall of both SIFT and the proposed method is small: these images were taken from the side and their brightness difference is severe.

Overall, as shown in FIG. 19, the recall of the method according to the present invention is 10 to 30% or more higher than the recall obtained with SIFT-based feature point matching.

In the present invention, a new method of classifying feature points and improving matching performance using multi-frames in the image recognition process has been described. Feature point extraction based on the existing SIFT (scale invariant feature transform) is often inaccurate because feature points are also extracted from the surroundings of the object to be recognized. In the method according to the present invention, a homography matrix is obtained from images taken at predetermined intervals, and the feature points are classified by applying the RANSAC (random sample consensus) method. Since one homography matrix classifies only the feature points of one plane of the image, the process of finding a homography matrix with the remaining feature points is repeated to classify all the correct feature points in the image. The recall of the method according to the present invention improves over that of SIFT.

Although the present invention has been described in detail with reference to the above embodiments, it is needless to say that the present invention is not limited to the above-described embodiments, and various modifications may be made without departing from the spirit of the present invention.

10: video 20: computer terminal
30: Program system

Claims (13)

A feature point classification method for multi-frame based building recognition, which extracts feature points from a multi-frame image,
(a) extracting feature points from each image of the multi-frame image;
(b) matching the extracted feature points against each other and acquiring pairs of matched feature points;
(c) obtaining a homography matrix from a plurality of pairs of feature points, and classifying the feature points using the obtained homography matrix; and
(d) repeating step (c) using the feature points remaining after classification,
wherein the method further comprises: (e) integrating the descriptors from each image into one descriptor for each obtained feature point,
and wherein, in step (e), when the k-th descriptor x^{n+1}_{dk} of the (n+1)-th reference image feature point satisfies [Equation 1] below, it is integrated into the value of [Equation 2] below.
[Equation 1]
Figure 112017036033669-pat00043

[Equation 2]
Figure 112017036033669-pat00044

Here, x^n_{dk} is the integrated value of the k-th descriptor dimension of the n-th reference image feature point, d denotes the descriptor, k in d_k is the dimension index of the descriptor, n and n+1 index the multi-frame reference images, and t^n_{dk} is the threshold value of the k-th descriptor dimension of the n-th reference image feature point.
The method according to claim 1,
wherein, in step (a), feature points are extracted using the difference of Gaussian (DoG) method of the scale invariant feature transform (SIFT).
The method according to claim 1,
wherein, in step (b), the gradient of each feature point with respect to its neighboring pixels is obtained, a descriptor is defined from the gradient information, and the feature points whose descriptors have the smallest Euclidean distance are matched with each other.
The method according to claim 1,
wherein, in step (c), for each pair of matched feature points, one feature point (hereinafter the first feature point) is transformed by the homography matrix, the Euclidean distance between the transformed first feature point and the other feature point of the pair (hereinafter the second feature point) is computed, and the pair is classified as correctly matched if the distance is smaller than a predetermined threshold value.
The method according to claim 1,
wherein, in step (c), a first homography matrix is obtained using the pairs of feature points and used to classify the feature points a first time, a second homography matrix is obtained from the first-classified pairs of feature points, and all feature points are classified a second time using the second homography matrix.
6. The method of claim 5,
wherein, in step (c), four randomly selected pairs of matched feature points are used to compute a homography matrix H, the Euclidean distances between the feature points transformed by H and their matched feature points are obtained and summed over all feature points, and the homography matrix H that minimizes this sum is selected as the first homography matrix.
6. The method of claim 5,
wherein, in step (c), a region of the second-classified feature points is set, and a feature point matched with a feature point outside the set region is removed as an outlier.
8. The method of claim 7,
wherein, in step (d), the classification step is repeated with the remaining feature points, excluding the feature points classified and the outliers removed in step (c).
The method according to claim 1,
wherein, in step (d), the iterative process is stopped when the number of remaining feature points falls below a predetermined minimum number.
Claim 10: (deleted)
Claim 11: (deleted)
The method according to claim 1,
wherein, when the k-th descriptor x^{n+1}_{dk} of the (n+1)-th reference image feature point does not satisfy [Equation 1], both the current integrated value x^n_{dk} of the descriptor and the incoming value are stored, and when the k-th descriptor of the next reference image feature point is integrated while two values are stored, of the three values, namely the two stored values and the current descriptor value, the information with the smallest difference is stored.
A computer-readable recording medium on which a program for performing a feature point classification method for multi-frame based building recognition according to any one of claims 1 to 9 and 12 is recorded.
KR1020150172509A 2015-12-04 2015-12-04 A classification method of feature points required for multi-frame based building recognition KR101741761B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150172509A KR101741761B1 (en) 2015-12-04 2015-12-04 A classification method of feature points required for multi-frame based building recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150172509A KR101741761B1 (en) 2015-12-04 2015-12-04 A classification method of feature points required for multi-frame based building recognition

Publications (1)

Publication Number Publication Date
KR101741761B1 true KR101741761B1 (en) 2017-05-30

Family

ID=59053001

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150172509A KR101741761B1 (en) 2015-12-04 2015-12-04 A classification method of feature points required for multi-frame based building recognition

Country Status (1)

Country Link
KR (1) KR101741761B1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359998A (en) * 2021-12-06 2022-04-15 江苏理工学院 Recognition method for face mask in wearing state
CN116738551A (en) * 2023-08-09 2023-09-12 陕西通信规划设计研究院有限公司 Intelligent processing method for acquired data of BIM model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Mok Seung-jun and 2 others, "Repeated-pattern clustering and matching of feature points for building augmented reality", Korean Institute of Information Scientists and Engineers, Journal of KIISE: Software and Applications, 39(11), Nov. 2012, 902-911 (10 pages).*
Seo Sang-won and 4 others, "Panoramic image generation through efficient homography estimation", The Institute of Electronics Engineers of Korea, Journal of IEIE, 50(8), Aug. 2013, 215-224 (10 pages).*

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359998A (en) * 2021-12-06 2022-04-15 江苏理工学院 Recognition method for face mask in wearing state
CN114359998B (en) * 2021-12-06 2024-03-15 江苏理工学院 Identification method of face mask in wearing state
CN116738551A (en) * 2023-08-09 2023-09-12 陕西通信规划设计研究院有限公司 Intelligent processing method for acquired data of BIM model
CN116738551B (en) * 2023-08-09 2023-10-17 陕西通信规划设计研究院有限公司 Intelligent processing method for acquired data of BIM model

Similar Documents

Publication Publication Date Title
Bansal et al. Ultrawide baseline facade matching for geo-localization
US9898686B2 (en) Object re-identification using self-dissimilarity
Pedagadi et al. Local fisher discriminant analysis for pedestrian re-identification
EP2783328B1 (en) Text detection using multi-layer connected components with histograms
AU2012219026B2 (en) Image quality assessment
Trulls et al. Dense segmentation-aware descriptors
US11450087B2 (en) System and method for multimedia analytic processing and display
Biagio et al. Heterogeneous auto-similarities of characteristics (hasc): Exploiting relational information for classification
Cheng et al. Person re-identification by articulated appearance matching
Ling et al. Image quality assessment for free viewpoint video based on mid-level contours feature
KR101742115B1 (en) An inlier selection and redundant removal method for building recognition of multi-view images
KR101753360B1 (en) A feature matching method which is robust to the viewpoint change
CN111259756A (en) Pedestrian re-identification method based on local high-frequency features and mixed metric learning
JP5656768B2 (en) Image feature extraction device and program thereof
Bąk et al. Exploiting feature correlations by Brownian statistics for people detection and recognition
KR101741761B1 (en) A classification method of feature points required for multi-frame based building recognition
Fornoni et al. Indoor scene recognition using task and saliency-driven feature pooling
KR101733288B1 (en) Object Detecter Generation Method Using Direction Information, Object Detection Method and Apparatus using the same
CN109657083B (en) Method and device for establishing textile picture feature library
KR101756959B1 (en) Image analyze method and apparatus thereby
CN114445916A (en) Living body detection method, terminal device and storage medium
Essa et al. High order volumetric directional pattern for video-based face recognition
Hu et al. Video text detection with text edges and convolutional neural network
Chen et al. Learning non-negative locality-constrained linear coding for human action recognition
Huang et al. Stereo object proposals

Legal Events

Date Code Title Description
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant