CN115620036A - Image feature matching method based on content perception - Google Patents


Info

Publication number: CN115620036A
Application number: CN202211232715.0A
Authority: CN (China)
Prior art keywords: feature, image, content, stage, map
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 李佐勇, 王伟策, 许惠亮, 刘伟霞, 赖桃桃
Current Assignee: Minjiang University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Minjiang University
Priority date: 2022-10-10 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2022-10-10
Publication date: 2023-01-17
Application filed by Minjiang University
Priority to CN202211232715.0A
Publication of CN115620036A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757: Matching configurations of points or features
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention relates to an image feature matching method based on content perception. First, an improved two-stage feature matching method is provided: a state-of-the-art model fitting method is used to pre-align the image pair in the first stage, and the pre-aligned image is used as the input of the second stage. Second, a block consisting of a fully convolutional network and a mask predictor is placed in front of the feature extractor to weight the features of the input image and enhance the extraction of locally effective features. The method improves matching accuracy.

Description

Image feature matching method based on content perception
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image feature matching method based on content perception.
Background
Feature matching refers to finding correct correspondences between two images, as shown in fig. 1. It is also the basis for higher-level tasks in the field of computer vision (e.g. three-dimensional reconstruction, image stitching, SLAM and lane line detection), and raising the probability of correct matches allows these higher-level tasks to perform better.
Classical feature matching methods typically comprise three steps: feature detection, feature description, and feature matching. Before the advent of deep learning methods, most approaches were based on this pipeline, and they generally improved performance by improving a single step in the pipeline. For example, the Saddle detector [1], the brightness-comparison-based FAST detector [2] and its extended version FAST-ER [3] improve the performance of the feature detection step. Other methods [4,5] focus on improving the feature description step. In addition, well-known traditional methods such as SIFT [6], SURF [7], KAZE [8] and AKAZE [9] improve both of the first two steps. For the last step, classical model fitting methods such as RANSAC [10] and its improved variant DSAC [11] improve matching accuracy by estimating geometric transformations (e.g., epipolar geometry and homographies).
With the advent of deep learning, the feature matching methods proposed in recent years widely use neural networks to improve performance. Some of these learning-based methods [12,13] follow the classical pipeline, while others [14,15] are end-to-end approaches. SuperPoint [12] jointly detects keypoints and computes the associated descriptors. SuperGlue [13] improves descriptors using a graph neural network with cross- and self-attention. In different scenarios, improving only one or two steps of the pipeline is not always the best choice, which motivated end-to-end methods. D2-Net [14] uses a pre-trained VGG-16 as a feature extractor to obtain features. DFM [15] uses a pre-trained VGG-19 as a feature extractor to obtain deep features, while pre-aligning the input images before matching to improve performance. However, the number of feature points extracted by D2-Net and DFM is insufficient, resulting in few correct matches on non-planar images. In addition, the geometric estimation algorithm used for pre-alignment in DFM is not effective enough, which also affects the final matching accuracy.
Disclosure of Invention
The invention aims to provide an image feature matching method based on content perception, which improves the matching accuracy.
In order to achieve the above purpose, the technical scheme of the invention is as follows: an image feature matching method based on content perception, in which, first, an improved two-stage feature matching method is provided, a state-of-the-art model fitting method is used to pre-align the image pair in the first stage, and the pre-aligned image is used as the input of the second stage; second, a block consisting of a fully convolutional network and a mask predictor is placed before the feature extractor to weight the features of the input image.
Compared with the prior art, the invention has the following beneficial effects: in order to improve the accuracy of end-to-end feature matching, especially when applied to challenging scenes such as non-planar images, repetitive images, or images with strong illumination changes, the invention provides an image feature matching method based on content perception. The method first uses a state-of-the-art model fitting method to pre-align the input images, improving the quality of feature extraction, and takes the aligned images as the input of the second stage. Second, a content-aware block is added to the feature extractor to predict a probability map, which highlights the effective parts of the image, guides feature extraction, and allows a larger number of effective features to be extracted. Experiments show that the accuracy of the method on the HPatches dataset exceeds that of the current best traditional and deep learning methods.
Drawings
Fig. 1 is an example of image feature matching.
FIG. 2 is a flow chart of the method of the present invention.
FIG. 3 is a comparison of the image alignment effects of two homography estimation algorithms: (a) and (b) are the input images, where (a) is the target image; (c) is the alignment result using MAGSAC++; (d) is the alignment result using RANSAC.
Fig. 4 is a content aware block diagram.
Fig. 5 shows the MMA evaluation results of 9 feature matching methods on the HPatches dataset at different ratios, covering three scenarios: illumination change (Illumination), viewpoint change (Viewpoint), and all sequences (Overall).
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
The invention relates to an image feature matching method based on content perception, which comprises the following steps: first, an improved two-stage feature matching method is provided, using a state-of-the-art model fitting method to pre-align the image pair in the first stage and taking the pre-aligned image as the input of the second stage; second, a block consisting of a fully convolutional network and a mask predictor is placed before the feature extractor to weight the features of the input image.
The following is a specific implementation process of the present invention.
The invention relates to an image feature matching method based on content perception, as shown in figure 2. The first stage comprises three steps. First, a pre-trained feature extractor (VGG-19) is used to extract features from the input images I_A and I_B; second, Dense Nearest Neighbor Search (DNNS) is used to perform initial matching on the feature maps of the last layer; finally, these initial matches are used for homography matrix estimation for pre-alignment. The second stage comprises two steps. First, the pre-alignment result is input into a feature extractor, which consists of a content-aware block and VGG-19, to extract features; second, DNNS is used for feature matching. The first highlight of the method is the use of a more robust model fitting algorithm to obtain a more accurate homography matrix for image alignment. The second highlight is the use of a content-aware block to predict a probability map that guides feature extraction. Compared with existing feature extraction methods, this effectively enhances the extraction of locally effective features.
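To make the flow concrete, the following Python sketch outlines the two stages. It is a minimal, assumption-laden outline: the callables vgg19_features, dnns_match, estimate_homography_magsacpp and content_aware_extractor are illustrative placeholders for the components described in this specification, not functions of any published implementation.

```python
import cv2

def match_two_stage(img_a, img_b, vgg19_features, dnns_match,
                    estimate_homography_magsacpp, content_aware_extractor):
    """Illustrative outline of the two-stage pipeline; all callables are placeholders."""
    # Stage 1: initial matching on the deepest VGG-19 feature maps via DNNS.
    pts_a, pts_b = dnns_match(vgg19_features(img_a)[-1],
                              vgg19_features(img_b)[-1])

    # Pre-alignment: robust homography H_BA (MAGSAC++), then warp I_B to I_Bwarped.
    H_ba = estimate_homography_magsacpp(pts_b, pts_a)
    h, w = img_a.shape[:2]
    img_b_warped = cv2.warpPerspective(img_b, H_ba, (w, h))

    # Stage 2: content-aware feature extraction (content-aware block followed by
    # VGG-19) on the pre-aligned pair, then DNNS matching again.
    return dnns_match(content_aware_extractor(img_a),
                      content_aware_extractor(img_b_warped))
```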
1. Image pre-alignment based on model fitting algorithm
Unlike existing methods, this method uses the recently proposed robust model fitting method MAGSAC++ to estimate the homography matrix H_BA, which is used to warp image I_B to obtain the image I_Bwarped (as shown in fig. 2). Using MAGSAC++ allows the two images to be aligned more accurately, so that more correct matches can be found afterwards. At this stage, as shown in fig. 3(c), the MAGSAC++ algorithm achieves better results than RANSAC in the homography estimation task.
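A minimal sketch of this pre-alignment step is given below, using the MAGSAC++ backend exposed by OpenCV (cv2.USAC_MAGSAC, available since OpenCV 4.5). The variable names and the 3-pixel reprojection threshold are illustrative assumptions, not values taken from the patent.

```python
import cv2
import numpy as np

def pre_align(img_a, img_b, pts_a, pts_b, reproj_thresh=3.0):
    """Estimate H_BA with MAGSAC++ and warp I_B into the frame of I_A."""
    # pts_a / pts_b: N x 2 arrays of initially matched coordinates from stage one.
    H_ba, inlier_mask = cv2.findHomography(
        pts_b.astype(np.float32), pts_a.astype(np.float32),
        cv2.USAC_MAGSAC, reproj_thresh)
    h, w = img_a.shape[:2]
    img_b_warped = cv2.warpPerspective(img_b, H_ba, (w, h))  # I_Bwarped
    return img_b_warped, H_ba, inlier_mask
```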
2. Content aware blocks
Feature-based matching methods are generally able to achieve satisfactory performance on popular datasets, but they depend heavily on the quantity and quality of features. When the image contains a challenging scene, such as a non-planar scene, repetitive content, or varying illumination, performance may degrade because of an insufficient number of effective features. Therefore, more effective features are required for feature matching.
In order to solve the above problem, the method adds a content-aware block before VGG-19. The content-aware block consists of a feature extractor and a mask predictor, which improves the quantity and quality of useful features. The feature extractor preliminarily extracts a feature map of the input image. The mask predictor predicts a probability map, i.e. locations with more significant content are assigned higher probability; the probability map is then used to weight the feature map of the input image. The content-aware block is shown in fig. 4.
A feature extractor: in order to enable the network to learn the deep features of an image pair autonomously, we use a fully convolutional network to form the feature extractor f(·), whose structural details are shown in Table 1. It accepts an input of size H × W × 1 and generates a feature map of size H × W × C. For the input images I_A and I_B, the feature extractor shares weights and generates the feature maps F_A and F_B, i.e.
F_i = f(I_i), i ∈ {A, B}
A mask predictor: the method builds a network to automatically learn the locations of effective features; its detailed structure is shown in Table 2. The network m(·) generates an inlier probability map, highlighting the locations in the feature map that contribute more. The probability map has the same size as the feature maps F_A and F_B, which are further weighted by it; the two weighted feature maps G_A and G_B are then input into VGG-19, i.e.
M_i = m(I_i), G_i = F_i M_i, i ∈ {A, B}
TABLE 1 feature extractor architecture
(The structure of Table 1 is provided only as an image in the original publication and is not reproduced here.)
TABLE 2 mask predictor Structure
(The structure of Table 2 is provided only as an image in the original publication and is not reproduced here.)
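Since Tables 1 and 2 are reproduced only as images, the exact layer configurations are not available here; the following PyTorch sketch therefore uses illustrative channel counts and depths and should be read as an assumption-laden outline of the content-aware block rather than the patented architecture.

```python
import torch
import torch.nn as nn

class ContentAwareBlock(nn.Module):
    """Weights a preliminary feature map with a predicted inlier probability map."""
    def __init__(self, in_ch=1, feat_ch=8):
        super().__init__()
        # f(.): fully convolutional feature extractor, H x W x 1 -> H x W x C.
        # Layer sizes are illustrative; the patented configuration is in Table 1.
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # m(.): mask predictor producing a one-channel probability map in [0, 1].
        # Layer sizes are illustrative; the patented configuration is in Table 2.
        self.mask_predictor = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image):                 # image: (B, 1, H, W)
        f_i = self.feature_extractor(image)   # F_i, shape (B, C, H, W)
        m_i = self.mask_predictor(image)      # M_i, shape (B, 1, H, W)
        return f_i * m_i                      # G_i = F_i weighted by M_i

# The weighted feature map G_i is then passed to VGG-19 for deep feature extraction.
```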
The feature matching task is typically evaluated on image sequences with illumination and viewpoint changes. We examined the method of the invention on the HPatches [16] dataset, which comprises 116 groups of images; each group contains 6 images of the same scene taken from different viewpoints or under different lighting conditions, covering planar and non-planar scenes. Each group also includes homography matrices as labels. In the experiments, we compared against the classical algorithms SIFT [6], SURF [7], ORB [17], KAZE [8] and AKAZE [9], and the deep-learning-based algorithms SuperPoint [12], Patch2Pix [18] and DFM [15]. We also removed the content-aware block from the method (without CA) to verify the impact of this module. The performance of each method was measured using mean matching accuracy (MMA), which is the percentage of correct matches (i.e., inliers) averaged over the entire dataset. A match is considered an inlier if its reprojection error (computed from the label homography matrix and the match) is less than a given threshold.
We performed two experiments to measure the effectiveness of the proposed method. (1) All compared methods used mutual nearest neighbor search and a bidirectional ratio test to find correct matches, and the MMA was measured at ratios from 0.1 to 1.0 in steps of 0.1. (2) For each method, the ratio at which the best performance was obtained was fixed, and the MMA of all methods at pixel thresholds of 1, 3, 5 and 10 was compared.
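A hedged sketch of this evaluation protocol is given below: mutual nearest neighbor search with a bidirectional ratio test, followed by the reprojection-error check against the label homography used to count inliers for MMA. All names and the default ratio are illustrative assumptions, and the brute-force distance computation is only suitable for small descriptor sets.

```python
import numpy as np

def mutual_nn_ratio_matches(desc_a, desc_b, ratio=0.9):
    """Mutual nearest neighbor matches that pass a bidirectional ratio test."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn_ab = d.argmin(axis=1)                       # best match in B for each A
    nn_ba = d.argmin(axis=0)                       # best match in A for each B
    matches = []
    for i, j in enumerate(nn_ab):
        if nn_ba[j] != i:                          # keep only mutual matches
            continue
        second_ab = np.partition(d[i], 1)[1]       # second-best distance, A -> B
        second_ba = np.partition(d[:, j], 1)[1]    # second-best distance, B -> A
        if d[i, j] <= ratio * second_ab and d[i, j] <= ratio * second_ba:
            matches.append((i, j))
    return matches

def mma(pts_a, pts_b, H_gt, threshold):
    """Fraction of matches whose reprojection error under H_gt is below threshold."""
    pts_h = np.hstack([pts_a, np.ones((len(pts_a), 1))]) @ H_gt.T
    proj = pts_h[:, :2] / pts_h[:, 2:3]
    err = np.linalg.norm(proj - pts_b, axis=1)
    return float(np.mean(err < threshold))
```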
Fig. 5 shows MMA on HPatches data set for 9 feature matching methods at different ratios. It can be observed from FIG. 5 that the curves for all comparative methods change significantly as the ratio changes. This indicates that the change in ratio has a more significant effect on the other methods, whereas the method of the present invention is less affected.
Table 3 lists the MMA of each method at different pixel thresholds, from which it can be seen that the method of the present invention is very competitive. Compared with the other methods (SIFT, SURF, ORB, KAZE, AKAZE, SuperPoint, Patch2Pix, and DFM), the method has the highest accuracy at every pixel threshold. At a pixel threshold of 1, the MMA of the method is significantly higher than that of the suboptimal method. When the threshold is set to 5, the MMA of the method is equal to that obtained by Patch2Pix. When the thresholds are set to 1, 3, 5 and 10, the MMA of the method exceeds that of the end-to-end method DFM by 0.19, 0.06, 0.03 and 0.01, respectively. At a threshold of 1, SIFT reaches the suboptimal performance (0.60). At a threshold of 3, the suboptimal method is Patch2Pix (0.88). At a threshold of 10, the suboptimal methods are Patch2Pix and DFM (0.96).
As shown in Table 3, even without the content-aware block the method shows superiority: its MMA is 0.07 higher than that of the next best method (SIFT) at a pixel threshold of 1. When the pixel thresholds are set to 3 and 5, the MMA of the method is on par with the suboptimal method. The advantage of the method is further extended after introducing the content-aware block. Thus, the content-aware block effectively improves the performance of the method.
TABLE 3 optimal MMA for different image matching algorithms (optimal values are indicated by bold font)
(The contents of Table 3 are provided only as an image in the original publication and are not reproduced here.)
References:
[1] Aldana-Iuit J, Mishkin D, Chum O, et al. In the Saddle: Chasing fast and repeatable features[C]. Proceedings of the IEEE International Conference on Pattern Recognition, 2016: 675-680.
[2] Trajković M, Hedley M. Fast corner detection[J]. Image and Vision Computing, 1998, 16(2): 75-87.
[3] Rosten E, Porter R, Drummond T. Faster and better: A machine learning approach to corner detection[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2008, 32(1): 105-119.
[4] Gong Y, Kumar S, Rowley H A, et al. Learning binary codes for high-dimensional data using bilinear projections[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013: 484-491.
[5] Trzcinski T, Lepetit V. Efficient discriminative projections for compact binary descriptors[C]. European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2012: 228-242.
[6] Lowe D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[7] Bay H, Tuytelaars T, Gool L V. SURF: Speeded up robust features[C]. European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2006: 404-417.
[8] Alcantarilla P F, Bartoli A, Davison A J. KAZE features[C]. European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2012: 214-227.
[9] Alcantarilla P F, et al. Fast explicit diffusion for accelerated features in nonlinear scale spaces[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2011, 34(7): 1281-1298.
[10] Fischler M A, Bolles R C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography[J]. Communications of the ACM, 1981, 24(6): 381-395.
[11] Brachmann E, Krull A, Nowozin S, et al. DSAC - Differentiable RANSAC for camera localization[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6684-6692.
[12] DeTone D, Malisiewicz T, Rabinovich A. SuperPoint: Self-supervised interest point detection and description[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018: 224-236.
[13] Sarlin P E, DeTone D, Malisiewicz T, et al. SuperGlue: Learning feature matching with graph neural networks[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 4938-4947.
[14] Dusmanu M, Rocco I, Pajdla T, et al. D2-Net: A trainable CNN for joint description and detection of local features[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 8092-8101.
[15] Efe U, Ince K G, Alatan A. DFM: A performance baseline for deep feature matching[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 4284-4293.
[16] Balntas V, Lenc K, Vedaldi A, et al. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 5173-5182.
[17] Rublee E, Rabaud V, Konolige K, et al. ORB: An efficient alternative to SIFT or SURF[C]. International Conference on Computer Vision. IEEE, 2011: 2564-2571.
[18] Zhou Q, Sattler T, Leal-Taixé L. Patch2Pix: Epipolar-guided pixel-level correspondences[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 4669-4678.
the above are preferred embodiments of the present invention, and all changes made according to the technical solutions of the present invention that produce functional effects do not exceed the scope of the technical solutions of the present invention belong to the protection scope of the present invention.

Claims (7)

1. An image feature matching method based on content perception, characterized in that, first, an improved two-stage feature matching method is provided, in which a state-of-the-art model fitting method is used to pre-align an image pair in a first stage, and the pre-aligned image is used as the input of a second stage; second, a block consisting of a fully convolutional network and a mask predictor is placed before the feature extractor to weight the features of the input image.
2. The image feature matching method based on content perception according to claim 1, wherein the first stage is implemented as follows: first, a pre-trained feature extractor VGG-19 is used to extract features from the input images I_A and I_B; second, dense nearest neighbor search (DNNS) is used to perform initial matching on the feature maps of the last layer; finally, the initial matches are used for homography matrix estimation for pre-alignment.
3. The image feature matching method based on content perception according to claim 2, wherein the second stage is implemented as follows: first, the pre-alignment result is input into a feature extractor consisting of a content-aware block and VGG-19 to extract features; second, feature matching is performed using dense nearest neighbor search (DNNS).
4. The image feature matching method based on content perception according to claim 2, wherein the robust model fitting method MAGSAC++ is used to estimate the homography matrix H_BA, and H_BA is used to warp image I_B to obtain the image I_Bwarped.
5. The method as claimed in claim 3, wherein the content-aware block is composed of a second feature extractor for preliminarily extracting a feature map of the input image and a mask predictor for predicting a probability map in which locations with more effective content have higher probability; the probability map is then used to weight the feature map of the input image.
6. The image feature matching method based on content perception according to claim 5, wherein the second feature extractor is formed using a fully convolutional network that accepts an input of size H × W × 1 and generates a feature map of size H × W × C; for the input images I_A and I_B, the second feature extractor shares weights and generates the feature maps F_A and F_B, namely:
F_i = f(I_i), i ∈ {A, B}
where f(·) represents the second feature extractor.
7. The method of claim 6, wherein the mask predictor learns the locations of effective features automatically by building a network m(·) that generates an inlier probability map highlighting the locations in the feature map that contribute more; the probability map has the same size as the feature maps F_A and F_B, which are further weighted by the probability map, and the two weighted feature maps G_A and G_B are then input into VGG-19, namely:
M_i = m(I_i), G_i = F_i M_i, i ∈ {A, B}.
CN202211232715.0A, filed 2022-10-10 (priority date 2022-10-10): Image feature matching method based on content perception. Status: Pending. Published as CN115620036A.

Priority Applications (1)

Application Number: CN202211232715.0A (published as CN115620036A)
Priority Date: 2022-10-10
Filing Date: 2022-10-10
Title: Image feature matching method based on content perception

Applications Claiming Priority (1)

Application Number: CN202211232715.0A (published as CN115620036A)
Priority Date: 2022-10-10
Filing Date: 2022-10-10
Title: Image feature matching method based on content perception

Publications (1)

Publication Number: CN115620036A
Publication Date: 2023-01-17

Family

ID=84860846

Family Applications (1)

Application Number: CN202211232715.0A (published as CN115620036A, Pending)
Priority Date: 2022-10-10
Filing Date: 2022-10-10
Title: Image feature matching method based on content perception

Country Status (1)

Country: CN
Publication: CN115620036A

Similar Documents

Publication Publication Date Title
Li et al. Contour knowledge transfer for salient object detection
Bi et al. Fast copy-move forgery detection using local bidirectional coherency error refinement
CN108960211B (en) Multi-target human body posture detection method and system
CN113361542B (en) Local feature extraction method based on deep learning
CN111709980A (en) Multi-scale image registration method and device based on deep learning
CN108537832B (en) Image registration method and image processing system based on local invariant gray feature
CN107862680A (en) A kind of target following optimization method based on correlation filter
KR101753360B1 (en) A feature matching method which is robust to the viewpoint change
CN111310690B (en) Forest fire recognition method and device based on CN and three-channel capsule network
CN112364881B (en) Advanced sampling consistency image matching method
Lecca et al. Comprehensive evaluation of image enhancement for unsupervised image description and matching
Huang et al. Robust simultaneous localization and mapping in low‐light environment
CN115294371B (en) Complementary feature reliable description and matching method based on deep learning
CN115620036A (en) Image feature matching method based on content perception
CN116188535A (en) Video tracking method, device, equipment and storage medium based on optical flow estimation
CN111612800B (en) Ship image retrieval method, computer-readable storage medium and equipment
CN110222217B (en) Shoe print image retrieval method based on segmented weighting
He et al. DarkFeat: noise-robust feature detector and descriptor for extremely low-light RAW images
Yang et al. Exposing photographic splicing by detecting the inconsistencies in shadows
CN110070110B (en) Adaptive threshold image matching method
CN113610016A (en) Training method, system, equipment and storage medium of video frame feature extraction model
Shen et al. A detector-oblivious multi-arm network for keypoint matching
Qiu et al. Adaptive threshold based SIFT image registration algorithm
CN117541764B (en) Image stitching method, electronic equipment and storage medium
Wu et al. Feature rectification and enhancement for no-reference image quality assessment

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination