CN111985502A - Multi-mode image feature matching method with scale invariance and rotation invariance - Google Patents
- Publication number
- CN111985502A (application CN202010765205.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/757: Matching configurations of points or features
- G06T5/40: Image enhancement or restoration using histogram techniques
- G06T5/70: Denoising; Smoothing
- G06T7/13: Edge detection
- G06V10/449: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06T2207/10032: Satellite or aerial image; Remote sensing
Abstract
The invention discloses a multi-modal image feature matching method with scale and rotation invariance, comprising the following steps: 1) constructing a nonlinear fast explicit diffusion scale space and applying a multi-scale transformation to the reference image and the image to be matched to obtain images at each scale; 2) extracting feature points from each scale-space layer of the reference image and the image to be matched; 3) convolving the reference image and the image to be matched with log-Gabor wavelet functions to obtain a multi-scale, multi-orientation convolution sequence, and constructing a maximum-value index map from that sequence; 4) constructing a ring-shaped log-Gabor convolution sequence to make the feature matching rotation-invariant; 5) constructing feature description vectors for the feature points from the maximum-value index map; 6) constructing feature point correspondences from the description vectors and eliminating gross errors. The coordinated design and combination of feature selection and feature description allows the method to match multi-modal image features quickly, accurately and reliably.
Description
Technical Field
The invention relates to surveying, mapping and remote-sensing technology, and in particular to a multi-modal image feature matching method with scale and rotation invariance.
Background
In the emergency response to the serious disasters of recent years, photogrammetry and remote sensing, with their flexibility and variety of platforms, have delivered high-resolution images and topographic maps of disaster areas at the earliest moment, providing powerful support for rescue and relief, facility construction, city planning and other work. Among these many applications, image matching is a core fundamental step and a prerequisite for all of them. Matching-based applications include not only aerial triangulation in photogrammetry, but also visual navigation in positioning, path planning in robotics, target tracking in intelligent transportation, and so on. Progress on the image matching problem therefore directly advances key problems in these related fields.
Feature matching is the process of automatically detecting reliable correspondences (homonymous points) between images of the same scene acquired at different times, by different sensors, or from different viewpoints. It is, however, an inherently ill-posed problem, severely affected by radiometric factors. Classical feature matching methods can cope with linear radiometric differences, but they are essentially unable to handle nonlinear radiometric differences. Images exhibiting large nonlinear radiometric differences are referred to as multi-modal images. Traditional feature matching methods rely on grey-level or gradient information for feature detection and description, and both kinds of information are very sensitive to nonlinear radiometric differences, which makes multi-modal image matching a bottleneck problem for feature-based methods.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a multi-modal image feature matching method with scale and rotation invariance, aiming at the defects in the prior art.
The technical scheme adopted by the invention for solving the technical problems is as follows: a multi-modal image feature matching method with scale and rotation invariance comprises the following steps:
1) constructing a nonlinear fast explicit diffusion (FED) scale space and applying a multi-scale transformation to the reference image and the image to be matched to obtain images at each scale, the reference image and the image to be matched being images of two different modalities;
2) extracting feature points from each scale-space layer of the reference image and the image to be matched: computing the two-dimensional phase congruency layer of each scale-space image, and extracting image feature points from the maximum-moment and minimum-moment components of phase congruency;
3) performing log-Gabor wavelet function convolution on the reference image and the image to be matched to obtain a multi-scale and multi-direction convolution sequence, and constructing a maximum value index map based on the convolution sequence;
4) constructing a Log-Gabor annular convolution sequence to realize the rotation invariance of feature matching;
5) constructing feature description vectors for the feature points based on the maximum-value index map;
6) constructing the feature point matching relation and performing gross error elimination.
According to the scheme, the nonlinear fast explicit diffusion scale space in step 1) is constructed as follows:
1.1) determining the pyramid image parameters of the nonlinear scale space, wherein the nonlinear scale space consists of M groups of pyramid images with S layers per group, and the scale parameter, image group and sub-layer level satisfy the following relation:
σ_i(m, s) = σ_0 · 2^(m + s/S), m ∈ [0, …, M−1], s ∈ [0, …, S−1], i ∈ [0, …, K−1]  (1)
where σ_0 is the initial scale and K is the number of scale-space images.
1.2) repeatedly down-sampling the original image to obtain the M groups of pyramid images, and solving a nonlinear diffusion filtering equation for each group of images under its scale parameters within a fast explicit diffusion numerical analysis framework to generate the scale-space image layers; the nonlinear diffusion filtering equation is:
L_{i+1} = (I + τ A(L_i)) L_i  (2)
where I is the identity matrix, L_i is the i-th scale-space image layer, A is the image conduction matrix, and τ is the time step.
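A minimal Python sketch of the two formulas above (assuming, for illustration, a simple conduction-weighted Laplacian as the diffusion operator A; the parameter values σ0 = 1.6, M = S = 4 are illustrative, not prescribed by the patent):

```python
import numpy as np

def scale_parameters(sigma0=1.6, M=4, S=4):
    """sigma_i(m, s) = sigma0 * 2^(m + s/S)  -- formula (1)."""
    return np.array([sigma0 * 2.0 ** (m + s / S)
                     for m in range(M) for s in range(S)])

def fed_step(L, conduction, tau):
    """One explicit diffusion step L_{i+1} = (I + tau*A(L_i)) L_i -- formula (2),
    realised on a 2-D grid as L + tau * c * laplacian(L)."""
    Lp = np.pad(L, 1, mode="edge")   # replicate borders
    lap = (Lp[:-2, 1:-1] + Lp[2:, 1:-1] +
           Lp[1:-1, :-2] + Lp[1:-1, 2:] - 4.0 * L)
    return L + tau * conduction * lap

sigmas = scale_parameters()          # K = M*S = 16 scale values
img = np.random.default_rng(0).random((32, 32))
smoothed = fed_step(img, conduction=np.ones_like(img), tau=0.2)
```

With a constant conduction matrix this step reduces to linear (Gaussian-like) smoothing; the nonlinearity of the scale space comes from making the conduction depend on the local image gradient.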
According to the scheme, the image feature points in step 2) are extracted from the maximum-moment and minimum-moment components of phase congruency, specifically as follows:
2.1) obtaining a corner-response layer from the minimum-moment function of phase congruency, and detecting extreme points to obtain corner features;
2.2) obtaining an edge-response layer from the maximum-moment function of phase congruency, and applying the FAST detector to obtain edge-point features;
2.3) rejecting duplicate features and unstable feature points whose response values fall below a threshold, completing the feature point extraction.
According to the scheme, the maximum-value index map in step 3) is constructed as follows:
3.1) convolving the reference image with log-Gabor wavelet functions;
3.2) computing the wavelet-transform amplitude A_no(x, y) at scale n and orientation o, and summing along the scale dimension to obtain the log-Gabor convolution layer A_o(x, y);
3.3) arranging the log-Gabor convolution layers in sequence to obtain the log-Gabor convolution sequence, i.e. a multi-channel log-Gabor convolution map {A^ω(x, y)}, where O is the number of wavelet orientations and the superscript ω = 1, 2, …, O denotes the channel;
3.4) for each pixel position in the reference image and the image to be matched, reading the pixel values of all channels of the log-Gabor convolution sequence to obtain an O-dimensional ordered array, finding its maximum value and the channel ω_max in which the maximum lies, and taking ω_max as the value of the constructed maximum-value index map at that pixel position.
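The per-pixel operation of 3.4) is an argmax over the O channels of the convolution sequence; a minimal numpy sketch (the (O, H, W) array layout and the 1-based channel indexing are assumptions consistent with ω = 1, …, O):

```python
import numpy as np

def max_index_map(conv_stack):
    """conv_stack: (O, H, W) log-Gabor convolution sequence.
    Returns the 1-based index of the channel holding the per-pixel maximum."""
    return np.argmax(conv_stack, axis=0) + 1

stack = np.random.default_rng(1).random((6, 8, 8))   # O = 6 orientations
mim = max_index_map(stack)
```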
According to the scheme, the feature description vector of each feature point in step 5) is constructed from the maximum-value index map, specifically as follows:
5.1) for each feature point, selecting on the maximum-value index map a local image block of J × J pixels centred at the feature point position;
5.2) assigning a weight to each pixel;
5.3) dividing the local image block into sub-regions and accumulating the histogram distribution within each sub-region;
5.4) concatenating all histogram vectors into the feature description vector of the feature point, then truncating and normalizing the vector.
According to the scheme, the local image block in step 5) is 96 × 96 pixels; each pixel is weighted by a Gaussian function with standard deviation 48; the block is divided into 6 × 6 sub-regions, the histogram distribution is accumulated within each sub-region, and the number of histogram bins (i.e. the number of channels of the log-Gabor convolution sequence) is 6.
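With the concrete parameters just given (96 × 96 block, Gaussian σ = 48, 6 × 6 sub-regions, 6 histogram bins), the descriptor construction above can be sketched as follows (a sketch only: the 0.2 truncation threshold is the value quoted in the detailed description, and the exact Gaussian weighting scheme is an assumption):

```python
import numpy as np

def describe(mim_patch, n_sub=6, n_bins=6, sigma=48.0):
    """Descriptor from a JxJ max-index-map patch (J = 96 here):
    Gaussian-weighted channel histograms over n_sub x n_sub sub-regions,
    concatenated, then truncated at 0.2 and renormalised."""
    J = mim_patch.shape[0]
    y, x = np.mgrid[0:J, 0:J] - (J - 1) / 2.0
    w = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))   # pixel weights
    step = J // n_sub
    desc = []
    for i in range(n_sub):
        for j in range(n_sub):
            sub = mim_patch[i*step:(i+1)*step, j*step:(j+1)*step]
            ws = w[i*step:(i+1)*step, j*step:(j+1)*step]
            hist = np.bincount(sub.ravel() - 1, weights=ws.ravel(),
                               minlength=n_bins)[:n_bins]
            desc.append(hist)
    v = np.concatenate(desc)                        # 6*6*6 = 216 dims
    v /= np.linalg.norm(v) + 1e-12
    v = np.minimum(v, 0.2)                          # truncation
    return v / (np.linalg.norm(v) + 1e-12)

patch = np.random.default_rng(2).integers(1, 7, size=(96, 96))
vec = describe(patch)
```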
According to the scheme, in step 4) a convolution sequence A and a convolution sequence B are constructed for the reference image and the image to be matched respectively, and the maximum-value index map Map_A of the reference image's convolution sequence A is obtained; for the convolution sequence B of the image to be matched, the starting layer of the sequence is shifted in turn to obtain O convolution sequences with different starting layers; the maximum-value index map of each of these sequences is then computed, yielding a set of maximum-value index maps; finally, the dominant direction of each feature point is computed by the histogram dominant-direction method, which determines the maximum-value index map Map_B in the set that is matched against Map_A.
According to the scheme, in step 6) the feature point correspondences are constructed by a nearest-neighbour search method, and gross errors and noise are eliminated by the RANSAC method and a q-norm method.
According to the scheme, in step 6) the feature point correspondences are constructed and gross errors eliminated as follows:
6.1) computing the nearest and second-nearest Euclidean distances between feature vectors, first building one-to-one correspondences from the nearest distances, then preliminarily eliminating mismatches using the ratio of nearest to second-nearest distance;
6.2) eliminating gross-error points with the RANSAC algorithm based on an affine transformation model;
6.3) computing the globally optimal affine transformation between the image pair based on the q-norm, and using this model to remove remaining gross errors and low-precision noise points.
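The ratio test of 6.1) can be sketched with a brute-force nearest-neighbour search (a stand-in for the K-D tree used in the implementation; the 0.9 ratio is the value quoted in the detailed description):

```python
import numpy as np

def ratio_match(desc_ref, desc_tgt, ratio=0.9):
    """Nearest/second-nearest Euclidean distance ratio test:
    keep (i, j) only if the best match is clearly better than the runner-up."""
    matches = []
    for i, d in enumerate(desc_ref):
        dist = np.linalg.norm(desc_tgt - d, axis=1)
        j1, j2 = np.argsort(dist)[:2]            # two closest candidates
        if dist[j1] / (dist[j2] + 1e-12) <= ratio:
            matches.append((i, j1))
    return matches

rng = np.random.default_rng(3)
desc_ref = rng.random((5, 8))
desc_tgt = desc_ref + 0.01 * rng.random((5, 8))  # slightly perturbed copies
pairs = ratio_match(desc_ref, desc_tgt)
```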
The invention has the following beneficial effects:
1. the invention provides a maximum value index map-based anti-radiation distortion image feature matching method with scale and rotation invariance, which adopts phase consistency to replace pixel values or gradient values for feature detection, and overcomes the limitation that the traditional descriptor based on gradient information is sensitive to nonlinear gray level difference. The method is designed from two aspects of feature detection and feature description, the scale and radiation invariance of feature detection are guaranteed by the nonlinear scale space and phase consistency, and the rotation and radiation invariance of the descriptor are guaranteed by the feature description based on the maximum value index map. The method can not solve the problem of feature matching of the multi-modal image only from the aspect of feature detection or feature description, and the method ensures high reliability and advancement of the method about special design and combination between feature selection and feature description of the multi-modal image.
2. The method can more quickly, accurately and steadily realize the characteristic matching of the multi-modal image, does not need to rely on any geometric geography prior information, is not only suitable for common homologous image data, but also suitable for heterogenous multi-modal image data, and is simultaneously suitable for matching of optical images and optical images, optical images and infrared images, optical images and SAR images, optical images and laser point cloud depth maps, optical images and noctilucent images and optical images and map data.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a diagram illustrating an example of a feature extraction result in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the construction of a maximum value index map according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a Log-Gabor convolution loop structure according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an example of an experimental result of the multi-modal matching method in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, a multi-modal image feature matching method with scale and rotation invariance includes the following steps:
step 1) constructing a pyramid scale space based on a nonlinear rapid explicit diffusion (FED) function.
Firstly, pyramid image parameters of a scale space are calculated, wherein the pyramid image parameters comprise the number of pyramid image groups, the number of image layers of each group and the like.
The nonlinear scale space is composed of M groups of pyramid images, each group has S layers, and the scale parameters, the image groups and the sub-layer levels satisfy the following relations:
σ_i(m, s) = σ_0 · 2^(m + s/S), m ∈ [0, …, M−1], s ∈ [0, …, S−1], i ∈ [0, …, K−1]  (1)
where σ_0 is the initial scale and K is the number of scale-space images.
The pyramid image groups are then obtained by down-sampling, and the scale-space image layers are generated by FED filtering.
The original image is repeatedly down-sampled to obtain the M groups of pyramid images, and a nonlinear diffusion filtering equation is solved for each group of images under its scale parameters within the fast explicit diffusion (FED) numerical analysis framework, generating the scale-space image layers. The FED nonlinear diffusion filtering step is:
L_{i+1} = (I + τ A(L_i)) L_i  (2)
where I is the identity matrix, L_i is the i-th scale-space image layer, A is the image conduction matrix, and τ is the time step.
Step 2) computes the phase congruency layer of each multi-modal image from the phase congruency measure and extracts corner and edge-point features from it.
First, the minimum-moment corner response of the phase congruency layer is computed; extreme points are found by neighbourhood comparison and non-maximum suppression is applied to obtain reliable corner features. In this embodiment, 8-neighbourhood extremum detection is performed on the minimum-moment response values.
Then, the maximum-moment edge response of the phase congruency layer is computed, and the FAST detector is applied to obtain edge-point features. In this embodiment, the FAST threshold is 0.1.
Finally, because the corner and edge-point features may contain duplicates, duplicate feature points are removed; feature points with high response are retained by thresholding and unreliable low-response features are discarded. Fig. 2 shows example feature extraction results of the method, where Fig. 2(a) is the original image, Fig. 2(b) the phase congruency layer, and Fig. 2(c) the feature points.
In this step, phase congruency replaces pixel or gradient values for feature detection; corner and edge-point features are combined, balancing the number of features against their repeatability.
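The 8-neighbourhood extremum detection with thresholding described above can be sketched as follows (the response layer is taken as given; computing phase congruency itself is outside this sketch):

```python
import numpy as np

def corner_candidates(response, threshold):
    """8-neighbourhood non-maximum suppression on a response layer
    (e.g. the minimum-moment corner response); returns (row, col)
    coordinates of local maxima lying above the threshold."""
    H, W = response.shape
    Rp = np.pad(response, 1, mode="constant", constant_values=-np.inf)
    # stack the 8 shifted neighbour views of the padded layer
    neigh = np.stack([Rp[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if (dy, dx) != (0, 0)])
    mask = (response > neigh.max(axis=0)) & (response > threshold)
    return np.argwhere(mask)

response = np.zeros((10, 10))
response[4, 5] = 1.0          # a clear corner response
response[7, 2] = 0.05         # below threshold, rejected
points = corner_candidates(response, threshold=0.1)
```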
Step 3) convolves the log-Gabor wavelet functions with the multi-modal images, constructs the maximum-value index map from the convolution results, and performs feature description on this index map instead of on a gradient map. The construction flow is shown in Fig. 3: first, multi-scale, multi-orientation convolution layers are obtained from the log-Gabor wavelet functions and summed along the scale dimension to give the log-Gabor convolution layers, which are arranged in sequence to form the convolution sequence. Then the pixel values of the maximum-value index map are determined: for each pixel position, an O-dimensional ordered array is read off, its maximum and the channel ω_max containing that maximum are found, and ω_max is taken as the value of the maximum-value index map at that position.
In this embodiment, the log-Gabor convolution uses 4 scales and 6 orientations. Arranging the convolution layers in sequence gives a 6-channel log-Gabor convolution map; the maximum of the 6-dimensional array and its channel ω_max are found, with ω_max ranging from 1 to 6.
Step 4) replaces the gradient map of traditional feature description with the maximum-value index map and builds the feature vector by histogram statistics, producing a numerical description of each feature point.
First, a 96 × 96-pixel local image block centred at the feature point is selected from the maximum-value index map, and the pixels in the block are weighted by a Gaussian function: the closer a pixel is to the feature point, the greater its weight.
Then the local image block is partitioned into 6 × 6 sub-regions, the histogram vector of each sub-region is accumulated, and concatenation yields a 216-dimensional feature vector.
Finally, the feature vector is normalized, entries larger than 0.2 are truncated to 0.2, and the vector is normalized again.
Step 5) constructs the ring-shaped log-Gabor convolution sequence to make multi-modal matching rotation-invariant. Because different starting layers of the log-Gabor convolution sequence yield completely different maximum-value index maps, the sequence layers are first connected end to end into a ring structure, as shown in Fig. 4, where Fig. 4(a) is the convolution sequence A of the original image and Fig. 4(b) the convolution sequence B of the rotated image. One convolution sequence is then built for the reference image and O sequences for the image to be matched, so the reference image yields 1 maximum-value index map while the image to be matched yields a set of O maximum-value index maps; feature point description is performed on each index map in the set.
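Connecting the sequence layers end to end and varying the starting layer is equivalent to a cyclic shift along the channel axis; a minimal sketch of generating the O index-map variants (the (O, H, W) layout and 1-based indexing are assumptions):

```python
import numpy as np

def ring_index_maps(conv_stack):
    """conv_stack: (O, H, W) log-Gabor convolution sequence. Varying the
    starting layer of the ring is a cyclic shift along the channel axis;
    each shift yields its own maximum-value index map (1-based)."""
    O = conv_stack.shape[0]
    return [np.argmax(np.roll(conv_stack, -k, axis=0), axis=0) + 1
            for k in range(O)]

stack = np.random.default_rng(4).random((6, 4, 4))
maps = ring_index_maps(stack)   # 6 index maps, one per starting layer
```

Shifting the starting layer by k simply relabels the winning channel, so each variant map is a cyclic relabelling of the unshifted one.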
Step 6) obtains the feature correspondences from the feature description vectors and rejects mismatched pairs and low-precision noise pairs.
First, a K-D tree is built over the feature point set of the reference image; nearest-neighbour search finds the two closest feature points on the image to be matched for each reference feature point; the closest point is taken as its match, and mismatches are preliminarily eliminated by the ratio of nearest to second-nearest distance: pairs with a ratio above 0.9 are rejected as unreliable.
Then an affine transformation model is chosen to approximate the geometric relation between the multi-modal image pair, and the RANSAC algorithm removes gross-error points. Specifically, a minimal subset is drawn at random from the initial match set and the affine transformation between those matches is computed; the model is applied to the remaining matches, and matches whose residual is below a threshold form the consensus set. This is repeated many times, and the consensus set with the largest number of matches is kept as the final match set.
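The RANSAC loop just described can be sketched as follows (a minimal version with 3-point subsets for the affine model; iteration count and residual threshold are illustrative choices, not values from the patent):

```python
import numpy as np

def ransac_affine(src, dst, iters=200, tol=1.0, seed=0):
    """RANSAC over an affine model [x', y'] = [x, y, 1] @ M: repeatedly fit
    M to a random 3-point minimal subset and keep the largest consensus set
    of points whose residual is below tol."""
    rng = np.random.default_rng(seed)
    n = len(src)
    src_h = np.column_stack([src, np.ones(n)])   # homogeneous coordinates
    best = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)
        M, *_ = np.linalg.lstsq(src_h[idx], dst[idx], rcond=None)
        resid = np.linalg.norm(src_h @ M - dst, axis=1)
        inliers = resid < tol
        if inliers.sum() > best.sum():
            best = inliers
    return best

rng = np.random.default_rng(1)
src = rng.random((20, 2)) * 100.0
M_true = np.array([[1.1, 0.1], [-0.1, 0.9], [5.0, -3.0]])  # 3x2 affine
dst = np.column_stack([src, np.ones(20)]) @ M_true
dst[:4] += 50.0                                # four gross-error pairs
inliers = ransac_affine(src, dst)
```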
Finally, a q-norm is introduced to build the error-rejection cost function; the globally optimal affine transformation is solved by the alternating direction method of multipliers (ADMM), and remaining gross errors and low-precision noise points are removed with the solved model. Fig. 5 shows example experimental results of the multi-modal matching method: Fig. 5(a) optical with optical image, Fig. 5(b) optical with infrared image, Fig. 5(c) optical with SAR image, Fig. 5(d) optical image with point-cloud depth map, Fig. 5(e) optical image with map, and Fig. 5(f) optical with night-light image.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (9)
1. A multi-modal image feature matching method with scale and rotation invariance is characterized by comprising the following steps:
1) constructing a nonlinear fast explicit diffusion scale space, and performing a multi-scale transformation on a reference image and an image to be matched to obtain images at each scale;
2) extracting feature points from each scale-space layer of the reference image and the image to be matched: computing the two-dimensional phase congruency layer of each scale-space image, and extracting image feature points from the maximum-moment and minimum-moment components of phase congruency;
3) performing log-Gabor wavelet function convolution on the reference image and the image to be matched to obtain a multi-scale and multi-direction convolution sequence, and constructing a maximum value index map based on the convolution sequence;
4) constructing a Log-Gabor annular convolution sequence to realize the rotation invariance of feature matching;
5) constructing a feature description vector of the feature points based on the maximum index map;
6) constructing the feature point matching relation according to the feature description vectors, and performing gross error elimination.
2. The method for multi-modal image feature matching with scale and rotation invariance of claim 1, wherein the nonlinear fast explicit diffusion scale space in step 1) is constructed as follows:
1.1) determining the pyramid image parameters of the nonlinear scale space, wherein the nonlinear scale space consists of M groups of pyramid images with S layers per group, and the scale parameter, image group and sub-layer level satisfy the following relation:
σ_i(m, s) = σ_0 · 2^(m + s/S), m ∈ [0, …, M−1], s ∈ [0, …, S−1], i ∈ [0, …, K−1]  (1)
where σ_0 is the initial scale and K is the number of scale-space images;
1.2) repeatedly down-sampling the original image to obtain the M groups of pyramid images, and solving a nonlinear diffusion filtering equation for each group of images under its scale parameters within a fast explicit diffusion numerical analysis framework to generate the scale-space image layers; the nonlinear diffusion filtering equation is:
L_{i+1} = (I + τ A(L_i)) L_i  (2)
where I is the identity matrix, L_i is the i-th scale-space image layer, A is the image conduction matrix, and τ is the time step.
3. The multi-modal image feature matching method with scale and rotation invariance as claimed in claim 1, wherein the image feature point extraction in step 2) is performed according to the maximum moment and minimum moment components of phase congruency, specifically as follows:
2.1) obtaining a corner response layer from the minimum-moment map of phase congruency and performing extreme-point detection to obtain corner features;
2.2) obtaining an edge response layer from the maximum-moment map of phase congruency and applying the FAST detection operator to obtain edge point features;
2.3) removing duplicate features and unstable feature points whose response values fall below a threshold, completing the feature point extraction.
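Steps 2.1)–2.3) can be sketched as follows, assuming the phase-congruency minimum-moment (corner) and maximum-moment (edge) maps are already computed and normalised to [0, 1]; a 3×3 non-maximum suppression with a threshold stands in for both the extreme-point detection and the FAST operator of the claim, and the threshold value is an assumption.

```python
import numpy as np

def select_features(corner_map, edge_map, thresh=0.3):
    """Keep thresholded 3x3 local maxima of both response maps and
    merge them, discarding duplicate locations via a set union."""
    def local_maxima(resp):
        pts = set()
        h, w = resp.shape
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = resp[y, x]
                if v >= thresh and v == resp[y-1:y+2, x-1:x+2].max():
                    pts.add((y, x))
        return pts
    return sorted(local_maxima(corner_map) | local_maxima(edge_map))
```

The set union implements step 2.3)'s duplicate rejection: a point detected in both the corner and edge layers is kept only once.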
4. The method according to claim 1, wherein the maximum value index map is constructed in the step 3), specifically as follows:
3.1) carrying out Log-Gabor wavelet function convolution on the image;
3.2) calculating the wavelet transform amplitude A_no(x, y) at scale n and orientation o, and summing along the scale dimension to obtain the Log-Gabor convolution layer A_o(x, y);
3.3) arranging the Log-Gabor convolution layers in order to obtain a Log-Gabor convolution sequence, i.e. a multi-channel Log-Gabor convolution map, where O is the number of wavelet orientations and the superscript ω = 1, 2, …, O denotes the corresponding channel;
3.4) for each pixel position in the image, collecting the pixel values of the O channels of the Log-Gabor convolution sequence into an O-dimensional ordered array, finding its maximum value and the channel ω_max where that maximum lies, and using ω_max as the index value at that pixel position to construct the maximum index map.
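Steps 3.2)–3.4) reduce to two array operations once the Log-Gabor amplitudes A_no(x, y) are available; the sketch below assumes they are precomputed and stacked as (scales, orientations, height, width).

```python
import numpy as np

def maximum_index_map(amplitudes):
    """amplitudes: array of shape (N, O, H, W) with the Log-Gabor
    amplitude A_no(x, y) for N scales and O orientations. Summing over
    scales gives the O convolution layers A_o(x, y); the argmax over
    the channel axis is the maximum index map, indexed 1..O as in the
    claim."""
    layers = amplitudes.sum(axis=0)      # A_o(x, y), shape (O, H, W)
    return layers.argmax(axis=0) + 1     # omega_max per pixel
```

The resulting map replaces raw intensities with the dominant filter orientation at each pixel, which is what makes the subsequent descriptor robust to nonlinear radiometric differences between modalities.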
5. The method according to claim 1, wherein the feature description vector of the feature points is constructed based on the maximum value index map in step 5), specifically as follows:
4.1) for each feature point, selecting on the maximum index map a local image block of J × J pixels centred at the feature point position;
4.2) assigning a weight to each pixel;
4.3) dividing the local image block into sub-regions and computing the histogram distribution within each sub-region;
4.4) concatenating all the histogram vectors to form the feature description vector of the feature point, then truncating and normalising the feature vector.
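Steps 4.1)–4.4) can be sketched end to end; the defaults follow the parameters of claim 6 (96 × 96 block, Gaussian σ = 48, 6 × 6 grid, 6 histogram bins), while the 0.2 truncation threshold is an assumed SIFT-style value, since the claim does not state one.

```python
import numpy as np

def describe(index_map, pt, J=96, bins=6, grid=6, trunc=0.2):
    """Descriptor of one feature point from the maximum index map
    (values assumed in 1..bins): Gaussian-weighted per-sub-region
    histograms, concatenated, then truncated and L2-normalised."""
    y, x = pt
    r = J // 2
    patch = index_map[y - r:y + r, x - r:x + r].astype(int)
    yy, xx = np.mgrid[-r:r, -r:r]
    w = np.exp(-(xx**2 + yy**2) / (2.0 * (J / 2) ** 2))  # Gaussian weights
    step = J // grid
    hists = []
    for gy in range(grid):
        for gx in range(grid):
            sub = patch[gy*step:(gy+1)*step, gx*step:(gx+1)*step]
            sw = w[gy*step:(gy+1)*step, gx*step:(gx+1)*step]
            hists.append(np.bincount(sub.ravel() - 1,
                                     weights=sw.ravel(),
                                     minlength=bins)[:bins])
    d = np.concatenate(hists)
    d /= np.linalg.norm(d) + 1e-12
    d = np.minimum(d, trunc)             # truncate dominant components
    d /= np.linalg.norm(d) + 1e-12       # renormalise
    return d
```

With the claim-6 parameters the descriptor has 6 × 6 × 6 = 216 dimensions. The sketch assumes the feature point lies at least J/2 pixels from the image border; a full implementation would pad or discard border points.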
6. The multi-modal image feature matching method with scale and rotation invariance of claim 5, wherein in step 5) the local image block of the feature point is selected with a size of 96 × 96 pixels; each pixel is weighted with a Gaussian function with standard deviation 48; the block is divided into 6 × 6 sub-regions, the histogram distribution within each sub-region is computed, and the number of histogram bins is 6.
7. The multi-modal image feature matching method with scale and rotation invariance of claim 1, wherein a Log-Gabor circular convolution sequence is constructed in step 4) to achieve rotation invariance of the feature matching, specifically as follows:
constructing convolution sequences A and B for the reference image and the image to be matched respectively, and obtaining the maximum index map Map_A of convolution sequence A of the reference image; for convolution sequence B of the image to be matched, cyclically shifting the initial layer of the sequence to obtain O convolution sequences with different initial layers; then calculating the maximum index map of each convolution sequence to obtain a set of maximum index maps; finally, calculating the principal direction of each feature point with the histogram principal-direction method, and determining from that set the maximum index map Map_B that matches Map_A.
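The circular-shift construction for the image to be matched can be sketched as follows, assuming its O Log-Gabor convolution layers are already stacked; each cyclic shift of the channel axis corresponds to choosing a different initial layer, which is how an in-plane rotation of the image permutes the orientation channels.

```python
import numpy as np

def shifted_index_maps(layers_B):
    """layers_B: the O Log-Gabor convolution layers of the image to be
    matched, shape (O, H, W). Cyclically shifting the initial layer
    gives O sequences; the argmax of each (indexed 1..O) yields the set
    of candidate maximum index maps from which the one matching Map_A
    is selected."""
    O = layers_B.shape[0]
    maps = []
    for shift in range(O):
        seq = np.roll(layers_B, -shift, axis=0)  # new initial layer
        maps.append(seq.argmax(axis=0) + 1)
    return maps
```

Selecting among the O candidates via the feature points' histogram principal directions, as the claim describes, then resolves which shift best compensates the rotation between the image pair.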
8. The multi-modal image feature matching method with scale and rotation invariance of claim 1, wherein in step 6) the feature point matching relationship is established with a nearest-neighbour search method, and gross errors and noise are eliminated with a RANSAC method and a q-norm method.
9. The multi-modal image feature matching method with scale and rotation invariance as claimed in claim 1, wherein in the step 6), a feature point matching relationship is constructed and coarse difference elimination is performed, specifically as follows:
6.1) calculating the nearest and second-nearest Euclidean distances between feature vectors, first constructing a one-to-one matching relationship from the nearest distances, then preliminarily eliminating mismatches using the ratio of the nearest to the second-nearest distance;
6.2) eliminating gross error points with a RANSAC algorithm based on an affine transformation model;
6.3) computing the globally optimal affine transformation between the image pair based on the q-norm, and using this model to further remove gross errors and low-precision noise points.
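Step 6.1) can be sketched as a Lowe-style ratio test; the 0.8 threshold is an assumed value, and the claim's RANSAC and q-norm stages (6.2, 6.3) would run on the surviving pairs afterwards.

```python
import numpy as np

def ratio_match(desc_a, desc_b, ratio=0.8):
    """For each descriptor in desc_a, find its nearest and second-
    nearest neighbours in desc_b by Euclidean distance and keep the
    pair only when nearest < ratio * second_nearest."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    nn, nn2 = order[:, 0], order[:, 1]
    rows = np.arange(len(desc_a))
    keep = d[rows, nn] < ratio * d[rows, nn2]
    return [(int(i), int(nn[i])) for i in np.flatnonzero(keep)]
```

An ambiguous match, where two candidates in desc_b lie at nearly the same distance, fails the ratio test and is removed before the geometric verification stages.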
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010765205.4A CN111985502A (en) | 2020-08-03 | 2020-08-03 | Multi-mode image feature matching method with scale invariance and rotation invariance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111985502A true CN111985502A (en) | 2020-11-24 |
Family
ID=73445986
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010765205.4A Pending CN111985502A (en) | 2020-08-03 | 2020-08-03 | Multi-mode image feature matching method with scale invariance and rotation invariance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111985502A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105261014A (en) * | 2015-09-30 | 2016-01-20 | 西南交通大学 | Multi-sensor remote sensing image matching method |
CN106485651A (en) * | 2016-10-11 | 2017-03-08 | 中国人民解放军军械工程学院 | The image matching method of fast robust Scale invariant |
CN106991695A (en) * | 2017-03-27 | 2017-07-28 | 苏州希格玛科技有限公司 | A kind of method for registering images and device |
Non-Patent Citations (1)
Title |
---|
LI Jiayuan: "Research on Key Problems of Robust Feature Matching for Remote Sensing Images", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112634335A (en) * | 2020-12-25 | 2021-04-09 | 清华大学 | Method for extracting characteristic point pairs of robust remote sensing image facing to nonlinear radiation distortion |
CN113343747A (en) * | 2021-03-30 | 2021-09-03 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Method for multi-modal image robust matching VNS |
CN113592006A (en) * | 2021-08-04 | 2021-11-02 | 重庆理工大学 | Wavelet convolution-based sensor array signal matching feature extraction method and system |
CN113592006B (en) * | 2021-08-04 | 2023-11-07 | 重庆理工大学 | Sensor array signal matching feature extraction method and system based on wavelet convolution |
CN115205558A (en) * | 2022-08-16 | 2022-10-18 | 中国测绘科学研究院 | Multi-mode image matching method and device with rotation and scale invariance |
CN117011147A (en) * | 2023-10-07 | 2023-11-07 | 之江实验室 | Infrared remote sensing image feature detection and splicing method and device |
CN117011147B (en) * | 2023-10-07 | 2024-01-12 | 之江实验室 | Infrared remote sensing image feature detection and splicing method and device |
CN117745505A (en) * | 2024-02-19 | 2024-03-22 | 南京熊猫电子股份有限公司 | Disaster relief command system and method based on real-time multi-mode data |
CN117745505B (en) * | 2024-02-19 | 2024-06-07 | 南京熊猫电子股份有限公司 | Disaster relief command system and method based on real-time multi-mode data |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20201124 |