CN115205558A - Multi-mode image matching method and device with rotation and scale invariance - Google Patents

Multi-mode image matching method and device with rotation and scale invariance

Info

Publication number: CN115205558A
Authority: CN (China)
Prior art keywords: feature, image, matching, mode, convolution
Legal status: Granted (Active)
Application number: CN202210980008.3A
Other languages: Chinese (zh)
Other versions: CN115205558B (en)
Inventors: 樊仲藜, 刘玉轩, 孙钰珊, 张力
Current Assignee: Chinese Academy of Surveying and Mapping
Original Assignee: Chinese Academy of Surveying and Mapping

Application filed by Chinese Academy of Surveying and Mapping
Priority to CN202210980008.3A
Publication of CN115205558A
Application granted
Publication of CN115205558B

Classifications

    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/753 Transform-based matching, e.g. Hough transform
    • G06V10/759 Region-based matching
    • G06V20/10 Terrestrial scenes


Abstract

The invention discloses a multi-modal image matching method and device with rotation and scale invariance, relating to the technical field of image processing. Its main aims are to improve multi-modal image matching accuracy, strengthen robustness to noise and generality across images of different modalities, and eliminate the rotation, scale and translation differences between images. The method comprises the following steps: constructing a multi-modal image scale pyramid; extracting feature points from each image in the pyramid; constructing at least two image hierarchical feature maps; constructing a rotation-invariant feature descriptor; solving the geometric transformation model between the reference image and the target image; resampling the target image; constructing convolution feature maps; completing enhanced matching and inverse computation of matching points; and outputting the enhanced matching result as the final result. The method is suitable for multi-modal image matching.

Description

Multi-mode image matching method and device with rotation and scale invariance
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a multi-modal image matching method and apparatus with rotation and scale invariance.
Background
In the major disaster emergencies of recent years, photogrammetry and remote sensing technology, by virtue of its flexibility and versatility, has been able to obtain high-resolution images and topographic maps of disaster areas at the first opportunity, providing strong support for rescue and relief, facility construction, urban planning, and other work. Among its many applications, image matching is a core fundamental step and a prerequisite for their implementation. Matching-based applications include not only aerial triangulation in photogrammetry, but also visual navigation in positioning and navigation, path planning in robotics, target tracking in intelligent transportation, and so on. Progress on the image matching problem can effectively advance research on important problems in these related fields.
Current multi-modal image matching methods mainly fall into three categories: feature-based methods, region-based methods, and deep-learning-based methods. Feature-based methods extract salient features from the reference image and the target image simultaneously, generate description vectors for the features based on a certain strategy, and then complete image matching. Region-based methods extract a local window image from the reference image based on gray-level information or another feature metric, slide it as a fixed template through a search region on the target image, and compute a score for each pixel in the search region under a certain similarity measure, thereby completing image matching. Deep-learning-based methods design a deep neural network that automatically learns salient features, rotation angles, scale differences and other information from the images, converts the multi-modal images into homomorphic feature maps, and completes image matching in a feature-based or region-based manner.
Current feature-based multi-modal matching methods suffer from the inability to assign a proper main direction to feature points, poor resistance to rotation, difficulty in achieving scale invariance, and relatively low matching accuracy.
Disclosure of Invention
In view of the above, the present invention provides a multi-modal image matching method and apparatus with rotation and scale invariance. Its main objectives are to improve the accuracy of multi-modal image matching, to strengthen robustness to noise and generality across images of different modalities, and to eliminate the rotation, scale and translation differences between images, thereby improving the matching precision of multi-modal images.
According to a first aspect of the present invention, there is provided a multi-modal image matching method with rotation and scale invariance, comprising:
S1: constructing a multi-modal image scale pyramid: down-sampling the reference image and the target image step by step according to scaling ratios to obtain images of different scales, forming the multi-modal image scale pyramid;
S2: detecting and extracting feature points from each image in the multi-modal image scale pyramid, for the reference image and the target image respectively;
S3: convolving the reference image and the target image with a filter, constructing hierarchical feature maps based on the convolution image results at the feature points detected in S2, and performing feature description based on the hierarchical feature maps, constructing at least two image hierarchical feature maps;
S4: calculating the mode feature value centering coordinates in the at least two image hierarchical feature maps, taking the direction of the line connecting each feature point and its mode feature value centering coordinate as the main direction of the feature point, rotating the local image to the main direction, and regularizing the feature values to construct a rotation-invariant feature descriptor;
S5: after feature extraction and feature description are completed, obtaining matching point correspondences using the nearest Euclidean distance, performing gross error elimination according to a similarity transformation model to obtain a matching result, and solving the geometric transformation model between the reference image and the target image based on the matching result;
S6: resampling the target image into the pixel coordinate system of the reference image based on the geometric transformation model to obtain a resampled target image;
S7: retaining the feature points and convolution results of the reference image, convolving the resampled target image to obtain its convolution results, and normalizing the convolution results of the reference image and the resampled target image along the direction dimension to construct convolution feature maps;
S8: completing enhanced matching and inverse computation of matching points on the convolution feature maps, and outputting the enhanced matching result as the final result.
In a possible implementation, the method of S4 for calculating the mode feature value centering coordinates in the at least two image hierarchical feature maps specifically includes:
determining a circular window centered on the feature point in the at least two image hierarchical feature maps, and counting the number of occurrences of each feature value within the window to obtain the feature value corresponding to the mode;
regarding the region corresponding to the mode feature value as an irregular polygon of uniform thickness and density, calculating the centering coordinate of each subregion, adding subregions pixel by pixel while continuously updating the centering coordinate, and obtaining the centering coordinate of the region corresponding to the mode feature value, namely the mode feature value centering coordinate.
In a possible implementation, the method of S4 for calculating the mode feature value centering coordinates in the at least two image hierarchical feature maps further includes:
analyzing the secondary mode feature value; if the secondary mode feature region exceeds a first percentage of the mode feature region, additionally calculating the centering coordinate of the secondary mode feature region, and determining a secondary main direction and the corresponding feature vector for the feature point.
In one possible embodiment, performing feature description based on the hierarchical feature maps and constructing at least two image hierarchical feature maps specifically includes:
setting 2K directions when convolving the reference image and the target image, extracting the K odd layers to construct a first feature map and the K even layers to construct a second feature map, and completing the feature description.
In a possible embodiment, the enhanced matching and the inverse computation of matching points specifically include:
calculating, on the convolution feature maps, the matching point corresponding to each feature point on the reference image; performing gross error elimination according to the similarity transformation model to obtain a matching result; inversely mapping the matching points on the resampled target image to the original target image using the geometric transformation model solved in S5 to complete the enhanced matching; and outputting the enhanced matching result as the final result.
According to a second aspect of the present invention, there is provided a multi-modal image matching apparatus with rotation and scale invariance, comprising:
a sampling unit: constructing a multi-modal image scale pyramid, down-sampling the reference image and the target image step by step according to scaling ratios to obtain images of different scales;
a detection unit: detecting and extracting feature points from each image in the multi-modal image scale pyramid, for the reference image and the target image respectively;
a description unit: convolving the reference image and the target image with a filter, constructing hierarchical feature maps based on the convolution image results at the detected feature points, and performing feature description based on the hierarchical feature maps, constructing at least two image hierarchical feature maps;
a calculation unit: calculating the mode feature value centering coordinates in the at least two image hierarchical feature maps, taking the direction of the line connecting each feature point and its mode feature value centering coordinate as the main direction of the feature point, rotating the local image to the main direction, and regularizing the feature values to construct a rotation-invariant feature descriptor;
a resolving unit: after feature extraction and feature description are completed, obtaining matching point correspondences using the nearest Euclidean distance, performing gross error elimination according to the similarity transformation model to obtain a matching result, and solving the geometric transformation model between the reference image and the target image based on the matching result;
a resampling unit: resampling the target image into the pixel coordinate system of the reference image based on the geometric transformation model to obtain a resampled target image;
a construction unit: retaining the feature points and convolution results of the reference image, convolving the resampled target image to obtain its convolution results, and normalizing the convolution results of the reference image and the resampled target image along the direction dimension to construct convolution feature maps;
an enhanced matching unit: completing enhanced matching and inverse computation of matching points on the convolution feature maps, and outputting the enhanced matching result as the final result.
In a possible embodiment, the detection unit comprises:
a statistics module: determining a circular window centered on the feature point in the at least two image hierarchical feature maps, and counting the number of occurrences of each feature value within the window to obtain the feature value corresponding to the mode;
a main direction module: regarding the region corresponding to the mode feature value as an irregular polygon of uniform thickness and density, calculating the centering coordinate of each subregion, adding subregions pixel by pixel while continuously updating the centering coordinate, and obtaining the centering coordinate of the region corresponding to the mode feature value, namely the mode feature value centering coordinate;
a determination module: taking the direction of the line connecting the feature point and the mode feature value centering coordinate as the main direction of the feature point.
In a possible implementation, the detection unit further comprises:
an analysis module: analyzing the secondary mode feature value; if the secondary mode feature region exceeds a first percentage of the mode feature region, additionally calculating the centering coordinate of the secondary mode feature region, and determining a secondary main direction and the corresponding feature vector for the feature point.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method described above when executing the computer program.
According to a fourth aspect of the invention, a computer-readable storage medium is provided, having stored a computer program which, when executed by a processor, carries out the steps of the above-mentioned method.
Compared with the existing feature-based multi-modal matching methods, which extract salient features from the reference image and the target image simultaneously, generate description vectors for those features based on a certain strategy, and then complete image matching, the present method makes full use of the salient features in multi-modal images by integrating an image feature model with a correlation analysis model, which improves robustness to noise and generality across images of different modalities. In the matching process, initial matching is first carried out by feature matching, solving the rotation angle, scale difference and translation between the images while ensuring that the matching result has relatively high precision; an image resampling step is then applied and the enhanced matching is completed by region matching, further improving the matching precision.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a schematic flowchart of a multi-modal image matching method with rotation and scale invariance according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a multi-modal image matching apparatus with rotation and scale invariance according to an embodiment of the present invention;
FIG. 3 is a further schematic structural diagram of a multi-modal image matching apparatus with rotation and scale invariance according to an embodiment of the present invention;
fig. 4 is a schematic entity structure diagram of a multi-modal image matching apparatus with rotation and scale invariance according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As described in the background art, current feature-based multi-modal matching methods suffer from the inability to assign a proper main direction to feature points, poor resistance to rotation, difficulty in achieving scale invariance, and relatively low matching accuracy.
In order to solve the above technical problem, an embodiment of the present invention provides a multi-modal image matching method with rotation and scale invariance, a flow chart is shown in fig. 1, and the method includes:
S1: constructing a multi-modal image scale pyramid: down-sampling the reference image and the target image step by step according to scaling ratios to obtain images of different scales;
In some optional implementations of some embodiments, a multi-layer pyramid structure is generated for the reference image and the target image by step-by-step down-sampling. Suppose an image $I$ is first down-sampled at a scaling ratio of 1.5 to obtain $I_{2/3}$, i.e. reduced to 2/3 of the original image; then $I$ and $I_{2/3}$ are each down-sampled three more times at a scaling ratio of 2, yielding $I_{1/2}$, $I_{1/4}$, $I_{1/8}$ and $I_{1/3}$, $I_{1/6}$, $I_{1/12}$, respectively: the length and width of the down-sampled results are reduced to 1/2, 1/4, 1/8 and 1/3, 1/6, 1/12 of the original image. With this scale pyramid, the algorithm can withstand at most a 12-fold scale difference between the images.
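As a concrete illustration, here is a minimal sketch of this pyramid construction, assuming OpenCV-style area interpolation; the function name build_scale_pyramid is illustrative and not from the patent.

```python
import cv2
import numpy as np

def build_scale_pyramid(image: np.ndarray) -> list:
    """Return the 8 pyramid levels at scales 1, 2/3, 1/2, 1/4, 1/8, 1/3, 1/6, 1/12."""
    levels = [image]
    # First branch: down-sample at ratio 1.5, i.e. shrink the image to 2/3.
    branch_23 = cv2.resize(image, None, fx=2 / 3, fy=2 / 3,
                           interpolation=cv2.INTER_AREA)
    levels.append(branch_23)
    # Each branch is then down-sampled three more times at ratio 2.
    for base in (image, branch_23):
        current = base
        for _ in range(3):
            current = cv2.resize(current, None, fx=0.5, fy=0.5,
                                 interpolation=cv2.INTER_AREA)
            levels.append(current)
    return levels
```

The same routine is applied to both the reference image and the target image, so a full-resolution level of one pyramid can be paired with a 1/12-scale level of the other to absorb up to the 12-fold scale difference noted above.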
S2: respectively detecting and extracting feature points of each image in the multi-modal image scale pyramid from the reference image and the target image;
In some alternative implementations of some embodiments, a phase consistency model is used as the basis for feature detection. For example, a multi-scale, multi-directional log-Gabor-filter-based phase consistency model $PC(x,y)$ can be expressed as follows:

$$PC(x,y)=\frac{\sum_{s}\sum_{o}W_{o}(x,y)\left\lfloor A_{so}(x,y)\,\Delta\Phi_{so}(x,y)-T\right\rfloor}{\sum_{s}\sum_{o}A_{so}(x,y)+\varepsilon}$$

where $PC(x,y)$ is the computed image phase consistency value; $s$ and $o$ respectively index the scales and directions of the filter; $W_{o}(x,y)$ is a frequency-spread weighting factor; $A_{so}(x,y)$ is the amplitude of the result of convolving the image with the filter; $\Delta\Phi_{so}(x,y)$ is a phase deviation function; $T$ is a noise energy threshold; the symbol $\lfloor\,\cdot\,\rfloor$ denotes that the enclosed value is kept when it is positive and set to 0 when it is negative; and $\varepsilon$ is a small fraction that avoids a denominator of 0.
In order to realize accurate feature point positioning, the directional phase consistency results are combined with the moment analysis equations to obtain the maximum moment feature map $M$ and the minimum moment feature map $m$:

$$M=\frac{1}{2}\left(C+A+\sqrt{B^{2}+(A-C)^{2}}\right),\qquad m=\frac{1}{2}\left(C+A-\sqrt{B^{2}+(A-C)^{2}}\right)$$

where $A$, $B$ and $C$ are three intermediate quantities:

$$A=\sum_{o}\left(PC(\theta_{o})\cos\theta_{o}\right)^{2},\quad B=2\sum_{o}\left(PC(\theta_{o})\cos\theta_{o}\right)\left(PC(\theta_{o})\sin\theta_{o}\right),\quad C=\sum_{o}\left(PC(\theta_{o})\sin\theta_{o}\right)^{2}$$

in which $PC(\theta_{o})$ denotes the phase consistency map in the direction $\theta_{o}$, and $\theta_{o}$ is the azimuth angle of the $o$-th filter.

The maximum moment feature map $M$ is a feature measure of image edge information, the minimum moment feature map $m$ is a feature measure of image corner information, and $m$ is a proper subset of $M$. A simple linear weighting of $M$ and $m$ gives a better image edge feature map, which is independent of image illumination and contrast and is therefore suitable for feature detection on multi-modal images. Accordingly, a phase consistency moment difference feature map (DGPC) is constructed as the feature metric reflecting image edge information, and a FAST feature detector is used to realize robust feature detection on it. The phase consistency moment difference feature map is computed as:

$$\mathrm{DGPC}=M-w\,m$$

where $\mathrm{DGPC}$ denotes the resulting phase consistency moment difference feature map and $w$ is a weight coefficient with value range $(0,1)$.
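A minimal sketch of this detection step follows, assuming the directional phase consistency maps $PC(\theta_o)$ have already been computed; the helper name dgpc_feature_detection, the default weight and the FAST threshold are illustrative assumptions.

```python
import cv2
import numpy as np

def dgpc_feature_detection(pc_maps, thetas, w=0.5, fast_threshold=20):
    # Intermediate moment quantities A, B, C accumulated over the directions.
    A = sum((pc * np.cos(t)) ** 2 for pc, t in zip(pc_maps, thetas))
    B = 2 * sum((pc * np.cos(t)) * (pc * np.sin(t)) for pc, t in zip(pc_maps, thetas))
    C = sum((pc * np.sin(t)) ** 2 for pc, t in zip(pc_maps, thetas))
    root = np.sqrt(B ** 2 + (A - C) ** 2)
    M_max = 0.5 * (C + A + root)      # maximum moment map (edge measure)
    m_min = 0.5 * (C + A - root)      # minimum moment map (corner measure)
    dgpc = M_max - w * m_min          # moment-difference feature map
    # FAST expects an 8-bit image, so rescale the map before detection.
    dgpc_u8 = cv2.normalize(dgpc, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    fast = cv2.FastFeatureDetector_create(threshold=fast_threshold)
    return fast.detect(dgpc_u8, None), dgpc
```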
In some alternative implementations of some embodiments, mean filtering is used as the basis for feature detection. For example, a mean-filtering-based feature map generation model can be expressed as follows:

$$F_{s}=\left|I-I\ast B_{s}\right|$$

where $F_{s}$ denotes the generated feature map result and $B_{s}$ denotes a mean filter window of diameter $s$.

This feature map can likewise be used as a feature metric reflecting image edge information, and a FAST feature detector is used to realize robust feature detection on it.
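A minimal sketch of this alternative is given below, under the assumption stated above that the feature map is the absolute difference between the image and its mean-filtered version (the patent renders the exact formula only as an image).

```python
import cv2
import numpy as np

def mean_filter_feature_map(image: np.ndarray, s: int = 5) -> np.ndarray:
    # Mean filter with a window of diameter s, then keep the high-pass residual.
    blurred = cv2.blur(image.astype(np.float32), (s, s))
    return np.abs(image.astype(np.float32) - blurred)
```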
S3: performing convolution on the reference image and the target image, for example selecting a filter with directional properties such as log-Gabor, steerable, ROEWA or Sobel; constructing hierarchical feature maps based on the convolution image results at the feature points detected in S2; and performing feature description based on the hierarchical feature maps, constructing at least two image hierarchical feature maps;
In some alternative implementations of some embodiments, a hierarchical feature map is constructed based on the obtained convolution image results, and feature description is performed based on this feature map. Specifically, the convolution images of $n$ scales and $k$ directions, $A_{so}(x,y)$, are summed over scale to obtain convolution images in the $k$ directions:

$$A_{o}(x,y)=\sum_{s=1}^{n}A_{so}(x,y),\qquad o=1,\dots,k$$

The $k$ directional convolution images $A_{o}(x,y)$ are arranged into an image cube $C$ in the order of the convolution directions, so that the convolution image of each direction is placed at a fixed level of the cube and corresponds to a unique layer number (an integer in $1\sim k$). Then, a hierarchical feature map ($LG$) is generated by taking, at each pixel plane position $(x,y)$, the layer number of the direction holding the maximum convolution value:

$$LG(x,y)=\mathrm{Layer}\left(\mathrm{Max}\left(C(x,y,\cdot)\right)\right)$$

where $C$ is the image cube and $LG$ is the hierarchical feature map result; $\mathrm{Max}(\cdot)$ is a function for locating the maximum convolution value over the target pixel $(x,y)$, and $\mathrm{Layer}(\cdot)$ is a function for determining the layer number where the maximum convolution value is located.
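A minimal sketch of the hierarchical feature map construction, assuming the convolution amplitudes are held in an (n, k, H, W) array; the layout and function name are illustrative.

```python
import numpy as np

def hierarchical_feature_map(conv: np.ndarray) -> np.ndarray:
    """conv: (n scales, k directions, H, W) convolution amplitudes."""
    cube = conv.sum(axis=0)             # sum over scales -> (k, H, W) image cube
    return np.argmax(cube, axis=0) + 1  # per-pixel layer number in 1..k
```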
S4: calculating the mode feature value centering coordinates in the at least two image hierarchical feature maps, taking the direction of the line connecting each feature point and its mode feature value centering coordinate as the main direction of the feature point, rotating the local image to the main direction, and regularizing the feature values to construct a rotation-invariant feature descriptor;
In some optional implementations of some embodiments, a main direction is determined for each feature point by computing the mode feature value centering coordinate, and a feature description vector is then computed. The log-Gabor, steerable and ROEWA convolutions are all convolution operations in fixed directions, so the layer number corresponding to the maximum convolution value is uncertain when the image rotates, and the hierarchical feature map by itself does not have rotation invariance. The mode feature values are centered by calculating region centers; the optional region centers include the center of gravity, the centroid, and the like.
A circular window of radius $r$ is determined with the feature point as its center, and the number of occurrences of each of the $k$ feature values (integers in $1\sim k$) within the window is counted to obtain the feature value corresponding to the mode, $v_{m}$. The region $R_{m}$ corresponding to $v_{m}$ is then regarded as an irregular polygon of uniform thickness and density, so that its mass is proportional to its area.
For the calculation of the barycentric coordinates, suppose there are two subregions $R_{a}$ and $R_{b}$ with areas $S_{a}$ and $S_{b}$, respectively. They can be regarded as two point masses with centers of gravity at $a$ and $b$; denoting the overall center of gravity by $G$, $G$ must lie on the line $ab$ and satisfy the following equality:

$$S_{a}\left|\overrightarrow{aG}\right|=S_{b}\left|\overrightarrow{Gb}\right|$$

where $\overrightarrow{aG}$ and $\overrightarrow{Gb}$ are vectors along the line connecting the centers of gravity and $\left|\cdot\right|$ takes the modulus of a vector. The formula for calculating the center of gravity $G$ then follows:

$$G=\frac{S_{a}\,a+S_{b}\,b}{S_{a}+S_{b}}$$

According to this formula, the barycentric coordinate of the whole mode feature region is finally obtained by adding subregions pixel by pixel and continuously updating the barycentric coordinate.
For the calculation of the centroid coordinates, the center of the circular window is taken as the initial centroid with coordinates $(C_{x},C_{y})$; the coordinates $(p_{x},p_{y})$ of all pixels in the region corresponding to the mode feature value are obtained, a distance-weighted calculation is carried out pixel by pixel with the distance between each pixel and the current centroid as the weight, and the centroid coordinates $(C_{x},C_{y})$ are continuously updated until the centroid coordinate of the region corresponding to the mode feature value, namely the mode feature value centering coordinate, is obtained.
The direction of the line connecting the feature point and the mode feature value centering coordinate is taken as the main direction of the feature point, the local image is rotated to the main direction, and the feature values within the window are regularized: the mode feature value is mapped to $k$ and the remaining feature values are updated according to their order in the convolution directions. A square window with side length $\sqrt{2}r$ (the largest square inscribed in the circular window) is divided into $m$ subregions, histogram statistics are carried out in each subregion, and the statistical results are concatenated in order and normalized, finally giving the feature point an $m\times k$-dimensional feature vector.
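For illustration, here is a minimal sketch of the main-direction estimation from the mode feature region; the iterative pixel-by-pixel update described above is replaced by the equivalent closed-form area center of a uniform-density region, and the names are illustrative.

```python
import numpy as np

def main_direction(lg_map: np.ndarray, fx: int, fy: int, r: int) -> float:
    ys, xs = np.mgrid[0:lg_map.shape[0], 0:lg_map.shape[1]]
    inside = (xs - fx) ** 2 + (ys - fy) ** 2 <= r ** 2   # circular window
    values, counts = np.unique(lg_map[inside], return_counts=True)
    mode_value = values[np.argmax(counts)]               # most frequent layer number
    region = inside & (lg_map == mode_value)             # mode feature region
    gx, gy = xs[region].mean(), ys[region].mean()        # uniform-density area center
    return np.arctan2(gy - fy, gx - fx)                  # main direction in radians
```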
Furthermore, considering that radiation differences between images may cause local changes in the hierarchical feature map, the secondary mode feature value is also analyzed: if the secondary mode feature region $R_{m2}$ exceeds a first percentage of the mode feature region $R_{m}$ (the preferred value of the first percentage in the experiments is 80%), the centering coordinate of $R_{m2}$ is additionally calculated, and a secondary main direction and the corresponding feature vector are determined for the feature point. This strategy improves the robustness of the algorithm.
There is a certain mutual-exclusion relationship between the number of filter directions $k$ selected in the convolution calculation and the expressiveness of the hierarchical feature map. If $k$ is too small, the feature maps are not similar enough under certain rotation angles and matching fails; if $k$ is too large, the feature maps become too cluttered to estimate the main direction accurately and the matching performance drops. The method solves this with the strategy of at least two hierarchical feature maps: as sketched below, $j\times k$ directions are set when convolving the image, giving $j\times k$ convolution maps; every $j$ consecutive maps form one group, giving $k$ groups of convolution maps; the first map of each group is extracted to construct the first hierarchical feature map, the second map of each group to construct the second, and so on, constructing $j$ hierarchical feature maps in total, after which feature description is completed according to the preceding steps.
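A sketch of this strategy: the j*k directional convolution maps are split into j interleaved sets of k maps each, one hierarchical feature map per set; for j = 2 this reproduces the odd-layer/even-layer split named in the claims. The array layout is an assumption.

```python
import numpy as np

def split_hierarchical_maps(cube: np.ndarray, j: int) -> list:
    """cube: (j*k, H, W) convolution maps in direction order."""
    # Taking every j-th map starting at offset i picks the i-th member
    # of each group of j consecutive directions.
    return [np.argmax(cube[i::j], axis=0) + 1 for i in range(j)]
```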
S5: after feature extraction and feature description are completed, obtaining matching point correspondences using the nearest Euclidean distance, performing gross error elimination according to a similarity transformation model to obtain a matching result, and solving the geometric transformation model between the reference image and the target image based on the matching result;
In some alternative implementations of some embodiments, matching point correspondences are obtained using the nearest Euclidean distance, gross error elimination is performed according to the similarity transformation model with RANSAC or another method to obtain the matching result, and the geometric transformation model $H$ between the reference image and the target image is solved based on the matching result and passed to the next step.
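A minimal sketch of this initial matching step, assuming OpenCV descriptors; estimateAffinePartial2D fits a 4-degree-of-freedom rotation/scale/translation model and is used here as the similarity transformation with RANSAC gross error elimination.

```python
import cv2
import numpy as np

def initial_match(desc1, desc2, pts1, pts2):
    # Nearest-Euclidean-distance matching with cross-checking.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(desc1.astype(np.float32), desc2.astype(np.float32))
    src = np.float32([pts1[m.queryIdx] for m in matches])   # reference points
    dst = np.float32([pts2[m.trainIdx] for m in matches])   # target points
    # Similarity model mapping target coordinates into the reference frame.
    H, inliers = cv2.estimateAffinePartial2D(dst, src, method=cv2.RANSAC,
                                             ransacReprojThreshold=3.0)
    keep = inliers.ravel() == 1
    return H, src[keep], dst[keep]
```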
S6: resampling the target image to a pixel coordinate system of the reference image based on the geometric transformation model to obtain a resampled target image;
S7: retaining the feature points and convolution results of the reference image, convolving the resampled target image again with the filter selected in S3 to obtain its convolution results, and normalizing the convolution results of the reference image and the resampled target image along the direction dimension to construct the convolution feature maps;
S8: completing enhanced matching and inverse computation of matching points on the convolution feature maps, and outputting the enhanced matching result as the final result.
In some optional implementations of some embodiments, the target image is resampled into the pixel coordinate system of the reference image based on the geometric transformation model $H$ solved in the initial matching; the rotation, scale and translation differences between the resampled target image and the reference image are thereby almost completely eliminated, so region features are constructed first and the matching is then enhanced with a region matching method to further improve the matching accuracy.
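A minimal sketch of the resampling step, assuming H is the 2x3 similarity matrix from the sketch above, mapping target coordinates into the reference frame.

```python
import cv2

def resample_target(target, H, ref_shape):
    h, w = ref_shape[:2]
    # Warp the target image into the reference image's pixel coordinate system.
    return cv2.warpAffine(target, H, (w, h), flags=cv2.INTER_LINEAR)
```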
Further, the multi-direction convolution results are used to construct the region features. For the reference image at the original scale, $I_{1}$, the feature points and the scale-summed image cube $C_{1}$ produced by the convolution operation in the initial matching step are retained; the resampled target image $I_{2}'$ is convolved to obtain the scale-summed image cube $C_{2}$. Then, the two image cubes are normalized along the direction perpendicular to the image plane to complete the construction of the region features, the normalization process being expressed by the following formula:

$$\hat{C}(x,y,o)=\frac{C(x,y,o)}{\sqrt{\sum_{o=1}^{k}C(x,y,o)^{2}}+\varepsilon}$$

where $\varepsilon$ is a small fraction that avoids the denominator being zero and $k$ is the number of convolution directions.
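A minimal sketch of this normalization; the L2 norm over the k direction layers follows the formula as reconstructed above, which is an assumption since the patent renders the formula only as an image.

```python
import numpy as np

def normalize_cube(cube: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """cube: (k, H, W); normalize each pixel's k-vector of directional responses."""
    return cube / (np.linalg.norm(cube, axis=0, keepdims=True) + eps)
```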
Since the target image has been resampled, a feature point on the reference image $I_{1}$ and its correspondence on the resampled target image $I_{2}'$ have the same coordinates, and the matching of the region features is performed using a de-normalized three-dimensional phase correlation measure, expressed by:

$$Q=\mathcal{F}(C_{1})\odot\mathcal{F}(C_{2})^{*},\qquad c(\delta)=\mathcal{F}^{-1}(Q)$$

where $Q$ is the cross-power spectrum; $\mathcal{F}(C_{1})$ and $\mathcal{F}(C_{2})$ respectively denote the results of the three-dimensional fast Fourier transform of $C_{1}$ and $C_{2}$; $c(\delta)$ is the correlation function whose peak appears at $\delta$; $\mathcal{F}^{-1}$ denotes the three-dimensional inverse fast Fourier transform; $^{*}$ denotes taking the complex conjugate; and $\delta$ denotes a three-dimensional offset vector.

By searching for the position where the peak of the correlation function $c(\delta)$ appears, the matching point corresponding to each feature point on the reference image is obtained, and gross error elimination is then performed with RANSAC or another method according to the similarity transformation model to obtain the matching result. Finally, the matching points on the resampled target image $I_{2}'$ are inversely mapped to the original target image using the geometric transformation model $H$ to complete the enhanced matching, and the enhanced matching result is output as the final result.
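A minimal sketch of the de-normalized 3-D phase correlation: the cross-power spectrum is formed without magnitude normalization, and the peak of the inverse transform gives the offset of the matching point; local region cubes around each feature point are assumed as inputs.

```python
import numpy as np

def phase_correlate_3d(c1: np.ndarray, c2: np.ndarray):
    Q = np.fft.fftn(c1) * np.conj(np.fft.fftn(c2))    # cross-power spectrum
    c = np.real(np.fft.ifftn(Q))                      # correlation function
    delta = np.unravel_index(np.argmax(c), c.shape)   # peak position (offset)
    return delta, c[delta]
```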
Further, as a specific implementation of the method of fig. 1, an embodiment of the present invention provides a multi-modal image matching apparatus with rotation and scale invariance. As shown in fig. 2, the apparatus includes: a sampling unit, a detection unit, a description unit, a calculation unit, a resolving unit, a resampling unit, a construction unit and an enhanced matching unit.
The sampling unit can be used for constructing a multi-modal image scale pyramid, down-sampling the reference image and the target image step by step according to scaling ratios to obtain images of different scales;
the detection unit can be used for detecting and extracting feature points from each image in the multi-modal image scale pyramid, for the reference image and the target image respectively;
the description unit can be used for convolving the reference image and the target image with a filter, constructing hierarchical feature maps based on the convolution image results at the detected feature points, and performing feature description based on the hierarchical feature maps, constructing at least two image hierarchical feature maps; the calculation unit can be used for calculating the mode feature value centering coordinates in the at least two image hierarchical feature maps, taking the direction of the line connecting each feature point and its mode feature value centering coordinate as the main direction of the feature point, rotating the local image to the main direction, and regularizing the feature values to construct a rotation-invariant feature descriptor;
the resolving unit can be used for obtaining matching point correspondences using the nearest Euclidean distance after feature extraction and feature description are completed, performing gross error elimination according to the similarity transformation model to obtain a matching result, and solving the geometric transformation model between the reference image and the target image based on the matching result;
the resampling unit can be used for resampling the target image into the pixel coordinate system of the reference image based on the geometric transformation model to obtain a resampled target image;
the construction unit can be used for retaining the feature points and convolution results of the reference image, convolving the resampled target image to obtain its convolution results, and normalizing the convolution results of the reference image and the resampled target image along the direction dimension to construct convolution feature maps;
the enhanced matching unit can be used for completing enhanced matching and inverse computation of matching points on the convolution feature maps, and outputting the enhanced matching result as the final result.
For the embodiment of the present invention, as shown in fig. 3, the detection unit further includes a statistics module, a main direction module, a determination module, and an analysis module.
The statistics module may be configured to determine a circular window centered on the feature point in the at least two image hierarchical feature maps, and count the number of occurrences of each feature value within the window to obtain the feature value corresponding to the mode;
the main direction module may be configured to regard the region corresponding to the mode feature value as an irregular polygon of uniform thickness and density, calculate the centering coordinate of each subregion, add subregions pixel by pixel while continuously updating the centering coordinate, and obtain the centering coordinate of the region corresponding to the mode feature value, namely the mode feature value centering coordinate;
the determination module may be configured to take the direction of the line connecting the feature point and the mode feature value centering coordinate as the main direction of the feature point;
the analysis module may be configured to analyze the secondary mode feature value and, if the secondary mode feature region exceeds a first percentage of the mode feature region, additionally calculate the centering coordinate of the secondary mode feature region and determine a secondary main direction and the corresponding feature vector for the feature point.
It should be noted that other corresponding descriptions of the multi-modal image matching method and apparatus with rotation and scale invariance provided in the embodiment of the present invention may refer to the corresponding description of the method shown in fig. 1, and are not repeated herein.
Based on the method shown in fig. 1, correspondingly, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps:
S1: constructing a multi-modal image scale pyramid: down-sampling the reference image and the target image step by step according to scaling ratios to obtain images of different scales;
S2: detecting and extracting feature points from each image in the multi-modal image scale pyramid, for the reference image and the target image respectively;
S3: convolving the reference image and the target image with a filter, constructing hierarchical feature maps based on the convolution image results at the feature points detected in S2, and performing feature description based on the hierarchical feature maps, constructing at least two image hierarchical feature maps;
S4: calculating the mode feature value centering coordinates in the at least two image hierarchical feature maps, taking the direction of the line connecting each feature point and its mode feature value centering coordinate as the main direction of the feature point, rotating the local image to the main direction, and regularizing the feature values to construct a rotation-invariant feature descriptor;
S5: after feature extraction and feature description are completed, obtaining matching point correspondences using the nearest Euclidean distance, performing gross error elimination according to a similarity transformation model to obtain a matching result, and solving the geometric transformation model between the reference image and the target image based on the matching result;
S6: resampling the target image into the pixel coordinate system of the reference image based on the geometric transformation model to obtain a resampled target image;
S7: retaining the feature points and convolution results of the reference image, convolving the resampled target image to obtain its convolution results, and normalizing the convolution results of the reference image and the resampled target image along the direction dimension to construct convolution feature maps;
S8: completing enhanced matching and inverse computation of matching points on the convolution feature maps, and outputting the enhanced matching result as the final result.
Based on the method shown in fig. 1 and the embodiments of the multi-modal image matching apparatus with rotation and scale invariance shown in fig. 2 and fig. 3, an embodiment of the present invention further provides an entity structure of the apparatus, as shown in fig. 4, which includes: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the memory and the processor are both arranged on a bus, such that the processor, when executing the program, performs the following steps:
S1: constructing a multi-modal image scale pyramid: down-sampling the reference image and the target image step by step according to scaling ratios to obtain images of different scales;
S2: detecting and extracting feature points from each image in the multi-modal image scale pyramid, for the reference image and the target image respectively;
S3: convolving the reference image and the target image with a filter, constructing hierarchical feature maps based on the convolution image results at the feature points detected in S2, and performing feature description based on the hierarchical feature maps, constructing at least two image hierarchical feature maps;
S4: calculating the mode feature value centering coordinates in the at least two image hierarchical feature maps, taking the direction of the line connecting each feature point and its mode feature value centering coordinate as the main direction of the feature point, rotating the local image to the main direction, and regularizing the feature values to construct a rotation-invariant feature descriptor;
S5: after feature extraction and feature description are completed, obtaining matching point correspondences using the nearest Euclidean distance, performing gross error elimination according to a similarity transformation model to obtain a matching result, and solving the geometric transformation model between the reference image and the target image based on the matching result;
S6: resampling the target image into the pixel coordinate system of the reference image based on the geometric transformation model to obtain a resampled target image;
S7: retaining the feature points and convolution results of the reference image, convolving the resampled target image to obtain its convolution results, and normalizing the convolution results of the reference image and the resampled target image along the direction dimension to construct convolution feature maps;
S8: completing enhanced matching and inverse computation of matching points on the convolution feature maps, and outputting the enhanced matching result as the final result.
The device also includes: a bus configured to couple the processor and the memory.
Through the technical solution of the present invention, the method may comprise the following steps:
S1: constructing a multi-modal image scale pyramid: down-sampling the reference image and the target image step by step according to scaling ratios to obtain images of different scales;
S2: detecting and extracting feature points from each image in the multi-modal image scale pyramid, for the reference image and the target image respectively;
S3: convolving the reference image and the target image with a filter, constructing hierarchical feature maps based on the convolution image results at the feature points detected in S2, and performing feature description based on the hierarchical feature maps, constructing at least two image hierarchical feature maps;
S4: calculating the mode feature value centering coordinates in the at least two image hierarchical feature maps, taking the direction of the line connecting each feature point and its mode feature value centering coordinate as the main direction of the feature point, rotating the local image to the main direction, and regularizing the feature values to construct a rotation-invariant feature descriptor;
S5: after feature extraction and feature description are completed, obtaining matching point correspondences using the nearest Euclidean distance, performing gross error elimination according to a similarity transformation model to obtain a matching result, and solving the geometric transformation model between the reference image and the target image based on the matching result;
S6: resampling the target image into the pixel coordinate system of the reference image based on the geometric transformation model to obtain a resampled target image;
S7: retaining the feature points and convolution results of the reference image, convolving the resampled target image to obtain its convolution results, and normalizing the convolution results of the reference image and the resampled target image along the direction dimension to construct convolution feature maps;
S8: completing enhanced matching and inverse computation of matching points on the convolution feature maps, and outputting the enhanced matching result as the final result.
In this way, the accuracy of multi-modal image matching can be improved, robustness to noise and generality across images of different modalities can be strengthened, and the rotation, scale and translation differences between images can be eliminated, thereby improving the matching precision of multi-modal images.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the relevant features of the method and apparatus described above are referred to one another. In addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments, and do not represent merits of the embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the multi-modal image matching in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.

Claims (10)

1. A multi-modal image matching method with rotation and scale invariance is characterized by comprising the following steps:
s1: constructing a multi-mode image scale pyramid, and sampling the reference image and the target image step by step according to scaling comparison to obtain images with different scales;
s2: respectively detecting and extracting feature points of each image in the multi-mode image scale pyramid from the reference image and the target image;
s3: convolving the reference image and the target image by using a filter, constructing a hierarchical feature map based on the convolution image result of the feature points detected in S2, performing feature description based on the hierarchical feature map, and constructing at least two image hierarchical feature maps;
s4: s2, calculating mode feature value centralization coordinates in the at least two image level feature maps, taking a connecting line direction of feature points and the mode feature value centralization coordinates as a main direction of the feature points, rotating the local images to the main direction, and regularizing the feature values to construct a feature descriptor with rotation invariance;
s5: after the feature extraction and feature description are completed in S2, obtaining a corresponding matching point by using the nearest Euclidean distance, carrying out gross error elimination according to a similar transformation model to obtain a matching result, and calculating a geometric transformation model between the reference image and the target image based on the matching result;
s6: based on the geometric transformation model, resampling the target image to a pixel coordinate system of the reference image to obtain a resampled target image;
s7: reserving the feature points and the convolution result of the reference image, performing convolution on the resampled target image by using the selected filter in the S3 again to obtain a convolution result, and normalizing the convolution result of the reference image and the resampled target image along a direction dimension to construct a convolution feature map;
s8: and completing enhancement matching and matching point back calculation on the convolution characteristic graph, and outputting the enhancement matching result as a final result.
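By way of illustration only, and not as the patented implementation, the following Python sketch outlines one possible reading of the S1-S8 pipeline using OpenCV/NumPy primitives; the detector, the filter bank, the parameter values, and every function name here are assumptions introduced for exposition, since the claim does not fix them.

    # Hedged sketch of claim 1's coarse-to-fine flow (S1-S8); all names illustrative.
    import cv2
    import numpy as np

    def build_pyramid(img, n_levels=4, ratio=0.8):              # S1
        """Progressively downsample by a fixed scaling ratio."""
        levels = [img]
        for _ in range(1, n_levels):
            h, w = levels[-1].shape[:2]
            levels.append(cv2.resize(levels[-1], (int(w * ratio), int(h * ratio))))
        return levels

    def detect_points(img, max_pts=2000):                       # S2
        """Corner-style detection on a single-channel uint8 image;
        the claim does not fix a particular detector."""
        pts = cv2.goodFeaturesToTrack(img, max_pts, 0.01, 5)
        return pts.reshape(-1, 2) if pts is not None else np.empty((0, 2))

    def direction_normalize(conv_stack):                        # S7
        """L2-normalize the multi-direction convolution responses per pixel;
        conv_stack has shape (directions, H, W)."""
        norm = np.linalg.norm(conv_stack, axis=0, keepdims=True)
        return conv_stack / np.maximum(norm, 1e-12)

    def match_and_model(desc_ref, desc_tgt, pts_ref, pts_tgt):  # S5
        """Nearest-Euclidean matching, then RANSAC under a similarity model
        as the gross-error elimination step."""
        matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
        matches = matcher.match(desc_ref.astype(np.float32),
                                desc_tgt.astype(np.float32))
        src = np.float32([pts_ref[m.queryIdx] for m in matches])
        dst = np.float32([pts_tgt[m.trainIdx] for m in matches])
        model, _ = cv2.estimateAffinePartial2D(dst, src, method=cv2.RANSAC)
        return model                                            # 2x3 similarity

    def resample_target(target, model, ref_shape):              # S6
        """Warp the target into the reference pixel coordinate system."""
        return cv2.warpAffine(target, model, (ref_shape[1], ref_shape[0]))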
2. The method according to claim 1, wherein the determination of the main direction specifically comprises:
determining a circular window centered on the feature point in the at least two image hierarchical feature maps, and counting the occurrences of each feature value within the window to obtain the feature value corresponding to the mode;
selecting the calculation region and centralizing the mode feature value, i.e., computing its centroid coordinates;
taking the direction of the line connecting the feature point and the mode feature value centroid coordinates as the main direction of the feature point.
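As a concrete, non-limiting reading of claim 2, the sketch below assumes each pixel of a hierarchical feature map carries a non-negative integer feature value (e.g., the index of its strongest filter response), finds the mode inside a circular window, and points the main direction at the centroid of the mode region; the window radius is an assumed parameter.

    # Hedged sketch of the mode-centroid main direction (claim 2); radius assumed.
    import numpy as np

    def main_direction(feature_map, pt, radius=16):
        """feature_map: 2-D array of non-negative integer feature values;
        pt: (row, col) of the feature point. Returns an angle in radians."""
        r0, c0 = int(pt[0]), int(pt[1])
        rr, cc = np.ogrid[:feature_map.shape[0], :feature_map.shape[1]]
        window = (rr - r0) ** 2 + (cc - c0) ** 2 <= radius ** 2
        mode_val = np.bincount(feature_map[window].ravel()).argmax()
        # centroid of the mode region, treated as a uniform-density area
        rows, cols = np.nonzero(window & (feature_map == mode_val))
        return np.arctan2(rows.mean() - r0, cols.mean() - c0)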
3. The method of claim 2, wherein the determination of the main direction further comprises:
analyzing the secondary-mode feature value: if the area of the secondary-mode feature region exceeds a first percentage of the area of the mode feature region, additionally calculating the centroid coordinates of the secondary-mode feature region, and determining a secondary main direction and a corresponding feature vector for the feature point.
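Claim 3's secondary direction can be grafted onto the same routine, as in the hedged extension below (reusing the imports of the previous sketch): when the runner-up feature value's region area exceeds the "first percentage" of the mode region's area, a second direction is emitted as well. The 0.8 threshold is purely an assumed placeholder; the claim only names a first percentage.

    # Hedged extension for claim 3; the 0.8 "first percentage" is an assumption.
    def all_directions(feature_map, pt, radius=16, first_pct=0.8):
        r0, c0 = int(pt[0]), int(pt[1])
        rr, cc = np.ogrid[:feature_map.shape[0], :feature_map.shape[1]]
        window = (rr - r0) ** 2 + (cc - c0) ** 2 <= radius ** 2
        counts = np.bincount(feature_map[window].ravel())
        order = counts.argsort()[::-1]        # feature values by area, descending
        dirs = []
        for val in order[:2]:                 # mode, then secondary mode
            if counts[val] < first_pct * counts[order[0]]:
                break                         # secondary region too small
            rows, cols = np.nonzero(window & (feature_map == val))
            dirs.append(np.arctan2(rows.mean() - r0, cols.mean() - c0))
        return dirs                           # one or two main directions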
4. The method according to claim 1, wherein performing feature description based on the hierarchical feature maps to construct at least two image hierarchical feature maps specifically comprises:
setting j×k directions when convolving the reference image and the target image with a filter having multi-directional properties, to obtain j×k convolution maps; dividing the maps into k groups of j convolution maps each; extracting the first convolution map of each group to construct a first feature map, extracting the second convolution map of each group to construct a second feature map, and so on, to construct j feature maps, thereby completing the feature description.
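One way to realize claim 4's regrouping, sketched under the assumption that the j×k directional convolution maps are stacked along the first axis in group order; the array layout and the function name are assumptions, not claim language.

    # Hedged sketch of claim 4's regrouping; the layout of conv_maps is assumed.
    import numpy as np

    def build_hierarchical_maps(conv_maps, j, k):
        """conv_maps: array of shape (j*k, H, W), one map per filter direction,
        ordered as k consecutive groups of j maps. Returns j feature maps,
        the i-th gathering the i-th member of every group (shape (j, k, H, W))."""
        assert conv_maps.shape[0] == j * k
        groups = conv_maps.reshape(k, j, *conv_maps.shape[1:])  # k groups of j maps
        return groups.transpose(1, 0, 2, 3)                     # -> (j, k, H, W)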
5. The method of claim 1, wherein the enhanced matching and the matching-point back-calculation specifically comprise:
calculating, on the convolution feature maps, the matching point corresponding to each feature point of the reference image, performing gross-error elimination under the similarity transformation model to obtain a matching result, back-calculating the matching points on the resampled target image to the original target image using the geometric transformation model of S5 to complete the enhanced matching, and outputting the enhanced matching result as the final result.
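A hedged sketch of the back-calculation in claim 5, under the assumption that the S5 model is a 2×3 similarity matrix mapping the original target into reference pixel coordinates: matches found on the resampled target (which lives in the reference frame) are mapped back to the original target by the inverse transform.

    # Hedged sketch of the matching-point back-calculation in claim 5.
    import cv2
    import numpy as np

    def back_project(pts, model):
        """pts: (N, 2) match coordinates on the resampled target image;
        model: 2x3 matrix from S5 (original target -> reference pixels).
        Returns the corresponding coordinates on the original target image."""
        inv = cv2.invertAffineTransform(model)
        pts = pts.reshape(-1, 1, 2).astype(np.float32)
        return cv2.transform(pts, inv).reshape(-1, 2)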
6. A multi-modal image matching apparatus with rotation and scale invariance, comprising:
a sampling unit: constructing a multi-modal image scale pyramid, and progressively sampling the reference image and the target image according to a scaling ratio to obtain images at different scales;
a detection unit: detecting and extracting feature points from each image of the multi-modal image scale pyramids of the reference image and the target image, respectively;
a description unit: convolving the reference image and the target image with a filter, constructing hierarchical feature maps based on the convolution results at the detected feature points, and performing feature description based on the hierarchical feature maps to construct at least two image hierarchical feature maps;
a calculation unit: calculating the centroid coordinates of the mode feature value in the at least two image hierarchical feature maps, taking the direction of the line connecting a feature point and the mode feature value centroid as the main direction of the feature point, rotating the local image to the main direction, and regularizing the feature values to construct a rotation-invariant feature descriptor;
a resolving unit: after the feature extraction and the feature description are completed, obtaining corresponding matching points by the nearest Euclidean distance, performing gross-error elimination under a similarity transformation model to obtain a matching result, and calculating a geometric transformation model between the reference image and the target image based on the matching result;
a resampling unit: resampling the target image into the pixel coordinate system of the reference image based on the geometric transformation model to obtain a resampled target image;
a construction unit: retaining the feature points and convolution results of the reference image, convolving the resampled target image to obtain convolution results, and normalizing the convolution results of the reference image and the resampled target image along the direction dimension to construct convolution feature maps;
an enhanced matching unit: completing enhanced matching and matching-point back-calculation on the convolution feature maps, and outputting the enhanced matching result as the final result.
7. The apparatus of claim 6, wherein the detection unit comprises:
a statistics module: determining a circular window centered on the feature point in the at least two image hierarchical feature maps, and counting the occurrences of each feature value within the window to obtain the feature value corresponding to the mode;
a main direction module: treating the region corresponding to the mode feature value as an irregular polygon of uniform thickness and density, calculating the centroid coordinates of each sub-region, merging the sub-regions pixel by pixel while continuously updating the centroid, to obtain the centroid coordinates of the region corresponding to the mode feature value, namely the mode feature value centroid coordinates;
a determination module: taking the direction of the line connecting the feature point and the mode feature value centroid coordinates as the main direction of the feature point.
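The pixel-by-pixel centroid update described in the main direction module reduces to a running weighted mean when each pixel carries unit mass, as in this hedged sketch (the function name is illustrative):

    # Hedged sketch of the incremental centroid update (claim 7); unit mass per pixel.
    def incremental_centroid(pixels):
        """pixels: iterable of (row, col) pairs belonging to the mode region."""
        cr = cc = n = 0.0
        for r, c in pixels:
            n += 1.0
            cr += (r - cr) / n      # running mean of the row coordinate
            cc += (c - cc) / n      # running mean of the column coordinate
        return cr, cc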
8. The apparatus of claim 7, wherein the detection unit further comprises:
an analysis module: analyzing the secondary-mode feature value; if the area of the secondary-mode feature region exceeds a first percentage of the area of the mode feature region, additionally calculating the centroid coordinates of the secondary-mode feature region, and determining, for the feature point, a secondary main direction and a feature vector corresponding to the secondary main direction.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN202210980008.3A 2022-08-16 2022-08-16 Multi-mode image matching method and device with rotation and scale invariance Active CN115205558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210980008.3A CN115205558B (en) 2022-08-16 2022-08-16 Multi-mode image matching method and device with rotation and scale invariance

Publications (2)

Publication Number Publication Date
CN115205558A true CN115205558A (en) 2022-10-18
CN115205558B (en) 2023-03-24

Family

ID=83585720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210980008.3A Active CN115205558B (en) 2022-08-16 2022-08-16 Multi-mode image matching method and device with rotation and scale invariance

Country Status (1)

Country Link
CN (1) CN115205558B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953600A (en) * 2023-03-08 2023-04-11 中国测绘科学研究院 Multi-mode image matching method and system based on multi-direction filtering channel characteristics

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886306A (en) * 2014-04-08 2014-06-25 山东大学 Tooth X-ray image matching method based on SURF point matching and RANSAC model estimation
CN104134220A (en) * 2014-08-15 2014-11-05 北京东方泰坦科技股份有限公司 Low-altitude remote sensing image high-precision matching method with consistent image space
CN105261014A (en) * 2015-09-30 2016-01-20 西南交通大学 Multi-sensor remote sensing image matching method
CN108304883A (en) * 2018-02-12 2018-07-20 西安电子科技大学 Based on the SAR image matching process for improving SIFT
CN110472662A (en) * 2019-07-10 2019-11-19 上海理工大学 Image matching algorithm based on improved ORB algorithm
CN110675437A (en) * 2019-09-24 2020-01-10 重庆邮电大学 Image matching method based on improved GMS-ORB characteristics and storage medium
CN111985502A (en) * 2020-08-03 2020-11-24 武汉大学 Multi-mode image feature matching method with scale invariance and rotation invariance
CN112085772A (en) * 2020-08-24 2020-12-15 南京邮电大学 Remote sensing image registration method and device
CN112634335A (en) * 2020-12-25 2021-04-09 清华大学 Method for extracting characteristic point pairs of robust remote sensing image facing to nonlinear radiation distortion
CN112861714A (en) * 2021-02-05 2021-05-28 中国科学院微小卫星创新研究院 Remote sensing image matching method based on deep learning and multi-sub-image matching
US20210248777A1 (en) * 2020-02-12 2021-08-12 Basler Ag Feature point detection apparatus and method for detecting feature points in image data
US20210279502A1 (en) * 2020-03-04 2021-09-09 Zerofox, Inc. Methods and systems for detecting impersonating social media profiles
CN114494378A (en) * 2022-02-16 2022-05-13 国网江苏省电力有限公司无锡供电分公司 Multi-temporal remote sensing image automatic registration method based on improved SIFT algorithm
CN114565781A (en) * 2022-02-25 2022-05-31 中国人民解放军战略支援部队信息工程大学 Image matching method based on rotation invariance
CN114882256A (en) * 2022-04-22 2022-08-09 中国人民解放军战略支援部队航天工程大学 Heterogeneous point cloud rough matching method based on geometric and texture mapping
CN114897705A (en) * 2022-06-24 2022-08-12 徐州飞梦电子科技有限公司 Unmanned aerial vehicle remote sensing image splicing method based on feature optimization

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
HAIQIAO LIU ET AL.: "Method for Fused Phase and PCA Direction Based on a SIFT Framework for Multi-Modal Image Matching", IEEE Access *
XIANGKUN GUO ET AL.: "Image registration method based on improved SIFT algorithm and essential matrix estimation", 2017 IEEE International Conference on Information and Automation (ICIA) *
ZHONGLI FAN ET AL.: "3MRS: An Effective Coarse-to-Fine Matching Method for Multimodal Remote Sensing Imagery", Remote Sensing *
SONG QIPENG: "Research on Registration Methods for Remote Sensing Images", China Master's Theses Full-text Database, Engineering Science and Technology II (Monthly) *
LI XIN ET AL.: "Multi-source Remote Sensing Image Matching Using Directional Phase Features", Geomatics and Information Science of Wuhan University *
YANG JIABIN ET AL.: "Deep Learning Aerial Image Matching Method for Oblique Photography", Journal of Geo-Information Science *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant