CN117611852A - Image matching method, device, equipment and computer storage medium - Google Patents

Image matching method, device, equipment and computer storage medium

Info

Publication number
CN117611852A
Authority
CN
China
Prior art keywords
images
image
scale
matching
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311637140.5A
Other languages
Chinese (zh)
Inventor
祁晓婷
樊治国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Gaozhong Information Technology Co ltd
Original Assignee
Qingdao Gaozhong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Gaozhong Information Technology Co ltd filed Critical Qingdao Gaozhong Information Technology Co ltd
Priority to CN202311637140.5A priority Critical patent/CN117611852A/en
Publication of CN117611852A publication Critical patent/CN117611852A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V10/00 Arrangements for image or video recognition or understanding
            • G06V10/40 Extraction of image or video features
              • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
              • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
                • G06V10/443 Local feature extraction by matching or filtering
                  • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
                    • G06V10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
                      • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
              • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
              • G06V10/52 Scale-space analysis, e.g. wavelet analysis
            • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
              • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
                • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
                • G06V10/761 Proximity, similarity or dissimilarity measures
              • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
                  • G06V10/806 Fusion of extracted features
              • G06V10/82 Arrangements for image or video recognition or understanding using neural networks
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
                  • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
                • G06N3/0464 Convolutional networks [CNN, ConvNet]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention relates to the technical field of computer data processing, and discloses an image matching method, an image matching device, image matching equipment and a computer storage medium. The method comprises the following steps: extracting feature images of each of two images to be matched under at least three preset scales, to obtain a feature image of each image to be matched under each scale; performing image matching on the feature images of the two images to be matched under each scale in ascending order of scale until a matching confidence coefficient matrix between the two feature images with the maximum scale is obtained, wherein the image matching includes fusing, based on an attention mechanism, the feature descriptor of the feature image of the current scale within a target area into the other image to be matched; and screening the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale, to obtain information of the matched pixel points between the two images to be matched. In this way, the embodiment of the invention improves the accuracy of image matching.

Description

Image matching method, device, equipment and computer storage medium
Technical Field
The embodiment of the invention relates to the technical field of computer data processing, in particular to an image matching method, an image matching device, image matching equipment and a computer storage medium.
Background
Feature matching of images is one of the core technologies in the fields of computer vision and digital image processing. It refers to a process of extracting feature points from an image and matching the feature points with feature points in another image to determine a relationship or relative motion between the two images. Feature matching is commonly used for applications such as image stitching, object recognition, three-dimensional reconstruction, and motion estimation. Feature matching plays a significant role in the field of modern computer vision and is a bridge connecting low-level image processing and high-level vision applications.
In implementing the prior art, the applicant found that existing image feature matching generally performs feature matching between two images to be matched at only a single granularity. The problem with this is that a single granularity provides only a limited view and loses many details of the images to be matched, so the accuracy of existing image matching is low.
Disclosure of Invention
In view of the above problems, embodiments of the present invention provide an image matching method, apparatus, device, and computer storage medium, which are used to solve the problem in the prior art that the accuracy of image matching is low.
According to an aspect of an embodiment of the present invention, there is provided an image matching method, including:
acquiring two images to be matched;
extracting characteristic images of each image to be matched under at least three preset scales respectively to obtain characteristic images of each image to be matched under each scale;
performing image matching on the characteristic images of the two images to be matched under each scale according to the ascending order of the scales until a matching confidence coefficient matrix between the two characteristic images with the maximum scale is obtained; the matching confidence coefficient matrix is used for representing the confidence coefficient between each corresponding pixel point in the two characteristic images; the image matching includes: fusing the feature descriptor of the feature image of the current scale in the target area into another image to be matched based on an attention mechanism to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale; the target area is obtained by mapping pixel points, of which the matching confidence coefficient is larger than a preset confidence coefficient threshold value, in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale to the two feature images of the current scale; determining the matching confidence coefficient matrix between the two characteristic images of the current scale according to the similarity between the fused characteristic descriptors of the two characteristic images of the current scale;
And screening the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale to obtain the information of the matched pixel points between the two images to be matched.
In an alternative, the method further comprises:
for a first characteristic image of a current scale, weighting a second characteristic image of the current scale according to the matching confidence coefficient matrix between two characteristic images of a previous scale to obtain a cross attention characteristic corresponding to the first characteristic image; the first characteristic image is any one of the two characteristic images, and the second characteristic image is one of the two characteristic images except the first characteristic image;
and carrying out feature fusion on the cross attention feature and the first feature image to obtain the fused feature descriptor corresponding to the first feature image.
In an alternative, the method further comprises:
for a first feature image of a previous scale, determining pixel points, of which the matching confidence coefficient is greater than a preset confidence coefficient threshold value, in the first feature image and the second feature image of the previous scale as initial matching pixel points in the first feature image of the previous scale;
Collaborative filtering is carried out on the initial matching pixel points in the first characteristic image and the second characteristic image of the previous scale according to a mutual neighbor algorithm, so that target matching pixel points between the two characteristic images of the previous scale are obtained;
and mapping the target matching pixel points into the two characteristic images under the current scale to obtain the target region.
In an alternative, the method further comprises:
inputting the images to be matched into a preset pyramid convolution network to obtain the characteristic images of each scale; the pyramid convolution network comprises a plurality of longitudinally connected first convolution layers, wherein each first convolution layer is transversely connected with a target number of second convolution layers respectively; one of the first convolution layers corresponds to one of the scales; the target number is determined according to the corresponding scale of the first convolution layer; and after the image to be matched passes through the first convolution layer and the second convolution layer, obtaining the characteristic image under the scale corresponding to the first convolution layer.
In an alternative, the method further comprises:
and in ascending order of the scales, the output of the second-to-last second convolution layer corresponding to each current scale is fused with the output of the last second convolution layer of the previous scale and then output to the last second convolution layer corresponding to the current scale, wherein the second-to-last second convolution layer is a 1×1 convolution layer.
In an alternative, the method further comprises:
and the output of the last second convolution layer of the previous scale is up-sampled and then fused with the output of the second-to-last second convolution layer corresponding to the current scale, so as to obtain the input of the last second convolution layer corresponding to the current scale, wherein the last second convolution layer is a 3×3 convolution layer.
In an alternative, the method further comprises:
respectively inputting the two feature images of each scale into a preset Transformer model to obtain the feature descriptors of the feature images of each scale; the Transformer model comprises a first normalization layer, a multi-head attention layer, a second normalization layer and a multi-layer perceptron which are sequentially connected; the feature image is fused with the output obtained by passing it through the first normalization layer and the multi-head attention layer and the result is input into the second normalization layer, and that result, fused with the output of the multi-layer perceptron, is used as the output of the Transformer model.
According to another aspect of an embodiment of the present invention, there is provided an image matching apparatus including:
The acquisition module is used for acquiring two images to be matched;
the extraction module is used for respectively extracting the characteristic images of each image to be matched under at least three preset scales to obtain the characteristic images of each image to be matched under each scale;
the matching module is used for carrying out image matching on the characteristic images of the two images to be matched under each scale according to the ascending order of the scales until a matching confidence coefficient matrix between the two characteristic images with the maximum scale is obtained; the matching confidence coefficient matrix is used for representing the confidence coefficient between each corresponding pixel point in the two characteristic images; the image matching includes: fusing the feature descriptor of the feature image of the current scale in the target area into another image to be matched based on an attention mechanism to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale; the target area is obtained by mapping pixel points, of which the matching confidence coefficient is larger than a preset confidence coefficient threshold value, in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale to the two feature images of the current scale; determining the matching confidence coefficient matrix between the two characteristic images of the current scale according to the similarity between the fused characteristic descriptors of the two characteristic images of the current scale;
And the screening module is used for screening the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale to obtain the information of the matched pixel points between the two images to be matched.
According to another aspect of an embodiment of the present invention, there is provided an image matching apparatus including:
the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the operations of the image matching method embodiment as described in any one of the preceding claims.
According to yet another aspect of an embodiment of the present invention, there is provided a computer-readable storage medium having stored therein at least one executable instruction for causing an image matching apparatus to perform the operations of the image matching method embodiment of any one of the preceding claims.
According to the embodiment of the invention, two images to be matched are acquired; feature images of each image to be matched are extracted under at least three preset scales, so as to obtain a feature image of each image to be matched under each scale; image matching is performed on the feature images of the two images to be matched under each scale in ascending order of the scales until a matching confidence coefficient matrix between the two feature images with the maximum scale is obtained, the matching confidence coefficient matrix being used for representing the confidence between corresponding pixel points in the two feature images. The image matching includes: fusing, based on an attention mechanism, the feature descriptor of the feature image of the current scale within the target area into the other image to be matched, so as to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale, the target area being obtained by mapping the pixel points whose matching confidence in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale is greater than a preset confidence threshold into the two feature images of the current scale; and determining the matching confidence coefficient matrix between the two feature images of the current scale according to the similarity between the fused feature descriptors of the two feature images of the current scale. Finally, the pixel points in the two images to be matched are screened according to the matching confidence coefficient matrix between the two feature images with the maximum scale, so as to obtain the information of the matched pixel points between the two images to be matched. Thus, unlike the prior art in which image matching is performed at only a single scale, so that excessive image detail is lost and the accuracy of image matching is low, the embodiment of the invention improves the accuracy of image matching.
The foregoing description is only an overview of the technical solutions of the embodiments of the present invention, and may be implemented according to the content of the specification, so that the technical means of the embodiments of the present invention can be more clearly understood, and the following specific embodiments of the present invention are given for clarity and understanding.
Drawings
The drawings are only for purposes of illustrating embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
fig. 1 shows a flow chart of an image matching method according to an embodiment of the present invention;
fig. 2 shows a schematic structural diagram of a pyramid convolution network in an image matching method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a Transformer model in the image matching method according to the embodiment of the present invention;
fig. 4 is a schematic flow chart of an image matching method according to still another embodiment of the present invention;
fig. 5 shows a schematic structural diagram of an image matching apparatus according to an embodiment of the present invention;
fig. 6 shows a schematic structural diagram of an image matching device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein.
Fig. 1 shows a flowchart of an image matching method provided by an embodiment of the present invention, which is performed by a computer processing device. The computer processing device may include a cell phone, a notebook computer, etc. As shown in fig. 1, the method comprises the steps of:
step 10: and acquiring two images to be matched.
The method comprises the steps of acquiring a plurality of images to be matched, and carrying out image matching processing on the images to be matched. The images to be matched can be images which need to be subjected to image stitching, object recognition, three-dimensional reconstruction, motion estimation and the like, and the relation or relative motion between two images to be matched is determined by matching the characteristic points in the images to be matched with the characteristic points in the other images.
Step 20: and respectively extracting characteristic images of each image to be matched under at least three preset scales to obtain characteristic images of each image to be matched under each scale.
It is considered that a single-scale image to be matched has a limited resolution and can provide only limited image detail, so the accuracy of image matching based on two single-scale images to be matched is low. Therefore, in the embodiment of the invention, feature images of the images to be matched are extracted under a plurality of scales, and feature reconstruction and matching are performed repeatedly at the resolution levels corresponding to these scales, where the feature matching result at the lower-resolution scale is referenced when feature reconstruction is performed, so that the matching result is deepened and refined step by step and the image matching result is more adaptive and thorough.
Further, when determining the specific number of scales, it is considered that if only the image features at the coarsest and finest granularities are used, the scale change between steps of the matching process is large, which causes more loss of image detail at the intermediate scales between the coarsest and finest granularities and affects the accuracy of image matching. Therefore, in the embodiment of the invention, feature images of each image to be matched are extracted under at least three preset scales, so that the image features over multiple resolution levels are covered with a flatter scale gradient, more details of the images to be matched are preserved, and the accuracy of image matching is improved.
It will be appreciated that, in order to achieve a balance between the accuracy of image matching and the matching efficiency, feature images of each image to be matched under three scales may be taken, where the three scales may be 1/8, 1/4 and 1/2 of the original scale of the image to be matched, respectively.
Specifically, in one embodiment of the present invention, step 20 further comprises:
step 201: inputting the images to be matched into a preset pyramid convolution network to obtain the characteristic images of each scale; the pyramid convolution network comprises a plurality of longitudinally connected first convolution layers, wherein each first convolution layer is transversely connected with a target number of second convolution layers respectively; one of the first convolution layers corresponds to one of the scales; the target number is determined according to the corresponding scale of the first convolution layer; and after the image to be matched passes through the first convolution layer and the second convolution layer, obtaining the characteristic image under the scale corresponding to the first convolution layer.
Here, the corresponding target number is determined according to the ratio between the scale corresponding to the first convolution layer and the scale of the original image to be matched. For example, when the scale corresponding to a first convolution layer is 1/8 of the scale of the original image to be matched, that first convolution layer is transversely connected with one second convolution layer; when the scale is 1/4 of the original scale, it is transversely connected with two second convolution layers; and when the scale is 1/2 of the original scale, it is transversely connected with three second convolution layers.
The pyramid convolution network is a standard convolution structure with a feature map pyramid network; its structure is shown in fig. 2, in which a plurality of first convolution layers are longitudinally connected to progressively refine the resolution of the images to be matched. On the left side of the pyramid convolution network, going bottom up, each time the image to be matched passes through one convolution stage (a first convolution layer and the target number of second convolution layers transversely connected to it), the size of the feature image is halved, thereby generating feature images at multiple scales.
Further, when the number of scales is three, the scales can be 1/8, 1/4 and 1/2 of the original image to be matched, and the structure of the pyramid convolution network can refer to fig. 2. The first convolution layers corresponding to the 1/8, 1/4 and 1/2 scales are Conv2_x, Conv3_x and Conv4_x respectively. The second convolution layer transversely connected to Conv2_x is Conv2d(1×1, s2); the second convolution layers transversely connected to Conv3_x corresponding to the 1/4 scale are Conv2d(1×1, s2) and Conv2d(3×3, s2); and the second convolution layers transversely connected to Conv4_x corresponding to the 1/2 scale are Conv2d(1×1, s2) and Conv2d(3×3, s2).
In still another embodiment of the present invention, in ascending order of the scales, the second last convolution layer corresponding to each current scale is fused with the output of the last second convolution layer of the previous scale, and then output to the last second convolution layer corresponding to the current scale, where the second last convolution layer is a 1×1 convolution layer.
As shown in fig. 2, the features on the left side are merged through 1×1 convolutions, which also buffer the gradients flowing back into the backbone network and thus help optimize network training.
In yet another embodiment of the present invention, the output of the last second convolution layer of the previous scale is up-sampled and then fused with the second last convolution layer corresponding to the current scale, so as to obtain the input of the last second convolution layer corresponding to the current scale, where the last second convolution layer is a 3×3 convolution layer.
The resolution of the low-resolution feature map is increased by upsampling, and an element-wise tensor operation is then performed between the upsampled feature map and the adjacent higher-resolution feature map; an element-wise operation applies addition/subtraction/multiplication/division entry by entry to two matrices of identical dimensions. Such fusion iterations proceed until the final feature images are generated. The final feature maps are obtained through the 3×3 convolutions at the rightmost part of the pyramid convolution network, which effectively reduces the aliasing effect that may be introduced by upsampling. As shown in fig. 2, the final output includes three feature maps at 1/2, 1/4 and 1/8 of the original image scale, denoted as P2, P3 and P4, respectively.
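By way of illustration only, the following Python sketch (using the PyTorch library) shows one possible form of such a pyramid convolution network producing the 1/2, 1/4 and 1/8 scale feature maps P2, P3 and P4. The channel counts, kernel sizes and layer names are assumptions made for this example and do not reflect the exact configuration of fig. 2.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PyramidFeatureExtractor(nn.Module):
        """Sketch of a feature-pyramid backbone that outputs 1/2, 1/4 and 1/8 scale maps."""

        def __init__(self, in_ch: int = 1, dim: int = 128):
            super().__init__()
            # Bottom-up path (first convolution layers): each stage halves the resolution.
            self.stage1 = nn.Sequential(nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU())  # 1/2
            self.stage2 = nn.Sequential(nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU())    # 1/4
            self.stage3 = nn.Sequential(nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU())    # 1/8
            # Lateral 1x1 convolutions (second convolution layers) merge the left-side features.
            self.lat1 = nn.Conv2d(dim, dim, 1)
            self.lat2 = nn.Conv2d(dim, dim, 1)
            self.lat3 = nn.Conv2d(dim, dim, 1)
            # 3x3 convolutions applied after fusion to reduce upsampling aliasing.
            self.smooth1 = nn.Conv2d(dim, dim, 3, padding=1)
            self.smooth2 = nn.Conv2d(dim, dim, 3, padding=1)

        def forward(self, img: torch.Tensor):
            c2 = self.stage1(img)   # 1/2 scale
            c3 = self.stage2(c2)    # 1/4 scale
            c4 = self.stage3(c3)    # 1/8 scale
            p4 = self.lat3(c4)
            p3 = self.smooth2(self.lat2(c3) + F.interpolate(p4, size=c3.shape[-2:], mode="nearest"))
            p2 = self.smooth1(self.lat1(c2) + F.interpolate(p3, size=c2.shape[-2:], mode="nearest"))
            return p2, p3, p4       # feature images at 1/2, 1/4 and 1/8 of the original resolution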
Wherein step 201 further comprises:
step 2011: respectively inputting the two characteristic images of each scale into a preset transducer model to obtain the characteristic descriptors of the characteristic images of each scale; the transducer model comprises a first normalization layer, a multi-head attention layer, a second normalization layer and a multi-layer perceptron which are sequentially connected; the characteristic image is input into the second normalization layer after being fused with the characteristic image through the output of the first normalization layer and the multi-head attention layer, and the characteristic image is used as the output of the transducer model after being fused with the output of the multi-layer sensor through the output of the first normalization layer and the multi-head attention layer.
Here, the multi-head attention layer may be based on a self-attention mechanism. The multi-layer perceptron (MLP) comprises an input layer, at least one hidden layer and an output layer which are sequentially connected, and adjacent layers of the MLP are fully connected. The structure of the Transformer model can be as shown in fig. 3. As shown in fig. 3, for each feature image of each scale, the feature image (Input) is input into the Transformer model and processed as follows: the first normalization layer (NORM1) processes the feature image and outputs the result both to the input of the multi-head attention layer (Multi-Head Attention) and to the fusion point at its output; the multi-head attention layer, based on the self-attention mechanism, produces an output that is fused with the output of the first normalization layer and then sent both to the second normalization layer (NORM2) and to the fusion point at the output of the multi-layer perceptron; the second normalization layer processes this result and outputs it to the multi-layer perceptron; and the output of the multi-layer perceptron is fused with the output of the multi-head attention branch to obtain the feature descriptor (Output) corresponding to the feature image. Obtaining the feature descriptors of the feature images through the multi-head self-attention mechanism in the Transformer converts rough local features into feature representations that are easy to match.
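By way of illustration only, the following sketch shows a Transformer encoding block with the connection pattern described above (first normalization layer, multi-head self-attention, second normalization layer, multi-layer perceptron, with residual fusions). The use of layer normalization, the dimensions and the point from which the first residual is taken are assumptions of this example and do not limit the embodiment.

    import torch
    import torch.nn as nn

    class TransformerEncoderBlock(nn.Module):
        """Sketch of the Transformer encoding module of fig. 3 (pre-norm reading)."""

        def __init__(self, dim: int = 128, heads: int = 8, mlp_ratio: int = 4):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)                                   # first normalization layer
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # multi-head attention layer
            self.norm2 = nn.LayerNorm(dim)                                   # second normalization layer
            self.mlp = nn.Sequential(                                        # multi-layer perceptron
                nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Linear(dim * mlp_ratio, dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, num_pixels, dim), a feature image flattened into a token sequence.
            h = self.norm1(x)
            attn_out, _ = self.attn(h, h, h)      # self-attention over the feature tokens
            x = x + attn_out                      # first residual fusion (taken from the block input here;
                                                  # the figure may instead fuse with the normalized features)
            x = x + self.mlp(self.norm2(x))       # second residual fusion -> feature descriptors
            return x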
Step 30: performing image matching on the characteristic images of the two images to be matched under each scale according to the ascending order of the scales until a matching confidence coefficient matrix between the two characteristic images with the maximum scale is obtained; the matching confidence coefficient matrix is used for representing the confidence coefficient between each corresponding pixel point in the two characteristic images; the image matching includes: fusing the feature descriptor of the feature image of the current scale in the target area into another image to be matched based on an attention mechanism to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale; the target area is obtained by mapping pixel points, of which the matching confidence coefficient is larger than a preset confidence coefficient threshold value, in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale to the two feature images of the current scale; and determining the matching confidence coefficient matrix between the two characteristic images of the current scale according to the similarity between the fused characteristic descriptors of the two characteristic images of the current scale.
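By way of illustration only, the following sketch shows one way the matching confidence coefficient matrix could be computed from the similarity between the (fused) feature descriptors of the two feature images. The inner-product similarity, the dual-softmax normalization and the temperature value are assumptions of this example; the embodiment only requires that the matrix reflect descriptor similarity.

    import torch

    def matching_confidence(desc_a: torch.Tensor, desc_b: torch.Tensor,
                            temperature: float = 0.1) -> torch.Tensor:
        """desc_a: (Na, d) descriptors of feature image A; desc_b: (Nb, d) of feature image B.
        Returns an (Na, Nb) matrix scoring how well pixel i of A matches pixel j of B."""
        sim = desc_a @ desc_b.T / temperature                          # pairwise similarity scores
        conf = torch.softmax(sim, dim=0) * torch.softmax(sim, dim=1)   # dual-softmax (assumed choice)
        return conf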
In ascending order of scale, i.e. from the feature images with the smaller scale (lower resolution) to the feature images with the larger scale (higher resolution), feature fusion and feature matching are carried out step by step, so that the precision of image matching is refined step by step. When the feature images of the current scale are matched, the matching result of the feature images of the previous scale is used as reference knowledge: specifically, the pixel points whose matching confidence coefficient in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale is greater than the preset confidence coefficient threshold are mapped to the target area of the two feature images of the current scale, i.e. the area in which matches are expected to exist. The matching result of the previous scale is thus carried over and mapped to a smaller area of the higher-resolution image, where feature matching is performed again. This improves the efficiency of image matching and, at the same time, locates the positions of the matched pixel points more and more accurately, which improves the accuracy of image matching.
When the matching confidence of pixel points is computed for the feature images of each scale, the feature descriptors of the feature images are compared in order to improve matching efficiency. To fully consider the correlation between the two feature images while still using the features of each image, a cross-attention mechanism can be introduced on top of the self-attention mechanism to fuse the feature descriptors of the current-scale feature image within the target area into the other image to be matched, so as to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale. When this cross-attention feature fusion is performed, feature matching at the previous scale has already been completed and the matching confidence between pixel points is available; this confidence can be used as background knowledge indicating which pixel points at the current scale are more likely to be matched and therefore deserve more attention. Hence, to improve the efficiency of image matching, the feature descriptors of the current-scale feature image within the target area can be fused into the other image to be matched directly using the matching confidence of the previous scale as the cross-attention weight.
Specifically, this further improves the accuracy of determining matched pixel points according to the similarity between the feature descriptors of the feature images. The matching confidence coefficient matrix obtained by feature matching of the previous-scale feature images already contains the correlation between the two feature images at the resolution of the previous scale, so the attention weight of each pixel point of the current-scale feature images during image matching can be determined from the matching confidence coefficient matrix of the previous scale. A pixel point with a high matching confidence at the previous scale receives a higher attention weight at the current scale, meaning its features deserve more attention; correspondingly, a pixel point with a low matching confidence at the previous scale can be regarded directly as an outlier or an occluded point and ignored or excluded in the matching process at the current scale. This improves both the efficiency and the accuracy of image matching at the current scale.
Thus, in yet another embodiment of the present invention, the post-fusion feature descriptor determination process may include:
step 301: for a first characteristic image of a current scale, weighting a second characteristic image of the current scale according to the matching confidence coefficient matrix between two characteristic images of a previous scale to obtain a cross attention characteristic corresponding to the first characteristic image; the first characteristic image is any one of the two characteristic images, and the second characteristic image is one of the two characteristic images except the first characteristic image.
The product of the matching confidence matrix and the second feature image of the current scale can be determined as the cross attention feature to be fused to the first feature image. Correspondingly, the product of the matching confidence matrix and the first feature image of the current scale is determined as the cross-attention feature to be fused onto the second feature image.
Step 302: and carrying out feature fusion on the cross attention feature and the first feature image to obtain the fused feature descriptor corresponding to the first feature image.
Here, the cross-attention feature is fused with the first feature image based on the cross-attention mechanism; specifically, the sum of the cross-attention feature and the first feature image may be used as the fused feature descriptor corresponding to the first feature image. Correspondingly, the cross-attention feature obtained for the second feature image is fused with the second feature image based on the cross-attention mechanism; specifically, the sum of that cross-attention feature and the second feature image may be used as the fused feature descriptor corresponding to the second feature image.
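By way of illustration only, the following sketch implements the fusion of steps 301 and 302, using the previous-scale matching confidence coefficient matrix directly as a fixed cross-attention weight. Transposing the matrix for the second feature image is an assumption made here to keep tensor shapes consistent. Using the confidence matrix as a fixed weight avoids training a separate cross-attention fusion model, which is the efficiency argument made in the following paragraph.

    import torch

    def fuse_with_confidence(feat_a: torch.Tensor, feat_b: torch.Tensor,
                             conf_prev: torch.Tensor):
        """feat_a: (Na, d), feat_b: (Nb, d), conf_prev: (Na, Nb) previous-scale confidence matrix."""
        cross_a = conf_prev @ feat_b        # cross-attention feature for the first feature image
        cross_b = conf_prev.T @ feat_a      # cross-attention feature for the second feature image
        fused_a = feat_a + cross_a          # fused feature descriptor of the first feature image
        fused_b = feat_b + cross_b          # fused feature descriptor of the second feature image
        return fused_a, fused_b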
It should be noted that, in an alternative embodiment, the feature fusion model based on the cross attention mechanism may also perform cross feature fusion on the first feature image and the second feature image of the current scale, so as to obtain the fused feature descriptors corresponding to the first feature image and the second feature image of the current scale respectively. However, the feature fusion model based on the cross attention mechanism needs to be trained in advance, the cost is high, and the association relation between the image features is already contained in the matching confidence coefficient matrix between two feature images of the previous scale, so that the matching confidence coefficient matrix is directly used as the cross attention weight, the feature fusion efficiency can be improved, and the image matching efficiency is further improved.
Wherein, in one embodiment of the present invention, the determining process of the matching confidence matrix between the two feature images of the previous scale may include:
step 310: and determining the pixel points, of which the matching confidence coefficient is greater than a preset confidence coefficient threshold value, in the first characteristic image and the second characteristic image of the previous scale as initial matching pixel points in the first characteristic image of the previous scale aiming at the first characteristic image of the previous scale.
The preset confidence threshold is used for representing the similarity level of the matched pixel points between the two images.
For each pixel point in the first characteristic image of the previous scale, comparing all pixel points in the second characteristic image of the previous scale with the matching confidence coefficient of the pixel point and a preset confidence coefficient threshold value, and determining the pixel point, corresponding to the second characteristic image of the previous scale, with the matching confidence coefficient larger than the preset confidence coefficient threshold value as an initial matching pixel point corresponding to the pixel point. Correspondingly, for each pixel point in the second feature image of the previous scale, comparing all pixel points in the first feature image of the previous scale with the matching confidence coefficient of the pixel point and a preset confidence coefficient threshold value, and determining the pixel point, corresponding to the first feature image of the previous scale, with the matching confidence coefficient larger than the preset confidence coefficient threshold value as an initial matching pixel point corresponding to the pixel point.
Step 311: and carrying out collaborative filtering on the initial matched pixel points in the first characteristic image and the second characteristic image of the previous scale according to a mutual neighbor algorithm to obtain the target matched pixel points between the two characteristic images of the previous scale.
Based on the idea of the mutual nearest neighbor (MNN) algorithm, the embodiment of the invention determines two initial pixel points in the first feature image and the second feature image as target matching pixel points only when the matching confidence between these two initial pixel points is the highest for both of them. For example, suppose the first feature image has initial pixel points P_a1, P_a2 and P_a3, and the second feature image has initial pixel points P_b1, P_b2 and P_b3, where the matching confidences between P_a1 and P_b1, P_b2, P_b3 are 0.8, 0.2 and 0.4 respectively, and the matching confidences between P_b1 and P_a1, P_a2, P_a3 are 0.7, 0.5 and 0.3 respectively. If the preset confidence threshold is 0.6, then P_a1 and P_b1 are a pair of target matching pixel points between the two feature images of the previous scale. By adopting this mutual nearest neighbor strategy, the accuracy of image matching is further improved on top of the screening of matched pixel points according to the preset confidence threshold.
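By way of illustration only, the following sketch combines the confidence-threshold screening of step 310 with the mutual nearest neighbor filtering of step 311. The threshold value 0.6 is taken from the example above; the implementation details are assumptions of this example.

    import torch

    def mutual_nearest_matches(conf: torch.Tensor, thr: float = 0.6):
        """Keep a pair (i, j) only if conf[i, j] exceeds the threshold and i and j are
        each other's best match (mutual nearest neighbor)."""
        mask = conf > thr
        best_b = conf.argmax(dim=1)                 # best column for every row (pixels of A)
        best_a = conf.argmax(dim=0)                 # best row for every column (pixels of B)
        rows = torch.arange(conf.shape[0])
        mutual = best_a[best_b[rows]] == rows       # i -> j -> i round trip must return to i
        keep = mask[rows, best_b] & mutual
        return rows[keep], best_b[keep]             # indices of matched pixel points in A and B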
Step 312: and mapping the target matching pixel points into the two characteristic images under the current scale to obtain the target region.
According to the position of the target matching pixel point in the characteristic image of the previous scale and the scale scaling relation between the characteristic image of the previous scale and the characteristic image of the current scale, the target matching pixel point is mapped into the two characteristic images of the current scale, the mapping area in the characteristic image of the current scale is reserved as a target area, and the areas except the target area in the characteristic image of the current scale are cut off.
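By way of illustration only, the following sketch expresses step 312 as a coordinate mapping: the target matching pixel points of the previous scale are scaled into the current-scale feature image, and a small window around each mapped point is retained as the target area. The doubling factor, the window radius and the omission of border clamping are assumptions of this example.

    import torch

    def map_to_target_region(matched_rc: torch.Tensor, scale_ratio: int = 2, window: int = 2):
        """matched_rc: (K, 2) row/column coordinates of target matching pixel points at the
        previous scale. Returns the pixel coordinates forming the target area at the current scale."""
        centres = matched_rc * scale_ratio          # mapped coordinates at the finer scale
        offsets = torch.stack(torch.meshgrid(
            torch.arange(-window, window + 1),
            torch.arange(-window, window + 1), indexing="ij"), dim=-1).reshape(-1, 2)
        region = (centres[:, None, :] + offsets[None, :, :]).reshape(-1, 2)
        return region.unique(dim=0)                 # areas outside these coordinates are cropped away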
Step 40: and screening the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale to obtain the information of the matched pixel points between the two images to be matched.
The matched pixel points between the two images to be matched are determined from the pixel point pairs in the two feature images with the maximum scale whose matching confidence is greater than the preset confidence threshold. The information of the matched pixel points may include the locations of the matched pixel points in the images to be matched, the confidence value of the match between them, and the like.
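By way of illustration only, the following sketch assembles the information of the matched pixel points from the maximum-scale matching confidence coefficient matrix, reusing the mutual_nearest_matches sketch given earlier. The feature-map widths and the factor relating the maximum-scale feature map to the original image (2 in the three-scale example, since that map is at 1/2 resolution) are parameters of this illustration.

    import torch

    def matched_pixel_info(conf_finest: torch.Tensor, w_a: int, w_b: int,
                           thr: float = 0.6, scale: int = 2):
        """Screen the maximum-scale confidence matrix and report, for each retained pair, the
        pixel positions in both original images and the confidence value of the match."""
        idx_a, idx_b = mutual_nearest_matches(conf_finest, thr)   # sketch defined above
        row_a = torch.div(idx_a, w_a, rounding_mode="floor")
        row_b = torch.div(idx_b, w_b, rounding_mode="floor")
        pos_a = torch.stack((row_a, idx_a % w_a), dim=1) * scale  # pixel coordinates in image A
        pos_b = torch.stack((row_b, idx_b % w_b), dim=1) * scale  # pixel coordinates in image B
        return pos_a, pos_b, conf_finest[idx_a, idx_b]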
The flow of the image matching method in still another embodiment of the present invention, when the number of scales is three, is described below with reference to fig. 4:
1. Global feature extraction
Two images to be matched are input. A feature extraction module extracts multi-level features, namely the P4, P3 and P2 feature maps at 1/8, 1/4 and 1/2 of the original image scale, respectively. The feature extraction module may be a standard convolution structure with a feature map pyramid network.
2. Transformer encoding module
First, P4 is fed into the Transformer encoding module, and feature descriptors are obtained by using the self-attention mechanism in the Transformer, so that rough local features are converted into feature representations that are easy to match.
3. Feature matching module
The features F_A1 and F_B1, obtained by passing P4 through the Transformer encoding module, are sent to the feature matching module to obtain a matching score matrix M1 over all positions, and matching predictions are then obtained through threshold screening followed by the mutual nearest neighbor algorithm.
4. The matching points are mapped to the next-stage feature map P3, the mapped region is cropped out and input into the Transformer encoding module, and features F_A2 and F_B2 are obtained.
5. Feature fusion module
The features F_A2 and F_B2 and the relation matrix M1 are sent to the feature fusion module to obtain the aggregated features F'_A2 and F'_B2, which strengthens the exchange and integration of feature information between the two images.
6. Correspondingly, the matching result is mapped to the P2 feature map, and steps 2-5 are repeated to obtain the final result.
Wherein, when the feature fusion is performed, the following formula can be adopted:
F'_A2 = F_A2 + M1 * F_B2,  F'_B2 = F_B2 + M1 * F_A2
where M1 represents the matching confidence matrix at the 1/8 scale. M1 * F_B2 indicates which points of F_B2 match points of F_A2, thereby characterizing which points are to be emphasized and excluding outliers or occluded points. The new feature F'_A2 obtained after fusion therefore contains both the original features and the relationship with F_B2. Through this progressive refinement strategy, accurate and robust matching results can be obtained at different scales, especially in regions rich in detail. The embodiment of the invention provides a progressively refined, multi-scale and highly accurate solution for image matching, ensuring that a good matching effect can be obtained under various scenes and conditions. In particular, it adopts enhanced feature reconstruction: by using the Transformer model, the method can deeply reconstruct the feature vectors and thereby capture richer and more complex image information. The embodiment of the invention also adopts a multi-scale refinement strategy: the progressive refinement strategy in the method ensures that accurate matching can be obtained on feature maps of different resolutions, particularly in areas with rich detail, which further enhances the matching accuracy.
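By way of illustration only, the following sketch strings the earlier sketches together into the coarse-to-fine flow of steps 1 to 6. For simplicity, the target-area mapping of step 312 is approximated here by spreading each previous-scale confidence entry over the corresponding block of the current scale; this simplification, the doubling of resolution between scales and the helper names are assumptions of this example and do not limit the embodiment.

    def flatten_map(fmap):
        # (c, h, w) feature map -> (h*w, c) token sequence plus its spatial shape
        c, h, w = fmap.shape
        return fmap.reshape(c, h * w).T, (h, w)

    def upsample_conf(conf, shape_a, shape_b):
        # Stand-in for the target-area mapping of step 312: spread each previous-scale
        # confidence entry over the corresponding 2x2 blocks of the current, finer scale.
        (ha, wa), (hb, wb) = shape_a, shape_b
        c = conf.reshape(ha, wa, hb, wb)
        for d in range(4):
            c = c.repeat_interleave(2, dim=d)
        return c.reshape(4 * ha * wa, 4 * hb * wb)

    def coarse_to_fine_match(pyr_a, pyr_b, encoder, thr=0.6):
        # pyr_a, pyr_b: lists of (c, h, w) feature maps ordered from the 1/8 to the 1/2 scale.
        # encoder: a Transformer encoding module such as TransformerEncoderBlock above.
        # matching_confidence, fuse_with_confidence and mutual_nearest_matches are the
        # helper sketches defined earlier in this description.
        prev_conf, prev_shapes = None, None
        for fmap_a, fmap_b in zip(pyr_a, pyr_b):              # ascending scale order
            feat_a, shape_a = flatten_map(fmap_a)
            feat_b, shape_b = flatten_map(fmap_b)
            desc_a = encoder(feat_a.unsqueeze(0)).squeeze(0)  # Transformer feature descriptors
            desc_b = encoder(feat_b.unsqueeze(0)).squeeze(0)
            if prev_conf is not None:                         # fuse with the previous-scale result
                conf_up = upsample_conf(prev_conf, *prev_shapes)
                desc_a, desc_b = fuse_with_confidence(desc_a, desc_b, conf_up)
            prev_conf = matching_confidence(desc_a, desc_b)
            prev_shapes = (shape_a, shape_b)
        return mutual_nearest_matches(prev_conf, thr)         # matches at the maximum scale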
The embodiment of the invention also adopts adaptive matching deepening: by repeatedly performing feature reconstruction and matching at different resolution levels, the method continuously deepens and refines the matching result, so that the matching is more adaptive and thorough.
The image matching method provided by the embodiment of the invention acquires two images to be matched; extracts feature images of each image to be matched under at least three preset scales, so as to obtain a feature image of each image to be matched under each scale; performs image matching on the feature images of the two images to be matched under each scale in ascending order of the scales until a matching confidence coefficient matrix between the two feature images with the maximum scale is obtained, the matching confidence coefficient matrix being used for representing the confidence between corresponding pixel points in the two feature images, wherein the image matching includes: fusing, based on an attention mechanism, the feature descriptor of the feature image of the current scale within the target area into the other image to be matched, so as to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale, the target area being obtained by mapping the pixel points whose matching confidence in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale is greater than a preset confidence threshold into the two feature images of the current scale, and determining the matching confidence coefficient matrix between the two feature images of the current scale according to the similarity between the fused feature descriptors of the two feature images of the current scale; and screens the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale, so as to obtain the information of the matched pixel points between the two images to be matched. Thus, unlike the prior art in which image matching is performed at only a single scale, so that excessive image detail is lost and the accuracy of image matching is low, the embodiment of the invention improves the accuracy of image matching.
Fig. 5 shows a schematic structural diagram of an image matching apparatus according to an embodiment of the present invention. As shown in fig. 5, the apparatus 50 includes: an acquisition module 501, an extraction module 502, a matching module 503 and a screening module 504.
An obtaining module 501, configured to obtain two images to be matched;
the extracting module 502 is configured to extract feature images of each image to be matched under at least three preset scales, so as to obtain feature images of each image to be matched under each scale;
a matching module 503, configured to perform image matching on the feature images of the two images to be matched under each scale according to the ascending order of the scales until a matching confidence coefficient matrix between the two feature images with the maximum scale is obtained; the matching confidence coefficient matrix is used for representing the confidence coefficient between each corresponding pixel point in the two characteristic images; the image matching includes: fusing the feature descriptor of the feature image of the current scale in the target area into another image to be matched based on an attention mechanism to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale; the target area is obtained by mapping pixel points, of which the matching confidence coefficient is larger than a preset confidence coefficient threshold value, in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale to the two feature images of the current scale; determining the matching confidence coefficient matrix between the two characteristic images of the current scale according to the similarity between the fused characteristic descriptors of the two characteristic images of the current scale;
And the screening module 504 is configured to screen the pixels in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale, so as to obtain information of the matched pixels between the two images to be matched.
The image matching apparatus provided by the embodiment of the invention acquires two images to be matched; extracts feature images of each image to be matched under at least three preset scales, so as to obtain a feature image of each image to be matched under each scale; performs image matching on the feature images of the two images to be matched under each scale in ascending order of the scales until a matching confidence coefficient matrix between the two feature images with the maximum scale is obtained, the matching confidence coefficient matrix being used for representing the confidence between corresponding pixel points in the two feature images, wherein the image matching includes: fusing, based on an attention mechanism, the feature descriptor of the feature image of the current scale within the target area into the other image to be matched, so as to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale, the target area being obtained by mapping the pixel points whose matching confidence in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale is greater than a preset confidence threshold into the two feature images of the current scale, and determining the matching confidence coefficient matrix between the two feature images of the current scale according to the similarity between the fused feature descriptors of the two feature images of the current scale; and screens the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale, so as to obtain the information of the matched pixel points between the two images to be matched. Thus, unlike the prior art in which image matching is performed at only a single scale, so that excessive image detail is lost and the accuracy of image matching is low, the embodiment of the invention improves the accuracy of image matching.
Fig. 6 shows a schematic structural diagram of an image matching device according to an embodiment of the present invention, and the specific embodiment of the present invention is not limited to the specific implementation of the image matching device.
As shown in fig. 6, the image matching device may include: a processor 602, a communication interface 604, a memory 606, and a communication bus 608.
Wherein: processor 602, communication interface 604, and memory 606 perform communication with each other via communication bus 608. Communication interface 604 is used to communicate with network elements of other devices, such as clients or other servers. The processor 602 is configured to execute the program 610, and may specifically perform the relevant steps in the embodiment of the image matching method described above.
In particular, program 610 may include program code comprising computer-executable instructions.
The processor 602 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the image matching device may be processors of the same type, such as one or more CPUs, or may be processors of different types, such as one or more CPUs and one or more ASICs.
The memory 606 is used for storing the program 610. The memory 606 may comprise high-speed RAM memory, and may further comprise non-volatile memory, such as at least one disk storage device.
The program 610 may be specifically invoked by the processor 602 to cause the image matching device to:
acquiring two images to be matched;
extracting characteristic images of each image to be matched under at least three preset scales respectively to obtain characteristic images of each image to be matched under each scale;
performing image matching on the characteristic images of the two images to be matched under each scale according to the ascending order of the scales until a matching confidence coefficient matrix between the two characteristic images with the maximum scale is obtained; the matching confidence coefficient matrix is used for representing the confidence coefficient between each corresponding pixel point in the two characteristic images; the image matching includes: fusing the feature descriptor of the feature image of the current scale in the target area into another image to be matched based on an attention mechanism to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale; the target area is obtained by mapping pixel points, of which the matching confidence coefficient is larger than a preset confidence coefficient threshold value, in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale to the two feature images of the current scale; determining the matching confidence coefficient matrix between the two characteristic images of the current scale according to the similarity between the fused characteristic descriptors of the two characteristic images of the current scale;
And screening the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale to obtain the information of the matched pixel points between the two images to be matched.
The image matching device provided by the embodiment of the invention, when executing the program 610, acquires two images to be matched; extracts characteristic images of each image to be matched under at least three preset scales respectively, obtaining characteristic images of each image to be matched under each scale; performs image matching on the characteristic images of the two images to be matched under each scale in ascending order of the scales until a matching confidence coefficient matrix between the two characteristic images with the maximum scale is obtained, the matching confidence coefficient matrix representing the confidence coefficient between each pair of corresponding pixel points in the two characteristic images, wherein the image matching includes: fusing, based on an attention mechanism, the feature descriptors of the feature image of the current scale within the target area into the other image to be matched to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale, the target area being obtained by mapping, onto the two feature images of the current scale, those pixel points in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale whose matching confidence coefficient is greater than a preset confidence coefficient threshold; and determining the matching confidence coefficient matrix between the two characteristic images of the current scale according to the similarity between the fused feature descriptors of the two characteristic images of the current scale; and finally screens the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale to obtain the information of the matched pixel points between the two images to be matched. Unlike the prior art, in which image matching is performed at only a single scale so that excessive image detail is lost and matching accuracy is low, this coarse-to-fine multi-scale matching preserves more image detail and thereby improves matching accuracy.
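As a rough, non-authoritative sketch of the coarse-to-fine matching summarized above, the following Python code iterates over the scales in ascending order, restricts attention-based descriptor fusion to a target area derived from the previous scale's confidence matrix, and computes the matching confidence coefficient matrix from descriptor similarity. The dual-softmax similarity, the assumption that adjacent scales differ in resolution by a factor of two, and the helper names (upsample_mask, match_coarse_to_fine) are choices of this sketch, not details stated in the disclosure.

```python
import torch
import torch.nn.functional as F

def upsample_mask(flat_mask, h, w):
    # Map a boolean mask from the previous (assumed half-resolution) scale
    # onto the current scale by nearest-neighbour upsampling.
    coarse = flat_mask.float().view(1, 1, h // 2, w // 2)
    return F.interpolate(coarse, size=(h, w), mode="nearest").flatten().bool()

def match_coarse_to_fine(feats_a, feats_b, conf_thr=0.2):
    # feats_a, feats_b: lists of (C, H, W) feature images in ascending scale order.
    conf = None
    for fa, fb in zip(feats_a, feats_b):
        c, h, w = fa.shape
        da = fa.flatten(1).t().contiguous()   # (H*W, C) descriptors of image A
        db = fb.flatten(1).t().contiguous()   # (H*W, C) descriptors of image B
        if conf is not None:
            # Target area: pixels whose confidence at the previous scale
            # exceeded the threshold, mapped onto the current scale.
            mask_a = upsample_mask(conf.max(dim=1).values > conf_thr, h, w)
            mask_b = upsample_mask(conf.max(dim=0).values > conf_thr, h, w)
            # Attention-style fusion restricted to the target areas: each
            # descriptor is augmented with a similarity-weighted sum of the
            # other image's descriptors (a simplification of the attention
            # mechanism referred to above).
            attn_ab = F.softmax(da[mask_a] @ db[mask_b].t() / c ** 0.5, dim=-1)
            da[mask_a] = da[mask_a] + attn_ab @ db[mask_b]
            attn_ba = F.softmax(db[mask_b] @ da[mask_a].t() / c ** 0.5, dim=-1)
            db[mask_b] = db[mask_b] + attn_ba @ da[mask_a]
        # Matching confidence matrix from descriptor similarity (dual softmax);
        # computed densely here only for clarity.
        sim = da @ db.t() / c ** 0.5
        conf = F.softmax(sim, dim=0) * F.softmax(sim, dim=1)
    return conf  # confidence matrix between the two largest-scale feature images
```

The dual-softmax step is only one plausible way to turn similarities into confidence coefficients; the disclosure itself speaks only of determining the matrix from the similarity between fused feature descriptors.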
An embodiment of the present invention provides a computer readable storage medium storing at least one executable instruction that, when executed on an image matching apparatus, causes the image matching apparatus to perform the image matching method in any of the above-described method embodiments.
The executable instructions may be specifically configured to cause the image matching device to:
acquiring two images to be matched;
extracting characteristic images of each image to be matched under at least three preset scales respectively to obtain characteristic images of each image to be matched under each scale;
performing image matching on the characteristic images of the two images to be matched under each scale according to the ascending order of the scales until a matching confidence coefficient matrix between the two characteristic images with the maximum scale is obtained; the matching confidence coefficient matrix is used for representing the confidence coefficient between each corresponding pixel point in the two characteristic images; the image matching includes: fusing the feature descriptor of the feature image of the current scale in the target area into another image to be matched based on an attention mechanism to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale; the target area is obtained by mapping pixel points, of which the matching confidence coefficient is larger than a preset confidence coefficient threshold value, in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale to the two feature images of the current scale; determining the matching confidence coefficient matrix between the two characteristic images of the current scale according to the similarity between the fused characteristic descriptors of the two characteristic images of the current scale;
And screening the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale to obtain the information of the matched pixel points between the two images to be matched.
The executable instructions stored in the computer storage medium provided by the embodiment of the invention cause the image matching device to acquire two images to be matched; extract characteristic images of each image to be matched under at least three preset scales respectively, obtaining characteristic images of each image to be matched under each scale; perform image matching on the characteristic images of the two images to be matched under each scale in ascending order of the scales until a matching confidence coefficient matrix between the two characteristic images with the maximum scale is obtained, the matching confidence coefficient matrix representing the confidence coefficient between each pair of corresponding pixel points in the two characteristic images, wherein the image matching includes: fusing, based on an attention mechanism, the feature descriptors of the feature image of the current scale within the target area into the other image to be matched to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale, the target area being obtained by mapping, onto the two feature images of the current scale, those pixel points in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale whose matching confidence coefficient is greater than a preset confidence coefficient threshold; and determining the matching confidence coefficient matrix between the two characteristic images of the current scale according to the similarity between the fused feature descriptors of the two characteristic images of the current scale; and finally screen the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale to obtain the information of the matched pixel points between the two images to be matched. Unlike the prior art, in which image matching is performed at only a single scale so that excessive image detail is lost and matching accuracy is low, this coarse-to-fine multi-scale matching preserves more image detail and thereby improves matching accuracy.
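Likewise as a hedged illustration of the final screening step, the sketch below keeps only pixel pairs whose confidence in the largest-scale matching confidence coefficient matrix exceeds a threshold and that are mutual nearest neighbours (echoing the mutual-neighbour filtering recited in the claims). The coordinate bookkeeping and the assumption that both feature images share the same width are specific to this sketch.

```python
import torch

def screen_matches(conf, width, conf_thr=0.2):
    # conf: (N, N) confidence matrix at the largest scale, N = height * width.
    best_b = conf.argmax(dim=1)                 # best match in B for each pixel of A
    best_a = conf.argmax(dim=0)                 # best match in A for each pixel of B
    idx_a = torch.arange(conf.shape[0])
    mutual = best_a[best_b] == idx_a            # mutual nearest neighbours
    confident = conf[idx_a, best_b] > conf_thr  # above the confidence threshold
    keep = mutual & confident
    # Convert flat indices back to (row, col) pixel coordinates.
    pts_a = torch.stack((idx_a[keep] // width, idx_a[keep] % width), dim=1)
    pts_b = torch.stack((best_b[keep] // width, best_b[keep] % width), dim=1)
    return pts_a, pts_b
```

screen_matches returns matched pixel coordinates in the two largest-scale feature images; mapping these back to the resolution of the original images to be matched would be a further, implementation-specific step.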
The embodiment of the invention provides an image matching device for executing the image matching method.
An embodiment of the present invention provides a computer program that can be invoked by a processor to cause an image matching device to perform the image matching method of any of the method embodiments described above.
An embodiment of the present invention provides a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when run on a computer, cause the computer to perform the image matching method of any of the method embodiments described above.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such a system is apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided to disclose the enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and arranged in one or more apparatuses different from those of the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component, and they may furthermore be divided into a plurality of sub-modules or sub-units or sub-components. All features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be combined in any combination, except combinations in which at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated otherwise.

Claims (10)

1. A method of image matching, the method comprising:
acquiring two images to be matched;
extracting characteristic images of each image to be matched under at least three preset scales respectively to obtain characteristic images of each image to be matched under each scale;
Performing image matching on the characteristic images of the two images to be matched under each scale according to the ascending order of the scales until a matching confidence coefficient matrix between the two characteristic images with the maximum scale is obtained; the matching confidence coefficient matrix is used for representing the confidence coefficient between each corresponding pixel point in the two characteristic images; the image matching includes: fusing the feature descriptor of the feature image of the current scale in the target area into another image to be matched based on an attention mechanism to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale; the target area is obtained by mapping pixel points, of which the matching confidence coefficient is larger than a preset confidence coefficient threshold value, in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale to the two feature images of the current scale; determining the matching confidence coefficient matrix between the two characteristic images of the current scale according to the similarity between the fused characteristic descriptors of the two characteristic images of the current scale;
and screening the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale to obtain the information of the matched pixel points between the two images to be matched.
2. The method according to claim 1, wherein the performing image matching on the feature images of the two images to be matched under each scale in the ascending order of the scale until obtaining a matching confidence matrix between the two feature images with the largest scale comprises:
for a first characteristic image of a current scale, weighting a second characteristic image of the current scale according to the matching confidence coefficient matrix between two characteristic images of a previous scale to obtain a cross attention characteristic corresponding to the first characteristic image; the first characteristic image is any one of the two characteristic images, and the second characteristic image is one of the two characteristic images except the first characteristic image;
and carrying out feature fusion on the cross attention feature and the first feature image to obtain the fused feature descriptor corresponding to the first feature image.
3. The method according to claim 1, wherein the performing image matching on the feature images of the two images to be matched under each scale in the ascending order of the scale until obtaining a matching confidence matrix between the two feature images with the largest scale comprises:
For a first feature image of a previous scale, determining pixel points, of which the matching confidence coefficient is greater than a preset confidence coefficient threshold value, in the first feature image and the second feature image of the previous scale as initial matching pixel points in the first feature image of the previous scale;
collaborative filtering is carried out on the initial matching pixel points in the first characteristic image and the second characteristic image of the previous scale according to a mutual neighbor algorithm, so that target matching pixel points between the two characteristic images of the previous scale are obtained;
and mapping the target matching pixel points into the two characteristic images under the current scale to obtain the target region.
4. The method according to claim 1, wherein the extracting feature images of each image to be matched under at least three preset scales to obtain feature images of each image to be matched under each scale includes:
inputting the images to be matched into a preset pyramid convolution network to obtain the characteristic images of each scale; the pyramid convolution network comprises a plurality of longitudinally connected first convolution layers, wherein each first convolution layer is transversely connected with a target number of second convolution layers respectively; one of the first convolution layers corresponds to one of the scales; the target number is determined according to the corresponding scale of the first convolution layer; and after the image to be matched passes through the first convolution layer and the second convolution layer, obtaining the characteristic image under the scale corresponding to the first convolution layer.
5. The method of claim 4, wherein the output of the last second convolution layer corresponding to the current scale is fused with the output of the last second convolution layer corresponding to the previous scale in ascending order of the scale, and then the fused output is output to the last second convolution layer corresponding to the current scale, wherein the last second convolution layer is a 1 x 1 convolution layer.
6. The method of claim 5, wherein the output of the last second convolution layer of the previous scale is upsampled and then fused with the second last convolution layer corresponding to the current scale to obtain the input of the last second convolution layer corresponding to the current scale, wherein the last second convolution layer is a 3 x 3 convolution layer.
7. The method according to claim 1, wherein the performing image matching on the feature images of the two images to be matched under each scale in the ascending order of the scale until obtaining a matching confidence matrix between the two feature images with the largest scale comprises:
respectively inputting the two characteristic images of each scale into a preset Transformer model to obtain the characteristic descriptors of the characteristic images of each scale; the Transformer model comprises a first normalization layer, a multi-head attention layer, a second normalization layer and a multi-layer perceptron which are sequentially connected; the characteristic image is fused with its output through the first normalization layer and the multi-head attention layer and the fused result is input into the second normalization layer, and the fused result is further fused with the output of the multi-layer perceptron and used as the output of the Transformer model.
8. An image matching apparatus, the apparatus comprising:
the acquisition module is used for acquiring two images to be matched;
the extraction module is used for respectively extracting the characteristic images of each image to be matched under at least three preset scales to obtain the characteristic images of each image to be matched under each scale;
the matching module is used for carrying out image matching on the characteristic images of the two images to be matched under each scale according to the ascending order of the scales until a matching confidence coefficient matrix between the two characteristic images with the maximum scale is obtained; the matching confidence coefficient matrix is used for representing the confidence coefficient between each corresponding pixel point in the two characteristic images; the image matching includes: fusing the feature descriptor of the feature image of the current scale in the target area into another image to be matched based on an attention mechanism to obtain fused feature descriptors respectively corresponding to the two feature images of the current scale; the target area is obtained by mapping pixel points, of which the matching confidence coefficient is larger than a preset confidence coefficient threshold value, in the matching confidence coefficient matrix corresponding to the two feature images of the previous scale to the two feature images of the current scale; determining the matching confidence coefficient matrix between the two characteristic images of the current scale according to the similarity between the fused characteristic descriptors of the two characteristic images of the current scale;
And the screening module is used for screening the pixel points in the two images to be matched according to the matching confidence coefficient matrix between the two feature images with the maximum scale to obtain the information of the matched pixel points between the two images to be matched.
9. An image matching apparatus, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the operations of the image matching method of any one of claims 1-7.
10. A computer readable storage medium, wherein at least one executable instruction is stored in the storage medium, which when run on an image matching device causes the image matching device to perform the operations of the image matching method according to any one of claims 1-7.
CN202311637140.5A 2023-12-01 2023-12-01 Image matching method, device, equipment and computer storage medium Pending CN117611852A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311637140.5A CN117611852A (en) 2023-12-01 2023-12-01 Image matching method, device, equipment and computer storage medium


Publications (1)

Publication Number Publication Date
CN117611852A true CN117611852A (en) 2024-02-27

Family

ID=89953186


Country Status (1)

Country Link
CN (1) CN117611852A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination