CN112712123B - Matching screening method and device, electronic equipment and computer-readable storage medium - Google Patents

Matching screening method and device, electronic equipment and computer-readable storage medium Download PDF

Info

Publication number
CN112712123B
CN112712123B CN202011641201.1A CN202011641201A CN112712123B CN 112712123 B CN112712123 B CN 112712123B CN 202011641201 A CN202011641201 A CN 202011641201A CN 112712123 B CN112712123 B CN 112712123B
Authority
CN
China
Prior art keywords
initial
matching
module
match
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011641201.1A
Other languages
Chinese (zh)
Other versions
CN112712123A (en
Inventor
赵晨
葛艺潇
杨佳琪
朱烽
赵瑞
李鸿升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Technology Development Co Ltd
Original Assignee
Shanghai Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Technology Development Co Ltd filed Critical Shanghai Sensetime Technology Development Co Ltd
Priority to CN202011641201.1A priority Critical patent/CN112712123B/en
Publication of CN112712123A publication Critical patent/CN112712123A/en
Priority to PCT/CN2021/095170 priority patent/WO2022142084A1/en
Priority to TW110139723A priority patent/TWI776718B/en
Application granted granted Critical
Publication of CN112712123B publication Critical patent/CN112712123B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Oscillators With Electromechanical Resonators (AREA)
  • Filters And Equalizers (AREA)

Abstract

Embodiments of the present application provide a matching screening method and apparatus, an electronic device, and a computer-readable storage medium. The matching screening method includes the following steps: the electronic device obtains an initial matching set, where the initial matching set is derived from initial matching results between an image pair; a matching subset is screened out of the initial matching set through at least one clipping module, where the proportion of correct matches in the matching subset is higher than that in the initial matching set, and the at least one clipping module is used to obtain consistency information for each initial match in the initial matching set; the matching subset is used to process an image task related to the image pair. The method and apparatus can improve the processing effect of a parameterized transformation model on the image task.

Description

Matching screening method and device, electronic equipment and computer-readable storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to a matching screening method and apparatus, an electronic device, and a computer-readable storage medium.
Background
In computer vision and image processing, feature matching is a fundamental research problem. An initial matching set is generally built by selecting consistently matched points from the two sets of feature points of an image pair based on the Euclidean distance similarity between the descriptors of candidate matching points. Such matching methods often produce a large number of false matches.
Currently, a deep-learning neural network model is generally trained on an initial matching set and then used to execute a corresponding image task. Because the sample distribution in the initial matching set is often unbalanced (for example, the number of false matches may far exceed the number of correct matches), the learning process of the neural network model is susceptible to interference from the false matches, resulting in poor image task performance of the neural network model.
Disclosure of Invention
The embodiments of the present application provide a matching screening method and apparatus, an electronic device, and a computer-readable storage medium, which can improve the processing effect of a parameterized transformation model on image processing tasks.
A first aspect of an embodiment of the present application provides a matching screening method, including:
obtaining an initial matching set, wherein the initial matching set is derived from initial matching results between image pairs;
screening a matching subset from the initial matching set through at least one clipping module, wherein the correct matching proportion in the matching subset is higher than that in the initial matching set, and the at least one clipping module is used to obtain consistency information for each initial match in the initial matching set;
wherein the matching subset is used to process an image task associated with the image pair.
In this embodiment, the initial matching result may consist of points selected as consistent matches from the two sets of feature points by a matching algorithm based on the nearest-neighbor Euclidean distance ratio, and each initial match in the initial matching set may include feature information of the corresponding points in the image pair (for example, a combination of at least one of the coordinates, pixel values, grayscale values, and RGB values of the corresponding points). The matches in the initial matching set are not necessarily all correct: there are correct matches and there are also false matches, where the correct matching proportion refers to the ratio of the number of correct matches in the initial matching set to the total number of matches in the initial matching set. The embodiments of the present application screen the initial matching set so that the correct matching proportion in the screened matching subset is higher than that in the initial matching set. Because the matching subset is screened from the initial matching set and has a higher correct matching proportion, the model parameters calculated from it are more reliable, which improves the calculation accuracy of the model parameters of the parameterized transformation model and, in turn, the processing effect of the parameterized transformation model on image processing tasks.
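The nearest-neighbor Euclidean distance ratio test mentioned above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the descriptor arrays and the 0.8 ratio are assumptions:

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Build an initial matching set with the nearest-neighbor Euclidean
    distance ratio test. desc_a: (N, D), desc_b: (M, D) descriptor arrays.
    Returns a list of (index_in_a, index_in_b) initial matches."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        # Keep the match only if the nearest neighbor is clearly better
        # than the second nearest (otherwise the match is ambiguous).
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches
```

Even with this filter, the resulting initial matching set typically still contains many false matches, which motivates the screening described below.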
Optionally, after the at least one clipping module screens out the matching subset from the initial matching set, the method further includes:
predicting the initial matching set using the parameterized transformation model to obtain a prediction result for each initial match in the initial matching set, wherein the prediction result is either a correct match or a false match.
Because the model parameters of the parameterized transformation model in this embodiment are calculated using the matching subset, the calculated model parameters are more reliable, and the parameterized transformation model can better predict each initial match in the initial matching set. Compared with a neural network model that directly predicts on the initial matching set, the accuracy of the prediction result of the parameterized transformation model can be improved.
Optionally, the screening, by the at least one clipping module, of a matching subset from the initial matching set includes:
screening a first matching set through a first clipping module to obtain the matching subset;
in the case that the at least one clipping module comprises one clipping module, the first matching set is the initial matching set;
and in the case that the at least one clipping module comprises at least two clipping modules, the first matching set is the set output by the clipping module preceding the first clipping module.
Using a single clipping module is suitable for the case where the number of false matches in the initial matching set is small.
The at least two clipping modules are neural network learning modules that screen the initial matching set at least twice, so that the correct matching proportion in the screened matching subset is higher. This further improves the calculation accuracy of the model parameters of the parameterized transformation model, making the calculated model parameters more reliable when processing the image task, and is suitable for the case where the number of false matches in the initial matching set is large. Because the features learned by each clipping module during training are different, using at least two clipping modules realizes dynamic feature learning through at least two rounds of feature learning; compared with training on fixed features, this can raise the correct matching proportion in the screened matching subset.
Optionally,
the screening of the first matching set by the first clipping module to obtain the matching subset includes:
determining, by the first clipping module, local consistency information or global consistency information of a first initial match, and determining whether the first initial match is included in the matching subset according to the local consistency information or the global consistency information of the first initial match; the first initial match is any initial match in the first matching set.
Optionally,
the screening of the first matching set by the first clipping module to obtain the matching subset includes:
determining, by the first clipping module, local consistency information and global consistency information of a first initial match, and determining whether the first initial match is included in the matching subset according to the local consistency information and the global consistency information of the first initial match; the first initial match is any initial match in the first matching set.
Optionally, the first clipping module includes a first local consistency learning module, a first global consistency learning module, and a first clipping sub-module;
the determining, by the first clipping module, local consistency information and global consistency information of a first initial match, and determining, according to the local consistency information and the global consistency information of the first initial match, whether the first initial match is included in the match subset, includes:
constructing, by the first local consistency learning module, a first local dynamic graph for the first initial match, and calculating a local consistency score of the first initial match on the first local dynamic graph; the first local dynamic graph comprises the node where the first initial match is located and K related nodes associated with that node; the K related nodes are obtained using a K-nearest-neighbor algorithm based on the node where the first initial match is located;
constructing a first global dynamic graph through the first global consistency learning module, and determining a comprehensive consistency score of the first initial matching according to the local consistency score of the first initial matching on the first local dynamic graph and the first global dynamic graph;
and determining whether the first initial match is classified into the matching subset according to the comprehensive consistency score of the first initial match by using the first clipping submodule.
Optionally, the first local consistency learning module includes a first feature dimension-increasing module, a first dynamic graph building module, a first feature dimension-reducing module, and a first local consistency score calculating module;
the building, by the first local consistency learning module, a first local dynamic graph for a first initial match, calculating a local consistency score of the first initial match at the first local dynamic graph, including:
performing dimension-increasing processing on the initial feature vector of the first initial matching through the first feature dimension-increasing module to obtain a high-dimensional feature vector of the first initial matching;
determining, by the first local dynamic graph building module, the K related matches in the first matching set whose correlation (measured by Euclidean distance) with the high-dimensional feature vector of the first initial match ranks highest, using a K-nearest-neighbor algorithm, and building a first local dynamic graph for the first initial match based on the first initial match and the K related matches, to obtain an ultrahigh-dimensional feature vector of the first initial match; the ultrahigh-dimensional feature vector of the first initial match comprises a combination of the high-dimensional feature vector of the first initial match and the correlation vectors between the first initial match and the K related matches;
performing dimensionality reduction processing on the first initially matched ultrahigh-dimensional feature vector by using the first feature dimensionality reduction module to obtain a first initially matched low-dimensional feature vector;
calculating, by the first local consistency score calculation module, a local consistency score of the first initial match at the first local dynamic graph based on the low-dimensional feature vectors of the first initial match.
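The local-consistency idea behind these steps can be illustrated with a much-simplified sketch: build each match's k-nearest-neighbor neighborhood in feature space (the "local dynamic graph") and score the match by the tightness of that neighborhood. The scoring formula below is illustrative only; in the patent, the dimension-raising, graph-building, dimension-reducing, and scoring are all learned neural modules:

```python
import numpy as np

def local_consistency_scores(feats, k=2):
    """Toy local consistency scoring: each match's k nearest neighbors in
    feature space form its local graph, and a match whose neighborhood is
    tight (small mean distance) gets a high score.
    feats: (N, D) per-match feature vectors."""
    n = len(feats)
    # Pairwise Euclidean distances between match features.
    dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self from the neighborhood
    scores = np.empty(n)
    for i in range(n):
        knn = np.argsort(dist[i])[:k]  # the k related nodes
        # Higher score = tighter neighborhood = more locally consistent.
        scores[i] = 1.0 / (1.0 + dist[i, knn].mean())
    return scores
```

In this toy version an isolated (likely false) match receives a markedly lower score than matches sitting inside a consistent cluster.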
Optionally, the first feature dimension reduction module includes a first circular convolution module and a second circular convolution module; the performing dimensionality reduction processing on the ultrahigh-dimensional feature vector of the first initial match by the first feature dimension reduction module to obtain a low-dimensional feature vector of the first initial match includes:
grouping the ultrahigh-dimensional feature vector of the first initial match according to the degree of correlation through the first circular convolution module, and performing a first feature aggregation on each group of feature vectors to obtain a preliminarily aggregated feature vector;
and performing a second feature aggregation on the preliminarily aggregated feature vector through the second circular convolution module to obtain the low-dimensional feature vector of the first initial match.
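The two-stage grouped aggregation performed by the two circular convolution modules can be pictured as follows, with mean pooling standing in for the learned convolutions (an assumption; the patent's modules are trained networks):

```python
import numpy as np

def two_stage_aggregation(neighbor_feats, groups=2):
    """Sketch of the two-stage reduction: neighbor features, ordered by
    correlation with the anchor match, are split into contiguous groups,
    each group is pooled (first aggregation), then the group summaries
    are pooled into one low-dimensional vector (second aggregation).
    neighbor_feats: (K, D), sorted by decreasing correlation."""
    # First aggregation: one summary vector per correlation group.
    group_means = [g.mean(axis=0) for g in np.array_split(neighbor_feats, groups)]
    # Second aggregation: collapse the group summaries into one vector.
    return np.stack(group_means).mean(axis=0)
```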
Optionally, the determining a composite consistency score of the first initial match according to the local consistency score of the first initial match in the first local dynamic graph and the first global dynamic graph includes:
calculating a global consistency score of the first initial match at the first global dynamic graph;
determining a composite consistency score for the first initial match based on the local consistency score and the global consistency score.
Optionally, the constructing a first global dynamic graph by the first global consistency learning module includes:
constructing a first global dynamic graph according to the local consistency score of each initial match in the first matching set in the corresponding local dynamic graph through the first global consistency learning module;
the determining a composite consistency score of the first initial match according to the local consistency score of the first initial match at the first local dynamic graph and the first global dynamic graph comprises:
and calculating the comprehensive consistency score of the first initial matching according to the first global dynamic graph and the low-dimensional feature vector of the first initial matching.
Optionally, the first global dynamic graph is represented by an adjacency matrix, and the calculating of a comprehensive consistency score of the first initial match according to the first global dynamic graph and the low-dimensional feature vector of the first initial match includes:
calculating a synthetic low-dimensional feature vector of the first initial match using a graph convolution network based on the low-dimensional feature vector of the first initial match and the adjacency matrix;
a composite consistency score for the first initial match is calculated based on the composite low-dimensional feature vector for the first initial match.
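A one-layer graph convolution of the kind described, mixing each match's low-dimensional feature along the adjacency matrix and projecting it to a score, might look like this sketch (the weight matrix and sigmoid are assumed placeholders for the learned network):

```python
import numpy as np

def gcn_composite_scores(low_dim_feats, adjacency, weight):
    """Illustrative one-layer graph convolution: low-dimensional match
    features are mixed along the global dynamic graph's adjacency matrix,
    then projected to a per-match comprehensive consistency score in (0, 1).
    low_dim_feats: (N, D); adjacency: (N, N); weight: (D, 1)."""
    # Row-normalize the adjacency so each match averages over its neighbors.
    deg = adjacency.sum(axis=1, keepdims=True)
    norm_adj = adjacency / np.maximum(deg, 1e-12)
    mixed = norm_adj @ low_dim_feats      # propagate features along the graph
    logits = mixed @ weight               # project each match to one logit
    return (1.0 / (1.0 + np.exp(-logits))).ravel()  # sigmoid scores
```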
Optionally, the determining, by the first clipping sub-module, whether the first initial match is included in the matching subset according to the comprehensive consistency score of the first initial match includes:
determining, by the first clipping sub-module, whether the comprehensive consistency score of the first initial match is greater than a first threshold, and if so, determining that the first initial match falls within the matching subset;
or, sorting, by the first clipping sub-module, the comprehensive consistency scores of the first matching set from high to low, and if the first initial match ranks above a second threshold, determining that the first initial match falls within the matching subset.
Optionally, before the step of screening out the matching subset from the initial matching set by the at least one clipping module, the method further includes:
training the clipping module using a supervised data set to obtain a training result;
and evaluating the training result through a binary classification loss function with adaptive temperature, and updating the parameters of the clipping module by minimizing the binary classification loss function.
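A binary classification loss with a temperature term might take the following form. The patent does not give the exact adaptive-temperature formula, so the scaling used here is an assumption:

```python
import numpy as np

def temperature_bce(logits, labels, temperature=1.0):
    """Binary cross-entropy with temperature scaling: logits are divided
    by the temperature before the sigmoid, so a lower temperature sharpens
    the decision boundary. labels: 1 = correct match, 0 = false match."""
    z = np.asarray(logits, dtype=float) / temperature
    labels = np.asarray(labels, dtype=float)
    probs = 1.0 / (1.0 + np.exp(-z))
    eps = 1e-12  # numerical guard against log(0)
    # Standard binary cross-entropy, averaged over the supervised set.
    return float(-np.mean(labels * np.log(probs + eps)
                          + (1.0 - labels) * np.log(1.0 - probs + eps)))
```

For correctly signed logits, lowering the temperature drives the loss toward zero, which is what an adaptive schedule could exploit during training.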
Optionally, before the step of screening out the matching subset from the initial matching set by the at least one clipping module, the method further includes:
determining a constraint relation used by the parameterized transformation model according to the image tasks related to the image pair, wherein the constraint relation comprises epipolar geometric constraint or reprojection error;
after the screening, by the at least one cropping module, a subset of matches from the initial set of matches, the method further comprises:
calculating model parameters of the parameterized transformation model using the matching subsets in the case where the parameterized transformation model uses the constraint relationships.
Optionally, the image task includes any one of a straight line fitting task, a wide baseline image matching task, an image positioning task, an image stitching task, a three-dimensional reconstruction task, and a camera pose estimation task.
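As a toy illustration of why the screening helps these downstream tasks, consider the straight line fitting task: a single gross outlier (a false match) skews a least-squares fit, while fitting on the screened subset recovers the true parameters. This sketch is illustrative and not the patent's parameterized transformation model:

```python
import numpy as np

def fit_line(points):
    """Least-squares fit of y = a*x + b; a stand-in for computing the
    model parameters of a parameterized transformation model."""
    pts = np.asarray(points, dtype=float)
    a_mat = np.stack([pts[:, 0], np.ones(len(pts))], axis=1)
    (a, b), *_ = np.linalg.lstsq(a_mat, pts[:, 1], rcond=None)
    return a, b

# Inliers on y = 2x + 1 plus one gross outlier (a "false match").
inliers = [(x, 2 * x + 1) for x in range(5)]
outlier = [(4, 50.0)]

a_all, b_all = fit_line(inliers + outlier)  # skewed by the outlier
a_sub, b_sub = fit_line(inliers)            # screened subset recovers a=2, b=1
```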
A second aspect of the embodiments of the present application provides a matching screening apparatus, including:
an obtaining unit, configured to obtain an initial matching set, where the initial matching set is derived from an initial matching result between image pairs;
a screening unit, configured to screen a matching subset from the initial matching set through at least one clipping module, wherein the correct matching proportion in the matching subset is higher than that in the initial matching set;
wherein the matching subset is used to compute model parameters of a parametric transformation model used to process an image task related to the image pair.
A third aspect of the embodiments of the present application provides an electronic device, including a processor and a memory, where the memory is used to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to execute the steps of the method in the first aspect of the embodiments of the present application.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program makes a computer perform part or all of the steps as described in the first aspect of embodiments of the present application.
A fifth aspect of embodiments of the present application provides a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps as described in the first or second aspect of embodiments of the present application. The computer program product may be a software installation package.
In the embodiments of the present application, the electronic device obtains an initial matching set, where the initial matching set is derived from initial matching results between an image pair; a matching subset is screened from the initial matching set through at least one clipping module, where the correct matching proportion in the matching subset is higher than that in the initial matching set, and the at least one clipping module is used to obtain consistency information for each initial match in the initial matching set; and model parameters of a parameterized transformation model, used to process an image task related to the image pair, are calculated using the matching subset. Because the initial matching set is screened so that the correct matching proportion in the screened matching subset is higher than that in the initial matching set, the calculation accuracy of the model parameters of the parameterized transformation model can be improved, and thus the processing effect of the parameterized transformation model on image processing tasks is further improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic flow chart of a match screening method according to an embodiment of the present disclosure;
FIG. 1b is a schematic structural diagram of a consistency learning framework for match screening according to an embodiment of the present disclosure;
FIG. 2a is a schematic flow chart of another match screening method provided in the embodiments of the present application;
FIG. 2b is a schematic structural diagram of another consistency learning framework for match screening according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart illustrating a first clipping module screening an initial matching set according to an embodiment of the present application;
FIG. 4 is a block diagram of a first local consistency learning module according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a first feature dimension reduction module according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of feature clustering performed by a first circular convolution module and a second circular convolution module according to an embodiment of the present application;
FIG. 7a is a flowchart illustrating a process of calculating a composite consistency score for each initial match in the set of initial matches according to an embodiment of the present application;
FIG. 7b is a schematic flowchart of another method for calculating a composite consistency score for each initial match in the set of initial matches according to the present disclosure;
FIG. 8 is a schematic flow chart diagram of another match screening method provided in the embodiments of the present application;
FIG. 9 is a schematic diagram of the fitting effect of the method (CLNet) according to the embodiment of the present application and the PointCN method on the straight line fitting task;
FIG. 10 is a graph comparing the L2 distance on a straight line fitting task using the method of the present application (CLNet) with the PointCN, OANet, PointACN methods;
fig. 11 is a schematic structural diagram of a matching screening apparatus according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The electronic device in the embodiments of the present application may include any device with computing capability, such as a personal computer, a mobile phone, a server, a face recognition device, a face-recognition access gate, an image processing device, a virtual reality device, and the like. For convenience of description, the above devices are collectively referred to as electronic devices.
Referring to fig. 1a, fig. 1a is a schematic flow chart of a match screening method according to an embodiment of the present disclosure. As shown in fig. 1a, the matching screening method may include the following steps.
101, the electronic device obtains an initial matching set, the initial matching set being derived from initial matching results between the image pairs.
In this embodiment, the initial matching set may include a plurality of initial matches, and each initial match in the initial matching set may include feature information of the corresponding points in the image pair (for example, a combination of at least one of the coordinates, pixel values, grayscale values, and RGB values of the corresponding points). An image pair is a pair of images used in an image task and typically comprises two images: a first image and a second image. For example, the initial matching result may be pixel points, selected from the first image and the second image respectively, that are matched by a pixel-by-pixel matching algorithm. The matched pixels are corresponding pixels in the first image and the second image. For example, if the first image is a building photographed from one angle and the second image is the same building photographed from another angle, the matched pixels are the pixel in the first image and the pixel in the second image that correspond to the same position on the building.
102, the electronic device screens out a matching subset from the initial matching set through at least one clipping module, where the correct matching proportion in the matching subset is higher than that in the initial matching set.
The at least one clipping module is configured to obtain consistency information for each initial match in the initial matching set, and the matching subset is used to process an image task related to the image pair.
The consistency information of the initial match is used to measure the consistency of the initial match with other initial matches in the whole image, and specifically, the consistency may include consistency of the match in the orientation, rotation, translation and other dimensions.
In the embodiment of the present application, the matches in the initial matching set are not necessarily all correct; there are both correct matches and false matches. The correct matching proportion refers to the ratio of the number of correct matches in the initial matching set to the total number of matches in the initial matching set.
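The correct matching proportion defined above can be computed directly; the labeled toy matches below are hypothetical, and in practice the correctness flags would come from ground-truth geometry rather than being given.

```python
def correct_matching_proportion(matches, is_correct):
    """Ratio of the number of correct matches to the total number of
    matches in the set (the 'correct matching proportion')."""
    return sum(1 for m in matches if is_correct(m)) / len(matches)

# Toy labeled matches: (4-D match vector, correctness flag).
labeled = [((0, 0, 1, 1), True), ((5, 5, 9, 9), False),
           ((2, 2, 3, 3), True), ((7, 7, 0, 0), False)]
ratio = correct_matching_proportion(labeled, lambda m: m[1])
print(ratio)  # → 0.5
```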
For example, step 102 may employ a trained neural network learning model (i.e., the at least one clipping module) to screen the initial matching set, so that the proportion of correct matches in the screened matching subset is higher than that in the initial matching set. The at least one clipping module in the embodiment of the present application is a trained neural network learning model.
In one possible embodiment step 102 may comprise the steps of:
the electronic equipment screens the first matching set through the first cutting module to obtain a matching subset;
in the case that the at least one clipping module comprises one clipping module, the first matching set is the initial matching set;
and in the case that the at least one cropping module comprises at least two cropping modules, the first matching set is obtained by screening through the cropping module preceding the first cropping module.
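The sequential arrangement described above, in which the set screened by one module becomes the input of the next, can be sketched as follows. The stand-in module that simply keeps the better-scoring half of its input is hypothetical; the actual modules are the learned clipping modules of the embodiment.

```python
def prune_chain(initial_matches, modules):
    """Apply the modules one by one: the set screened by the previous
    module is the input of the next, and the output of the final module
    is the matching subset."""
    current = initial_matches
    for module in modules:
        current = module(current)
    return current

# Stand-in module: keep the better-scoring half of its input.
def keep_top_half(matches):
    ranked = sorted(matches, key=lambda m: m["score"], reverse=True)
    return ranked[: len(ranked) // 2]

matches = [{"id": i, "score": i / 8} for i in range(8)]
subset = prune_chain(matches, [keep_top_half, keep_top_half])
print([m["id"] for m in subset])  # → [7, 6]
```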
Optionally, the electronic device filters the first matching set through the first clipping module to obtain the matching subset, which may specifically include the following steps:
the electronic equipment determines local consistency information and global consistency information of a first initial match through the first clipping module, and determines whether the first initial match is classified into the matching subset according to the local consistency information and the global consistency information of the first initial match; the first initial match is any one of the first set of matches.
The local consistency information is the consistency of the first initial matching in the local area of the image, and the global consistency information is the consistency of the first initial matching in the whole image.
Specifically, when the at least one clipping module includes one clipping module, the electronic device determines, through the one clipping module, local consistency information and global consistency information of each initial match in the initial matching set, and screens out a matching subset from the initial matching set according to the local consistency information and the global consistency information of each initial match.
The embodiment in which a single trained cropping module is adopted can be suitable for the case where the number of false matches in the initial matching set is small. The cropping module is a neural network learning module; because it can learn features during the training process, compared with screening based on fixed features, it can improve the proportion of correct matches in the screened matching subset.
Optionally, the electronic device filters the first matching set through the first clipping module to obtain the matching subset, which may specifically include the following steps:
the electronic device determines local consistency information of a first initial match through the first clipping module, and determines whether the first initial match is classified into the matching subset according to the local consistency information of the first initial match; or determines global consistency information of the first initial match, and determines whether the first initial match is classified into the matching subset according to the global consistency information of the first initial match; the first initial match is any one of the first matching set.
In the embodiment of the present application, the screening function of the first cropping module can be realized by considering only the local consistency information of the first initial match, without considering the global consistency information; for images with little variation in global consistency, this can save the computation required for match screening and realize match screening quickly.
Similarly, the screening function of the first cropping module can be realized by considering only the global consistency information of the first initial match, without considering the local consistency information; for images with little variation in local consistency, this can save the computation required for match screening and realize match screening quickly.
And in the case that the at least one cropping module comprises at least two cropping modules, the electronic device performs at least two rounds of screening through the at least two cropping modules, each subsequent cropping module further screening the matching set screened by the previous cropping module, until the last cropping module screens out the matching subset.
In an embodiment of the application, the first cropping module is the last of the at least two cropping modules.
The at least two cropping modules are trained neural network learning modules and can screen the initial matching set at least twice, so that the proportion of correct matches in the screened matching subset is high. This improves the calculation accuracy of the model parameters of the parameterized transformation model, so that the calculated model parameters are highly reliable when the image task is processed.
The cropping module can be trained with a large number of supervised samples (matches whose correctness is known in advance): each match is predicted, the training loss is calculated, and the cropping module is determined to be a trained cropping module when the training loss is less than a set value.
The trained at least two cropping modules do not screen the initial matching set at the same time, but screen it one by one in sequence; that is, the output result screened by the previous cropping module is used as the input of the next cropping module. For example, if the at least two cropping modules include 2 cropping modules, cropping module 1 and cropping module 2, then cropping module 1 performs a first screening on the initial matching set to obtain matching set 1, and cropping module 2 performs a second screening on matching set 1 to obtain the matching subset. For example, if the at least two cropping modules include 3 cropping modules, cropping module 1, cropping module 2 and cropping module 3, then cropping module 1 performs a first screening on the initial matching set to obtain matching set 1, cropping module 2 performs a second screening on matching set 1 to obtain matching set 2, and cropping module 3 performs a third screening on matching set 2 to obtain the matching subset.
In one possible embodiment, if the number of cropping modules is 3, the number of matches in the initial matching set is 10000, and each cropping module filters out 50% at a time, then the number of matches in the screened matching subset is 1250. Since the cropping modules fully consider the local consistency and the global consistency of each match, the proportion of correct matches in the matching subset is much higher than that in the initial matching set. The proportion of false matches in the matching subset is small, so when the matching subset is used for a line fitting task, the interference caused by false matches is small, thereby improving the processing effect of the line fitting task.
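The arithmetic of the example above (3 modules, 10000 initial matches, 50% kept at each round) can be checked directly:

```python
def matches_after_pruning(n_initial, n_modules, keep_ratio):
    """Number of matches left after each module keeps `keep_ratio`
    of its input, applied `n_modules` times in sequence."""
    n = n_initial
    for _ in range(n_modules):
        n = int(n * keep_ratio)
    return n

# The example from the text: 10000 matches, 3 modules, 50% kept each time.
print(matches_after_pruning(10000, 3, 0.5))  # → 1250
```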
Referring to fig. 1b, fig. 1b is a schematic structural diagram of a consistency learning framework for match screening according to an embodiment of the present disclosure. As shown in fig. 1b, the consistency learning framework (Consensus Learning framework, CLNet) includes at least two clipping modules and a parameterized transformation model, where N represents the number of initial matches in the initial matching set, and 4 represents the 4-dimensional coordinates of each initial match (e.g., 4-dimensional coordinates consisting of the coordinate position of a first pixel in the first image and the coordinate position of the matching second pixel in the second image). The initial matching set is gradually screened through K (K >= 2) clipping modules (clipping modules based on local-to-global consistency learning) to obtain a matching subset (the matching subset comprises N1 candidate matches), and the model parameters of the parameterized transformation model are calculated based on the N1 candidate matches. Each clipping module can comprise a local consistency learning module, a global consistency learning module and a clipping submodule.
The embodiment in which at least two cropping modules are adopted can be suitable for the case where the number of false matches in the initial matching set is large. Because the features learned by each cropping module in the training process are different, adopting at least two cropping modules realizes dynamic feature learning through at least two rounds of feature learning; compared with training with fixed features, this can improve the proportion of correct matches in the screened matching subset.
Wherein the electronic device computes model parameters of a parameterized transformation model using the matching subsets, the parameterized transformation model being for processing image tasks associated with the image pairs.
In an embodiment of the present application, the parameterized transformation model may be used to predict each initial match in the initial matching set, where each initial match is predicted to be a correct match or a false match. The model parameters of the parameterized transformation model are calculated based on the matching subset; the model parameters may be, for example, an essential matrix. Because the matching subset is screened from the initial matching set and the proportion of correct matches in the matching subset is higher, the reliability of the calculated model parameters is higher, which improves the calculation accuracy of the model parameters of the parameterized transformation model and further improves the processing effect of the parameterized transformation model on image processing tasks.
The image tasks related to the image pairs may include any one of a line fitting (line fitting) task, a wide-baseline image matching (wide-baseline image matching) task, an image localization (image localization) task, an image stitching task, and a three-dimensional reconstruction task.
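As a simplified illustration of why the screened subset matters for a line fitting task, the ordinary least-squares fit below recovers the line exactly from inliers, while a single gross outlier pulls the fitted slope far away; the toy points are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit y = a*x + b over (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

inliers = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)]  # lie exactly on y = 2x + 1
outlier = (4, 40.0)
print(fit_line(inliers))              # → (2.0, 1.0)
print(fit_line(inliers + [outlier]))  # slope pulled to 8.2 by one outlier
```

This is why a matching subset with a small proportion of false matches causes little interference in model fitting, compared with fitting on the raw initial matching set.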
In one possible embodiment, before performing step 102, the method of FIG. 1a may further perform the following steps:
(11) the electronic equipment trains the cutting module by using a supervised data set to obtain a training result;
(12) and the electronic device evaluates the training result through an adaptive-temperature binary classification loss function, and updates the parameters of the cropping module by minimizing the binary classification loss function, to obtain the trained cropping module.
In the embodiment of the present application, although training with the conventional binary cross-entropy loss achieves a satisfactory effect, this training mode still suffers from inevitable label ambiguity for matches near the epipolar-line distance threshold d_thr (i.e., a match near d_thr may be determined to be either a correct match or a false match). Since the confidence of a match c_i should be negatively correlated with its corresponding epipolar-line distance d_i (i.e., the closer d_i is to 0, the more likely the match is to be judged correct), the embodiment of the present application introduces an adaptive temperature for the presumed correct matches (d_i < d_thr), whose calculation formula is expressed by a Gaussian kernel:

τ_i = exp(-||d_i - d_thr|| / (α · d_thr));

where α is the kernel width of the Gaussian kernel; for an outlier c_i with d_i >= d_thr, τ_i is set to 1. Because the inherent ambiguity near the epipolar line cannot be eliminated completely, the embodiment of the present application describes the training target as:

L = L_cls + λ · L_reg;

where L_reg represents the regression loss of the parameterized transformation model (for example, the error between the model parameters calculated from the matching subset and the ground truth), and λ is a weighting factor. The adaptive-temperature binary classification loss function provided by the embodiment of the present application is:

L_cls = Σ_{j=1}^{K} [ L_bce(h(o_j^local), y_j) + L_bce(h(o_j^global), y_j) ] + L_bce(h(o), y);

where o_j^local is the output of the local consistency learning layer of the jth pruning module, o_j^global is the output of the global consistency learning layer of the jth pruning module, o is the output of the last MLP of the last pruning module (the match weight w = tanh(ReLU(o))); h(o) = σ(τ · o), where σ is the sigmoid activation function; y_j represents the set of ground-truth binary labels (correct/false) used at the jth pruning; L_bce represents the binary cross-entropy loss; and K is the number of pruning modules. Therefore, for a correct match c_i with a smaller d_i, the model optimization is more confident under the smaller temperature and performs a larger regularization.
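The adaptive temperature described above can be sketched numerically as follows; the threshold d_thr, the kernel width α, and the sample values are hypothetical illustrations, not values from the embodiment.

```python
import math

def adaptive_temperature(d_i, d_thr, alpha):
    """Gaussian-kernel temperature: tau_i for a presumed correct match
    (d_i < d_thr); tau_i is fixed to 1 for an outlier (d_i >= d_thr)."""
    if d_i >= d_thr:
        return 1.0
    return math.exp(-abs(d_i - d_thr) / (alpha * d_thr))

def h(o, tau):
    """Temperature-scaled sigmoid h(o) = sigmoid(tau * o)."""
    return 1.0 / (1.0 + math.exp(-tau * o))

d_thr, alpha = 1e-4, 0.5
print(adaptive_temperature(2e-4, d_thr, alpha))           # outlier → 1.0
print(round(adaptive_temperature(0.0, d_thr, alpha), 4))  # exp(-2) ≈ 0.1353
print(round(h(2.0, 1.0), 4))                              # → 0.8808
```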
According to the method and the device, the initial matching set can be screened so that the proportion of correct matches in the screened matching subset is higher than that in the initial matching set, and the model parameters of the parameterized transformation model are calculated using the matching subset, which can improve the calculation accuracy of the model parameters of the parameterized transformation model and further improve the processing effect of the parameterized transformation model on image processing tasks.
Referring to fig. 2a, fig. 2a is a schematic flow chart of another matching screening method according to an embodiment of the present disclosure. Fig. 2a is further optimized based on fig. 1a, and as shown in fig. 2a, the matching screening method may include the following steps.
201, the electronic device obtains an initial matching set, the initial matching set being derived from initial matching results between the image pair.
202, the electronic device screens out a matching subset from the initial matching set through at least one cropping module, the at least one cropping module being used for obtaining consistency information of each initial match in the initial matching set; the proportion of correct matches in the matching subset is higher than that in the initial matching set, and the matching subset is used for processing an image task related to the image pair.
In particular, the matching subset is used to compute model parameters of a parametric transformation model used to process image tasks associated with the image pair.
The specific implementation of steps 201 to 202 can refer to steps 101 to 102 in fig. 1a, which is not described herein again.
And 203, the electronic equipment predicts the initial matching set by using the parameterized transformation model to obtain a prediction result of each initial matching in the initial matching set, wherein the prediction result comprises correct matching or wrong matching.
In the embodiment of the application, the model parameters of the parameterized transformation model are calculated by adopting the matching subsets, so that the reliability of the calculated model parameters is higher, the parameterized transformation model can better predict each initial matching in the initial matching set, and compared with a neural network model which directly predicts the initial matching set, the accuracy of the prediction result of the parameterized transformation model can be improved.
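To illustrate the idea of fitting model parameters on the screened subset and then predicting every initial match, the sketch below uses a deliberately simplified 2-D translation model as a stand-in for the real parameterized transformation model (e.g., an essential matrix); all values and names are hypothetical.

```python
def fit_translation(subset):
    """Fit a simple parameterized model (a 2-D translation) from the
    screened matching subset; stands in for the essential-matrix step."""
    n = len(subset)
    tx = sum(x2 - x1 for x1, y1, x2, y2 in subset) / n
    ty = sum(y2 - y1 for x1, y1, x2, y2 in subset) / n
    return tx, ty

def predict_full_size(initial_matches, model, thr):
    """Predict every initial match: correct (True) if its residual under
    the fitted model is below thr, false (False) otherwise."""
    tx, ty = model
    preds = []
    for x1, y1, x2, y2 in initial_matches:
        residual = abs((x2 - x1) - tx) + abs((y2 - y1) - ty)
        preds.append(residual < thr)
    return preds

subset = [(0, 0, 5, 3), (1, 1, 6, 4)]     # consistent with translation (5, 3)
initial = subset + [(2, 2, 20, 30)]       # plus one gross mismatch
model = fit_translation(subset)
print(predict_full_size(initial, model, thr=1.0))  # → [True, True, False]
```

Because the model was fitted only on the cleaner subset, the gross mismatch in the full initial set is correctly rejected at prediction time.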
Referring to fig. 2b, fig. 2b is a schematic structural diagram of another consistency learning framework for match screening according to an embodiment of the present application. As shown in fig. 2b, the consistency learning framework includes at least two clipping modules, a parameterized transformation model and a full-size prediction module, where N represents the number of initial matches in the initial matching set, and 4 represents the 4-dimensional coordinates of each initial match (e.g., 4-dimensional coordinates consisting of the coordinate position of a first pixel in the first image and the coordinate position of the matching second pixel in the second image). The initial matching set is gradually screened by K (K >= 2) clipping modules (clipping modules based on local-to-global consistency learning) to obtain a matching subset (the matching subset comprises N1 candidate matches), the model parameters of the parameterized transformation model are calculated based on the N1 candidate matches, and the full-size prediction module is used for predicting (i.e., full-size prediction) the N initial matches in the initial matching set to obtain a prediction result for each initial match in the initial matching set (the prediction result comprises a correct match or a false match). Each clipping module can comprise a local consistency learning module, a global consistency learning module and a clipping submodule.
At present, accurate pixel-by-pixel feature matching is a prerequisite to solve many important image tasks in computer vision, robotics, and the like. For example, structure from motion (SfM), simultaneous localization and mapping (SLAM), image stitching, visual localization, virtual reality, and the like. SfM in computer vision refers to the process of obtaining 3D structural information by analyzing 2D moving images of objects. However, pictures in the real world often include many factors such as rotation, translation, scale, view angle change and illumination change, making the matching screening method very challenging.
In current learning-based approaches, match screening is often treated as a match classification task, where multi-layer perceptrons (MLPs) are used to classify matches as either correct or false. However, the optimization of such a binary classification problem is not trivial, and the matches can be highly unbalanced; e.g., outliers (false matches) can account for 90% or more. Therefore, the accuracy of directly predicting the correct matches in the initial matching set through MLPs is low.
By adopting the method shown in fig. 2a, the reliability of the calculated model parameters is higher, the parameterized transformation model can better predict each initial matching in the initial matching set, and compared with a neural network model which directly predicts the initial matching set, the accuracy of the prediction result of the parameterized transformation model can be improved.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating a first clipping module screening an initial matching set according to an embodiment of the present disclosure, and as shown in fig. 3, the method may include the following steps.
301, the electronic device constructs a first local dynamic graph for the first initial match through a first local consistency learning module, and calculates a local consistency score of the first initial match on the first local dynamic graph; the first local dynamic graph comprises the node where the first initial match is located and K related nodes associated with that node; the K related nodes are obtained based on the node where the first initial match is located by using a K-nearest neighbor algorithm.
In the embodiment of the application, the first clipping module comprises a first local consistency learning module, a first global consistency learning module and a first clipping submodule. Wherein the first cropping module is a first of the trained at least two cropping modules.
The first local consistency learning module can construct a first local dynamic graph and calculate a local consistency score for the first initial match at the first local dynamic graph. The first global consistency learning module may construct a first global dynamic graph, calculate a global consistency score of the first initial match on the first global dynamic graph, and may also calculate a composite consistency score of the first initial match.
The first local dynamic graph is constructed, after the initial feature vector of each initial match is mapped to a high-dimensional feature vector, according to the correlation between the high-dimensional feature vector of the first initial match and those of the other initial matches. Each initial match maps to a node of the first local dynamic graph, and the node where the first initial match is located is the node to which the first initial match is mapped. For example, the K related nodes closest to the node where the first initial match is located may be found according to a K-nearest neighbor algorithm (KNN), and the graph formed by the node where the first initial match is located and the K related nodes is taken as the first local dynamic graph. The graph is called dynamic mainly because, after the initial matches are mapped from initial feature vectors to high-dimensional feature vectors, the nodes found by the K-nearest neighbor algorithm are not necessarily the same each time but change dynamically.
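The neighbor search underlying the local dynamic graph can be sketched as a plain Euclidean K-nearest-neighbor query over feature vectors; the toy 2-dimensional features below stand in for the high-dimensional feature vectors, and the function name is illustrative.

```python
import math

def knn_graph_nodes(features, i, k):
    """Return the indices of the k nearest neighbors (Euclidean distance
    in feature space) of node i; together with node i they form the
    local dynamic graph for match i."""
    others = [j for j in range(len(features)) if j != i]
    others.sort(key=lambda j: math.dist(features[i], features[j]))
    return others[:k]

feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (0.3, 0.1)]
print(knn_graph_nodes(feats, 0, 3))  # → [1, 2, 4]  (node 3 is far away)
```

Because the feature vectors change as they pass through each module, re-running this query on the updated features yields a different neighbor set, which is what makes the graph dynamic.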
The local consistency score of the first initial matching on the first local dynamic graph is used for measuring the local consistency of the first initial matching, if the first initial matching is a correct matching, the local consistency of the first initial matching is better, and the local consistency score is higher; if the first initial match is a false match, it is locally less consistent with a lower local consistency score.
In one possible embodiment, please refer to fig. 4, where fig. 4 is a schematic structural diagram of a first local consistency learning module according to an embodiment of the present application. As shown in fig. 4, the first local consistency learning module includes a first feature dimension-increasing module, a first dynamic graph building module, a first feature dimension-reducing module, and a first local consistency score calculating module.
Step 301 may include the steps of:
(21) the electronic equipment performs dimension-increasing processing on the initial feature vector of the first initial matching through the first feature dimension-increasing module to obtain a high-dimensional feature vector of the first initial matching;
(22) the electronic device determines, by using the first local dynamic graph construction module, the K related matches in the first matching set whose high-dimensional feature vectors rank highest in degree of correlation with that of the first initial match (e.g., a degree of correlation determined according to Euclidean distance) through a K-nearest neighbor algorithm, and constructs a first local dynamic graph for the first initial match based on the first initial match and the K related matches, so as to obtain an ultrahigh-dimensional feature vector of the first initial match; the ultrahigh-dimensional feature vector of the first initial match comprises a combination of the high-dimensional feature vector of the first initial match and the correlation vectors between the first initial match and the K related matches;
(23) the electronic equipment performs dimensionality reduction on the first initially matched ultrahigh-dimensional feature vector by using the first feature dimensionality reduction module to obtain a first initially matched low-dimensional feature vector;
(24) the electronic device calculates, by the first local consistency score calculation module, a local consistency score of the first initial match at the first local dynamic graph based on the low-dimensional feature vector of the first initial match.
In this embodiment, the first feature dimension-increasing module may be a trained deep neural network module, such as a trained residual network, and the residual network may include a plurality of residual modules (ResNet Blocks), such as 4 ResNet Blocks. The first feature dimension-increasing module may perform dimension-increasing processing on the initial feature vector of each initial match in the first matching set to obtain the high-dimensional feature vector of each initial match. The first dynamic graph construction module, the first feature dimension-reduction module and the first local consistency score calculation module can be trained deep neural network modules. For example, the first feature dimension-reduction module may include a plurality of residual modules (ResNet Blocks), and the first local consistency score calculation module may include a multi-layer perceptron (MLP).
The initial feature vector of the first initial match may be a four-dimensional vector comprising a combination of the coordinates of a first pixel point of the first initial match in the first image of the image pair and the coordinates of a second pixel point of the first initial match in the second image of the image pair. For example, if the coordinates of the first pixel point are (x1, y1) and the coordinates of the second pixel point are (x2, y2), the initial feature vector of the first initial match is p1 = (x1, y1, x2, y2). The high-dimensional feature vector of the first initial match may be a 128-dimensional vector.
In a possible embodiment, please refer to fig. 5, and fig. 5 is a schematic structural diagram of a first feature dimension reduction module according to an embodiment of the present application. The first feature dimensionality reduction module comprises a first circular convolution module and a second circular convolution module.
The step (23) may include the steps of:
(231) the electronic equipment groups the first initially matched ultrahigh-dimensional feature vectors according to the correlation degree through the first annular convolution module, and performs first feature aggregation processing on each group of feature vectors to obtain initially aggregated feature vectors;
(232) and the electronic equipment carries out secondary feature aggregation processing on the preliminarily aggregated feature vectors through the second annular convolution module to obtain the first initially matched low-dimensional feature vector.
In this embodiment of the present application, a first cyclic convolution (annular convolution) module groups the first initially matched ultrahigh-dimensional eigenvectors according to a correlation, and dimensions of each group of eigenvectors are the same. For example, 10% of the top relevancy ranks are divided into one group, 10% to 20% of the top relevancy ranks are divided into one group, 20% to 30% of the top relevancy ranks are divided into one group, 30% to 40% of the top relevancy ranks are divided into one group, 40% to 50% of the top relevancy ranks are divided into one group, 50% to 60% of the top relevancy ranks are divided into one group, 60% to 70% of the top relevancy ranks are divided into one group, 70% to 80% of the top relevancy ranks are divided into one group, 80% to 90% of the top relevancy ranks are divided into one group, 90% to 100% of the top relevancy ranks are divided into one group, and 10 groups are divided in total.
After the first cyclic convolution module groups the first initially matched ultrahigh-dimensional feature vectors according to the degree of correlation, each group of feature vectors is aggregated into one feature vector. For example, the ultrahigh-dimensional feature vector is k × 128-dimensional and can be divided into p groups, i.e., (p × k/p) × 128; the first cyclic convolution module performs the first feature aggregation processing on (p × k/p) × 128 to obtain a preliminarily aggregated feature vector of k/p × 128 dimensions. The second cyclic convolution module may aggregate k/p × 128 into a 1 × 128-dimensional low-dimensional feature vector. The parameters in the matrix learned by the first cyclic convolution module are not shared with the parameters in the matrix learned by the second cyclic convolution module; the parameters in the matrix refer to the values of the elements in the matrix.
Referring to fig. 6, fig. 6 is a schematic diagram illustrating feature aggregation performed by a first circular convolution module and a second circular convolution module according to an embodiment of the present disclosure. As shown in fig. 6, for the first initial match c1, all K related matches determined by the K-nearest neighbor algorithm are reflected in the first local dynamic graph of fig. 6, where fig. 6 takes K = 12 as an example; the graph composed of the node where the first initial match c1 is located and the K = 12 related nodes is used as the first local dynamic graph. The 12 related nodes are divided into 3 groups according to their correlation with the node where the first initial match c1 is located (for example, Euclidean distance), and then the first cyclic convolution module performs the first feature aggregation processing: it aggregates the (p × k/p) × 128-dimensional feature vectors to obtain a preliminarily aggregated feature vector of (k/p) × 128 dimensions. The second cyclic convolution module may aggregate (k/p) × 128 into a 1 × 128-dimensional low-dimensional feature vector.
According to the method and the device, the cyclic convolution modules perform dimension reduction after grouping the first initially matched ultrahigh-dimensional feature vector according to the degree of correlation, which fully considers the local consistency of the first initial match; the low-dimensional feature vector of the first initial match after dimension reduction still retains its local consistency, thereby improving the accuracy of the calculated local consistency score of the first initial match on the first local dynamic graph.
And 302, the electronic device constructs a first global dynamic graph through a first global consistency learning module, and determines a comprehensive consistency score of the first initial matching according to the local consistency score of the first initial matching in the first local dynamic graph and the first global dynamic graph.
In the embodiment of the present application, the first global dynamic graph comprises the nodes where all initial matches are located. The global consistency score of the first initial match on the first global dynamic graph can be determined through the first global dynamic graph, and the comprehensive consistency score of the first initial match is determined according to the local consistency score of the first initial match on the first local dynamic graph and the global consistency score of the first initial match on the first global dynamic graph. That is, the comprehensive consistency score of the first initial match is obtained by combining its local consistency score on the first local dynamic graph and its global consistency score on the first global dynamic graph.
In one possible embodiment, in step 302, the electronic device determining a composite consistency score of the first initial match according to the local consistency score of the first initial match in the first local dynamic graph and the first global dynamic graph may include the following steps:
(31) calculating, by a first global consistency learning module, a global consistency score for the first initial match at the first global dynamic graph;
(32) a first global consistency learning module determines a composite consistency score for the first initial match based on the local consistency score and the global consistency score.
In this embodiment of the present application, the first global consistency learning module may calculate a global consistency score of the first initial match in the first global dynamic graph, and determine a comprehensive consistency score of the first initial match according to the local consistency score and the global consistency score. Specifically, the first global consistency learning module may directly add the local consistency score and the global consistency score and use their sum as the comprehensive consistency score, or it may combine the two scores according to a weighted algorithm.
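As a minimal illustration of the two combination strategies just described, direct addition versus a weighted combination, consider the following sketch. The function name and the weight parameter are illustrative assumptions; the text only states that a weighted algorithm may be used, without giving its form:

```python
def combine_scores(local_scores, global_scores, weight=None):
    """Combine per-match local and global consistency scores.

    weight=None reproduces the direct-addition variant; a weight in (0, 1)
    gives a simple weighted combination (the weighting scheme itself is an
    assumption -- the text does not specify it).
    """
    if weight is None:
        return [l + g for l, g in zip(local_scores, global_scores)]
    return [weight * l + (1.0 - weight) * g
            for l, g in zip(local_scores, global_scores)]
```

Either variant yields one comprehensive consistency score per initial match, which the first clipping sub-module then thresholds or ranks.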
In another possible embodiment, in step 302, the electronic device building a first global dynamic graph through the first global consistency learning module may include the following steps:
(41) the electronic equipment constructs a first global dynamic graph according to the local consistency score of each initial match in the first matching set in the corresponding local dynamic graph through the first global consistency learning module;
in step 302, the electronic device determines, according to the local consistency score of the first initial match in the first local dynamic graph and the first global dynamic graph, a comprehensive consistency score of the first initial match, including:
(42) the electronic equipment calculates the comprehensive consistency score of the first initial matching according to the first global dynamic graph and the low-dimensional feature vector of the first initial matching.
In the embodiment of the application, the global consistency score is not calculated directly; instead, the comprehensive consistency score is calculated directly on the basis of the local consistency score and the first global dynamic graph. This removes the separate calculation of the global consistency score and improves the calculation efficiency of the comprehensive consistency score.
Optionally, the first global dynamic graph is represented by an adjacency matrix, and step (42) may specifically include the following steps:
(421) the electronic equipment calculates a comprehensive low-dimensional feature vector of the first initial matching by utilizing a Graph Convolution Network (GCN) based on the low-dimensional feature vector of the first initial matching and the adjacency matrix;
(422) the electronic device calculates a composite consistency score for the first initial match based on the composite low-dimensional feature vector for the first initial match.
In the embodiment of the present application, the first global dynamic graph is constructed from the local consistency score of each initial match in the first matching set in its corresponding local dynamic graph. The adjacency matrix is obtained by multiplying the matrix formed by these local consistency scores with its transpose. For example, if the matrix formed by the local consistency scores of each initial match in the first matching set in the corresponding local dynamic graph is N × 1, its transpose is 1 × N, and multiplying the two yields an N × N matrix, which is the adjacency matrix.
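The outer-product construction of the adjacency matrix described above can be sketched as follows (NumPy; the function name is illustrative):

```python
import numpy as np

def build_global_adjacency(local_scores):
    """Build the N x N adjacency matrix of the first global dynamic graph
    by multiplying the N x 1 vector of local consistency scores with its
    1 x N transpose, as described in the text."""
    s = np.asarray(local_scores, dtype=float).reshape(-1, 1)  # N x 1
    return s @ s.T                                            # N x N
```

The resulting matrix is symmetric, and each entry relates the consistency of one match to that of another.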
Let W_g be the weight matrix learned by the graph convolution network after training, A the adjacency matrix, and Z the matrix formed by the low-dimensional feature vector of each initial match in the first matching set. The result output by the graph convolution network is:

out = L · Z · W_g

where L = D^(-1/2) · A' · D^(-1/2), A' = A + I_N, D is the diagonal degree matrix of A', and I_N is the identity matrix, added to ensure numerical stability. L is an N × N matrix, Z is an N × 128 matrix, and W_g is a 128 × 128 matrix.
After the result out output by the graph convolution network is obtained, it is added to the matrix Z formed by the low-dimensional feature vector of each initial match in the first matching set, yielding the comprehensive low-dimensional feature vector of each initial match. Each comprehensive low-dimensional feature vector is then processed by a residual module and input into an MLP (multilayer perceptron); the MLP reduces the dimension of each comprehensive low-dimensional feature vector and calculates the comprehensive consistency score of each initial match in the first matching set.
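Under the formulas above, one graph-convolution propagation step with the residual addition could be sketched as follows (NumPy; Wg stands in for the learned weight matrix, and the residual module and MLP that follow are omitted):

```python
import numpy as np

def gcn_layer(A, Z, Wg):
    """out = L . Z . Wg with L = D^(-1/2) A' D^(-1/2) and A' = A + I_N,
    followed by the residual addition out + Z described in the text."""
    N = A.shape[0]
    A_prime = A + np.eye(N)                  # A' = A + I_N
    d = A_prime.sum(axis=1)                  # node degrees of A'
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # D^(-1/2)
    L = D_inv_sqrt @ A_prime @ D_inv_sqrt    # symmetric normalization
    out = L @ Z @ Wg
    return out + Z                           # comprehensive low-dim features
```

In the text N × 128 features and a 128 × 128 weight matrix are used; the sketch works for any consistent shapes.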
303, the electronic device determines, by using the first clipping sub-module, whether the first initial match is classified into the matching subset according to the comprehensive consistency score of the first initial match.
In the embodiment of the present application, the higher the comprehensive consistency score of the first initial match, the higher the probability that the first initial match is a correct match. The initial matches in the first matching set with higher comprehensive consistency scores may be classified into the matching subset, or the initial matches may be sorted by comprehensive consistency score from large to small and the top-ranked ones classified into the matching subset. By calculating the comprehensive consistency score of each initial match, each initial match in the first matching set can be classified through a single simple index (the comprehensive consistency score) that jointly considers its local consistency and its global consistency, so that more correct matches can be screened out of the first matching set by the first clipping module, laying a better basis for subsequently screening out the matching subset. The correct matching proportion in the matching subset is higher than the correct matching proportion in the first matching set.
In one possible embodiment, step 303 may include the steps of:
the electronic equipment determines whether the comprehensive consistency score of the first initial matching is larger than a first threshold value by using the first clipping submodule, and if so, determines that the first initial matching is classified into the matching subset;
or, the electronic device determines, by using the first clipping sub-module, the ranking of the comprehensive consistency score of the first initial match within the first matching set sorted from large to small, and determines that the first initial match is classified into the matching subset if the ranking of the first initial match falls within a second threshold (i.e., the first initial match is among the top-ranked matches).
In the embodiment of the present application, the first matching set may be screened according to the comprehensive consistency score of each initial match in the first matching set, with the initial matches whose comprehensive consistency scores are greater than the first threshold classified into the matching subset. Alternatively, the initial matches in the first matching set may be sorted by comprehensive consistency score from large to small, and the initial matches whose rankings fall within the second threshold classified into the matching subset.
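Both screening modes, the score threshold and the top rank, can be sketched as follows; the function and parameter names are illustrative, not taken from the embodiment:

```python
def prune_matches(scores, first_threshold=None, top_k=None):
    """Return the indices of initial matches kept by the clipping
    sub-module: either those whose comprehensive consistency score
    exceeds first_threshold, or the top_k highest-scoring ones."""
    if first_threshold is not None:
        return [i for i, s in enumerate(scores) if s > first_threshold]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:top_k])
```

Either mode keeps the matches most likely to be correct, so the kept subset has a higher correct-matching proportion than the input set.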
Referring to fig. 7a, fig. 7a is a flowchart illustrating a process of calculating a comprehensive consistency score of each initial match in a first matching set (the first matching set is exemplified by an initial matching set). As shown in fig. 7a, the initial matching set is (c_1, c_2, …, c_N), where c_1 represents the initial feature vector of an initial match; c_1 may be a 4-dimensional vector consisting of the two-dimensional coordinates of the first pixel point of that initial match in the first image of the image pair and the two-dimensional coordinates of the second pixel point of that initial match in the second image of the image pair. The initial matching set includes N initial matches. After the feature dimension-raising module raises the dimension, the initial matching set becomes (z_1, z_2, …, z_N), where z_1 represents the high-dimensional feature vector of an initial match; z_1 may be a 128-dimensional feature vector. The dynamic graph building module builds a graph from each initially matched high-dimensional feature vector and the K initial matches with the highest correlation determined by the K-nearest-neighbor algorithm, where each match z_i can be raised in dimension as [z_i, Δz_i], with Δz_i = (z_i − z_i^j), 1 ≤ j ≤ k, and z_i^j being any of the K matches associated with z_i. After dynamic graph building, the feature vector of each match has dimensions k × 256. The feature dimension-reduction module may reduce each match from k × 256 dimensions to 128 dimensions, and the local consistency score calculation module may calculate a local consistency score for each match. The global consistency module then outputs a comprehensive consistency score for each match.
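The dynamic graph building step of fig. 7a, finding the K most related matches of each z_i and stacking [z_i, Δz_i], can be sketched as follows (NumPy; a small feature dimension d is used for brevity instead of 128, and the function name is illustrative):

```python
import numpy as np

def build_local_dynamic_graph(Z, k):
    """For each match z_i, find its k nearest neighbours in feature space
    (the K most related matches) and stack [z_i, z_i - z_i^j] for each
    neighbour j, giving a k x 2d feature block per match."""
    N, d = Z.shape
    # pairwise Euclidean distances between high-dimensional feature vectors
    dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # a match is not its own neighbour
    graphs = np.empty((N, k, 2 * d))
    for i in range(N):
        nbrs = np.argsort(dists[i])[:k]      # k most related matches
        delta = Z[i] - Z[nbrs]               # delta z_i = z_i - z_i^j
        graphs[i] = np.concatenate(
            [np.repeat(Z[i][None, :], k, axis=0), delta], axis=1)
    return graphs                            # shape (N, k, 2d)
```

With d = 128 this reproduces the k × 256 per-match feature block that the feature dimension-reduction module subsequently compresses to 128 dimensions.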
The feature dimension-raising module may include 4 residual modules (for example, 1 residual module is used for raising the dimension, and the other 3 residual modules are used for alleviating the degradation problem of deep neural networks). The feature dimension-reduction module may consist of 1 MLP (for dimension reduction, not shown in fig. 7a and 7b, which may reduce k × 256 dimensions to k × 128 dimensions), a circular convolution (for dimension reduction, e.g., from k × 128 dimensions to 128 dimensions), and 4 residual modules (for alleviating the degradation problem of deep neural networks). The local consistency score calculation may be implemented by 1 MLP.
Referring to fig. 7b, fig. 7b is a schematic flowchart illustrating another process of calculating a comprehensive consistency score of each initial match in the first matching set (the first matching set is exemplified by the initial matching set). Fig. 7b is further optimized on the basis of fig. 7a. The local consistency calculation process of fig. 7b is similar to that of fig. 7a, and fig. 7b describes the global consistency calculation process in detail. As shown in fig. 7b, after the local consistency score of each match is obtained, the N × 1 score vector is transposed into 1 × N, and the two are multiplied to obtain an N × N adjacency matrix, completing the global dynamic graph composition. The adjacency matrix covers the consistency between each match and every other match in the initial matching set, that is, it contains the global consistency information of each match. The graph convolution network (GCN) essentially uses a shared-parameter filter to compute a weighted sum of a central node and its adjacent nodes to form a feature map, thereby extracting features over the graph. The graph convolution network can modulate the information learned by the local consistency module into a spectrum, and the feature filter in the spectral domain enables the propagated features to reflect the consistency in the Laplacian of the global dynamic graph.
In the embodiment of the application, the screening function of the first clipping module can also be realized by considering only the local consistency information of the first initial match, without considering the global consistency information. For image pairs with little global consistency variation, this saves the computation required for match screening and enables the screening to be completed quickly.
Referring to fig. 8, fig. 8 is a schematic flowchart of another matching and screening method according to an embodiment of the present disclosure. Fig. 8 is further optimized based on fig. 2a, and as shown in fig. 8, the matching and screening method may include the following steps.
801, the electronic device obtains an initial matching set, the initial matching set being derived from initial matches between the image pair.
802, the electronic device screens out a matching subset from the initial matching set through at least one clipping module, wherein the at least one clipping module is used for acquiring consistency information of each initial match in the initial matching set, and the correct matching proportion in the matching subset is higher than that in the initial matching set.
Wherein the matching subset is used to process image tasks associated with the image pair.
Steps 801 to 802 may refer to steps 201 to 202 shown in fig. 2a, and are not described herein again.
803, the electronic device determines a constraint relationship used by the parametric transformation model based on the image pair-dependent image tasks, the constraint relationship including epipolar geometric constraints or reprojection errors.
In the embodiment of the application, different image tasks may correspond to different constraint relationships. For example, if the image task is a three-dimensional reconstruction task, the constraint relationship used is epipolar geometry constraint (epipolar geometry constraint); if the image task is a straight line fitting task, the constraint relationship used is reprojection error (reprojection error).
Step 803 is executed before step 804; step 803 may be executed before, after, or simultaneously with step 801 or step 802, and the embodiments of the present application are not limited in this respect.
804, in a case where the parameterized transformation model uses the constraint relationship, the electronic device calculates model parameters of the parameterized transformation model using the matching subset, the parameterized transformation model being used for processing the image task associated with the image pair.
805, the electronic device predicts the initial matching set by using the parameterized transformation model to obtain a prediction result of each initial match in the initial matching set, wherein the prediction result comprises correct match or incorrect match.
In the embodiment of the present application, the electronic device predicts the initial matching set by using the parameterized transformation model to obtain a prediction result of each initial matching in the initial matching set, including:
the electronic equipment calculates epipolar distance (epipolar distance) or reprojection error (reprojection error) of each match in the initial matching set by using a parameterized transformation model, and then determines a prediction result of each initial match according to the epipolar distance or the reprojection error of each match.
If the model parameters of the parameterized transformation model are calculated using the matching subset under the epipolar geometric constraint, the model parameters may be an essential matrix (essential matrix). The electronic device calculates the epipolar distance of each match in the initial matching set by using the parameterized transformation model, and then determines the prediction result of each initial match according to the epipolar distance of each match. Specifically, according to the epipolar distance of each match, a match whose epipolar distance is smaller than the third threshold is predicted as a correct match, and a match whose epipolar distance is larger than the third threshold is predicted as an incorrect match.
If the model parameters of the parametric transformation model are obtained by calculation by using the matching subsets under the constraint of the reprojection errors, the electronic equipment calculates the reprojection error of each matching in the initial matching set by using the parametric transformation model, and then determines the prediction result of each initial matching according to the reprojection error of each matching. Specifically, according to the reprojection error of each match, a match with a reprojection error smaller than the fourth threshold may be predicted as a correct match, and a match with a reprojection error larger than the fourth threshold may be predicted as an incorrect match.
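As an illustration of the threshold tests just described, the sketch below labels matches by a symmetric epipolar distance under a fitted matrix F. The patent does not spell out the exact distance formula, so this particular variant (the standard symmetric epipolar distance) is an assumption:

```python
import numpy as np

def classify_by_epipolar_distance(F, pts1, pts2, threshold):
    """Predict each match as correct (True) or incorrect (False) by
    comparing its symmetric epipolar distance under F with the threshold
    (the third-threshold test described in the text)."""
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])             # homogeneous coordinates
    x2 = np.hstack([pts2, ones])
    Fx1 = x1 @ F.T                           # epipolar lines in image 2
    Ftx2 = x2 @ F                            # epipolar lines in image 1
    num = np.abs(np.sum(x2 * Fx1, axis=1))   # |x2^T F x1|
    denom = (np.sqrt(Fx1[:, 0]**2 + Fx1[:, 1]**2)
             + np.sqrt(Ftx2[:, 0]**2 + Ftx2[:, 1]**2))
    return (num / denom) < threshold
```

The reprojection-error branch works the same way, with the reprojection error of each match replacing the epipolar distance and the fourth threshold replacing the third.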
In the embodiment of the application, the constraint relation corresponding to the image task can be selected before the model parameters of the parameterized transformation model are calculated, so that the subsequent image task can be better completed through the calculated parameterized transformation model.
The effect of the method of the present application (CLNet) versus the PointCN method on the straight-line fitting task is presented below with reference to fig. 9. The PointCN method performs straight-line fitting directly on the initial matching set. The method of the present application first screens a matching subset out of the initial matching set through the clipping module and then fits the straight line on that subset; because most incorrect matches are screened out, the fitting is only slightly affected by them, which improves the reliability of the straight-line fitting. Two initial matching sets (an initial matching set in a first case and one in a second case, with different distributions) are provided in fig. 9; both come from matches randomly distributed in a real scene, and for the given straight-line fitting task the model is required to fit a given straight line. As can be seen from fig. 9, the PointCN method is less reliable and fails in the second case, whereas the method of the embodiment of the present application succeeds in both cases.
A comparison of the L2 distance on the straight-line fitting task using the method of the present application (CLNet) against the PointCN, OANet, and PointACN methods is presented below in conjunction with fig. 10. The ordinate of fig. 10 is the L2 distance error, and the abscissa is the outlier ratio (proportion of false matches) of the test data set. As can be seen from fig. 10, when the outlier ratio of the test data set varies between 50% and 90%, the method of this embodiment (CLNet) generalizes well over all five noise levels and achieves a significant advantage in the most difficult case (90% outliers). The evaluation index of fig. 10 is the L2 distance between the predicted straight-line parameters and the true straight line; the smaller the L2 distance, the higher the prediction accuracy.
The above description has introduced the solution of the embodiments of the present application mainly from the perspective of the method-side implementation process. It is understood that, in order to realize the above functions, the electronic device comprises corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the units and algorithm steps of the examples described in connection with the embodiments provided herein can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the electronic device may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
In accordance with the above, please refer to fig. 11, fig. 11 is a schematic structural diagram of a match filter apparatus according to an embodiment of the present application, the match filter apparatus 1100 is applied to an electronic device, the match filter apparatus 1100 may include an obtaining unit 1101 and a filter unit 1102, where:
an obtaining unit 1101, configured to obtain an initial matching set, where the initial matching set is derived from initial matching results between image pairs;
a screening unit 1102, configured to screen a matching subset from the initial matching set through at least one clipping module, where a correct matching proportion in the matching subset is higher than a correct matching proportion in the initial matching set, and the at least one clipping module is configured to obtain consistency information of each initial match in the initial matching set;
wherein the matching subset is used to process an image task associated with the image pair.
Optionally, the matching screening apparatus 1100 may further include a prediction unit 1103;
a predicting unit 1103, configured to, after the screening unit 1102 screens the matching subset from the initial matching set through the at least one clipping module, predict the initial matching set by using the parameterized transformation model, to obtain a prediction result of each initial match in the initial matching set, where the prediction result includes a correct match or an incorrect match.
Optionally, the screening unit 1102 screens a matching subset from the initial matching set through at least one clipping module, specifically:
screening the first matching set through a first cutting module to obtain a matching subset;
in the case that the at least one clipping module comprises one clipping module, the first matching set is the initial matching set;
and in the case that the at least one clipping module comprises at least two clipping modules, the first matching set is obtained by screening through the clipping module previous to the first clipping module.
Optionally, the screening unit 1102 screens the first matching set through the first clipping module to obtain a matching subset, which specifically includes: determining, by the first clipping module, local consistency information or global consistency information of a first initial match, and determining whether the first initial match is included in the matching subset according to the local consistency information or global consistency information of the first initial match; the first initial match is any one of the first set of matches.
Optionally, the screening unit 1102 screens the first matching set through the first clipping module to obtain a matching subset, which specifically includes: determining, by the first clipping module, local consistency information and global consistency information of a first initial match, and determining whether the first initial match is included in the matching subset according to the local consistency information and the global consistency information of the first initial match; the first initial match is any one of the first set of matches.
Optionally, the first clipping module includes a first local consistency learning module, a first global consistency learning module, and a first clipping sub-module; the screening unit 1102 determines, through the first clipping module, local consistency information and global consistency information of a first initial match, and determines, according to the local consistency information and the global consistency information of the first initial match, whether the first initial match is classified into the match subset, specifically:
constructing, by the first local consistency learning module, a first local dynamic graph for a first initial match, calculating a local consistency score for the first initial match at the first local dynamic graph; the first local dynamic graph comprises a node where the first initial matching is located and K related nodes related to the node where the first initial matching is located; the K relative nodes are obtained by utilizing a K neighbor algorithm based on the node where the first initial matching is located;
constructing a first global dynamic graph through the first global consistency learning module, and determining a comprehensive consistency score of the first initial matching according to the local consistency score of the first initial matching on the first local dynamic graph and the first global dynamic graph;
and determining whether the first initial match is classified into the matching subset according to the comprehensive consistency score of the first initial match by using the first clipping submodule.
Optionally, the first local consistency learning module includes a first feature dimension-increasing module, a first dynamic graph building module, a first feature dimension-reducing module, and a first local consistency score calculating module;
the screening unit 1102 constructs a first local dynamic graph for a first initial matching through the first local consistency learning module, and calculates a local consistency score of the first initial matching in the first local dynamic graph, specifically:
performing dimension-increasing processing on the initial feature vector of the first initial matching through the first feature dimension-increasing module to obtain a high-dimensional feature vector of the first initial matching;
determining, by using the first dynamic graph building module through a K-nearest-neighbor algorithm, the K related matches in the first matching set whose degree of correlation (Euclidean distance) with the high-dimensional feature vector of the first initial match ranks highest, and building a first local dynamic graph for the first initial match based on the first initial match and the K related matches to obtain the ultrahigh-dimensional feature vector of the first initial match; the ultrahigh-dimensional feature vector of the first initial match comprises a combination of the high-dimensional feature vector of the first initial match and correlation vectors between the first initial match and the K related matches;
performing dimensionality reduction processing on the first initially matched ultrahigh-dimensional feature vector by using the first feature dimensionality reduction module to obtain a first initially matched low-dimensional feature vector;
calculating, by the first local consistency score calculation module, a local consistency score of the first initial match at the first local dynamic graph based on the low-dimensional feature vectors of the first initial match.
Optionally, the first feature dimension reduction module includes a first circular convolution module and a second circular convolution module; the screening unit 1102 performs dimensionality reduction on the first initially matched ultrahigh-dimensional feature vector by using the first feature dimensionality reduction module to obtain a first initially matched low-dimensional feature vector, which specifically includes:
grouping the first initially matched ultrahigh-dimensional feature vectors according to the correlation degree through the first cyclic convolution module, and performing first feature aggregation processing on each group of feature vectors to obtain initially aggregated feature vectors;
and performing second feature aggregation processing on the preliminarily aggregated feature vectors through the second circular convolution module to obtain the low-dimensional feature vector of the first initial match.
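The two-stage aggregation just described can be sketched structurally as follows; mean pooling stands in for the learned circular convolutions, so this shows only the group-then-aggregate shape of the computation, not the actual learned operators:

```python
import numpy as np

def grouped_feature_aggregation(features, num_groups):
    """First aggregation: split the k correlation-ordered neighbour
    features into groups and pool each group; second aggregation: pool
    the group features into one low-dimensional vector."""
    groups = np.array_split(features, num_groups, axis=0)
    pooled = np.stack([g.mean(axis=0) for g in groups])  # per-group features
    return pooled.mean(axis=0)                           # final low-dim vector
```

In the embodiment the input block is k × 128 per match and the output is a 128-dimensional vector; the sketch works for any consistent shapes.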
Optionally, the screening unit 1102 determines, according to the local consistency score of the first initial matching in the first local dynamic graph and the first global dynamic graph, a comprehensive consistency score of the first initial matching, specifically:
calculating a global consistency score of the first initial match at the first global dynamic graph;
determining a composite consistency score for the first initial match based on the local consistency score and the global consistency score.
Optionally, the screening unit 1102 constructs a first global dynamic graph through the first global consistency learning module, specifically:
constructing a first global dynamic graph according to the local consistency score of each initial match in the first matching set in the corresponding local dynamic graph through the first global consistency learning module;
the determining a composite consistency score of the first initial match according to the local consistency score of the first initial match at the first local dynamic graph and the first global dynamic graph comprises:
and calculating the comprehensive consistency score of the first initial matching according to the first global dynamic graph and the low-dimensional feature vector of the first initial matching.
Optionally, the first global dynamic graph is represented by an adjacency matrix, and the screening unit 1102 calculates a comprehensive consistency score of the first initial matching according to the first global dynamic graph and the low-dimensional feature vector of the first initial matching, specifically:
calculating a synthetic low-dimensional feature vector of the first initial match using a graph convolution network based on the low-dimensional feature vector of the first initial match and the adjacency matrix;
a composite consistency score for the first initial match is calculated based on the composite low-dimensional feature vector for the first initial match.
Optionally, the screening unit 1102 determines, by using the first clipping sub-module, according to the comprehensive consistency score of the first initial match, whether the first initial match is classified into the matching subset, specifically:
determining, by the first clipping sub-module, whether a composite consistency score of the first initial match is greater than a first threshold, and if so, determining that the first initial match falls within the subset of matches;
or, determining, by the first clipping sub-module, the ranking of the comprehensive consistency score of the first initial match within the first matching set sorted from large to small, and determining that the first initial match is classified into the matching subset if the ranking of the first initial match falls within a second threshold.
Optionally, the matching and screening apparatus 1100 further includes a training unit 1104;
the training unit 1104 is configured to train the clipping module with a supervised data set before the screening unit 1102 screens the matching subset from the initial matching set through the at least one clipping module, so as to obtain a training result; and to evaluate the training result through an adaptive-temperature binary classification loss function and update the parameters of the clipping module by minimizing the binary classification loss function.
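A minimal stand-in for the adaptive-temperature binary classification loss is sketched below. The text does not give the adaptive schedule, so the temperature is a plain parameter here, and the formulation (temperature-scaled binary cross-entropy) is an assumption:

```python
import math

def binary_loss_with_temperature(logits, labels, temperature):
    """Binary cross-entropy with temperature-scaled logits; a higher
    temperature softens the predicted probabilities."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z / temperature))
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)
```

Minimizing this loss over the supervised data set would drive the clipping module to score correct matches high and incorrect matches low, which is the training objective described above.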
Optionally, the matching and screening apparatus 1100 further includes a determining unit 1105 and a calculating unit 1106;
a determining unit 1105, configured to determine a constraint relationship used by the parameterized transformation model according to the image task related to the image pair before the screening unit 1102 screens out a matching subset from the initial matching set through at least one clipping module, where the constraint relationship includes epipolar geometric constraint or reprojection error;
the calculating unit 1106 is configured to calculate model parameters of the parameterized transformation model using the matching subsets if the parameterized transformation model uses the constraint relationship.
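For illustration, the two constraint relationships can be evaluated per match using standard formulations (not necessarily the exact ones used by the parameterized transformation model): the epipolar geometric constraint via the algebraic error |x2ᵀ F x1| for a fundamental matrix F, and the reprojection error via a homography H.

```python
import numpy as np

def epipolar_error(x1, x2, F):
    """Algebraic epipolar error |x2^T F x1| for homogeneous points (3,).
    A correct match under fundamental matrix F gives an error near zero."""
    return float(abs(x2 @ F @ x1))

def reprojection_error(x1, x2, H):
    """Euclidean distance between x2 and the homography projection of x1."""
    proj = H @ x1
    proj = proj / proj[2]                 # back to inhomogeneous coordinates
    return float(np.linalg.norm(proj[:2] - x2[:2] / x2[2]))
```

The calculating unit would fit F or H to the matching subset and could use errors like these to score each match against the fitted model.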
Optionally, the image task includes any one of a straight line fitting task, a wide baseline image matching task, an image positioning task, an image stitching task, a three-dimensional reconstruction task, and a camera pose estimation task.
The obtaining unit 1101 in this embodiment of the application may be a communication module in an electronic device, and the screening unit 1102, the predicting unit 1103, the training unit 1104, the determining unit 1105 and the calculating unit 1106 may be processors or chips in the electronic device.
In the embodiments of the present application, the initial matching set can be screened so that the proportion of correct matches in the screened matching subset is higher than that in the initial matching set. This improves the accuracy with which the model parameters of the parameterized transformation model are calculated, and thus the performance of the parameterized transformation model on image processing tasks.
Referring to fig. 12, fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 12, the electronic device 1200 includes a processor 1201 and a memory 1202, which may be connected to each other through a communication bus 1203. The communication bus 1203 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 12, but this does not mean that there is only one bus or one type of bus. The memory 1202 is configured to store a computer program comprising program instructions, and the processor 1201 is configured to invoke the program instructions, which include instructions for performing the methods shown in figs. 1a, 2a and 3.
The processor 1201 may be a general purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control the execution of programs according to the above schemes.
The memory 1202 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via a bus, or may be integrated with the processor.
In addition, the electronic device 1200 may further include general components such as a communication module, an antenna, and the like, which are not described in detail herein.
Embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the matching screening methods as described in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative. For instance, the division into units is only a division by logical function; in actual implementations there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, or a magnetic or optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash memory disks, read-only memory, random access memory, magnetic or optical disks, and the like.
The foregoing embodiments have been described in detail to illustrate the principles and implementations of the present application; the above description of the embodiments is provided only to help understand the method and core concept of the present application. Meanwhile, for those skilled in the art, there may be variations in specific embodiments and application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (26)

1. A match screening method, comprising:
obtaining an initial matching set, wherein the initial matching set is derived from initial matching results between image pairs;
screening a matching subset from the initial matching set through at least one cutting module;
the screening of the matching subset from the initial matching set by the at least one clipping module comprises: screening a first matching set through a first clipping module to obtain the matching subset;
in the case that the at least one clipping module comprises one clipping module, the first matching set is the initial matching set;
in the case that the at least one clipping module comprises at least two clipping modules, the first matching set is obtained by screening by the clipping module preceding the first clipping module; a proportion of correct matches in the matching subset is higher than a proportion of correct matches in the initial matching set, and the at least one clipping module is used for obtaining consistency information of each initial match in the initial matching set;
wherein the matching subset is used to process an image task related to the image pair;
the screening the first matching set by the first clipping module to obtain a matching subset includes:
determining, by the first clipping module, local consistency information or global consistency information of a first initial match, and determining whether the first initial match is included in the matching subset according to the local consistency information or global consistency information of the first initial match; the first initial match is any one of the first set of matches.
2. The method of claim 1, wherein after the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
predicting the initial matching set by using a parameterized transformation model to obtain a prediction result of each initial match in the initial matching set, wherein the prediction result comprises a correct match or a wrong match.
3. The method of claim 1, wherein the filtering the first matching set by the first clipping module to obtain the matching subset comprises:
determining, by the first clipping module, local consistency information and global consistency information of a first initial match, and determining whether the first initial match is included in the matching subset according to the local consistency information and the global consistency information of the first initial match; the first initial match is any one of the first set of matches.
4. The method of claim 3, wherein the first clipping module comprises a first local consistency learning module, a first global consistency learning module, and a first clipping sub-module, and wherein the feature matching consistency information comprises a local consistency score and/or a global consistency score;
the determining, by the first clipping module, local consistency information and global consistency information of a first initial match, and determining, according to the local consistency information and the global consistency information of the first initial match, whether the first initial match is included in the match subset, includes:
constructing, by the first local consistency learning module, a first local dynamic graph for a first initial match, calculating a local consistency score for the first initial match at the first local dynamic graph; the first local dynamic graph comprises a node where the first initial matching is located and K related nodes related to the node where the first initial matching is located; the K relative nodes are obtained by utilizing a K neighbor algorithm based on the node where the first initial matching is located;
constructing a first global dynamic graph through the first global consistency learning module, and determining a composite consistency score of the first initial match according to the first global dynamic graph and the local consistency score of the first initial match on the first local dynamic graph;
and determining, by the first clipping sub-module, whether the first initial match falls within the matching subset according to the composite consistency score of the first initial match.
5. The method of claim 4, wherein the first local consistency learning module comprises a first feature dimension-raising module, a first dynamic graph building module, a first feature dimension-reducing module, and a first local consistency score calculation module;
the building, by the first local consistency learning module, a first local dynamic graph for a first initial match, calculating a local consistency score of the first initial match at the first local dynamic graph, including:
performing dimension-increasing processing on the initial feature vector of the first initial matching through the first feature dimension-increasing module to obtain a high-dimensional feature vector of the first initial matching;
determining K related matches with the top rank of the relevance of the high-dimensional feature vector of the first initial match in the first matching set through a K nearest neighbor algorithm by using the first dynamic graph construction module, constructing a first local dynamic graph aiming at the first initial match based on the first initial match and the K related matches, and obtaining the ultrahigh-dimensional feature vector of the first initial match; the super-high-dimensional feature vector of the first initial match comprises a combination of the high-dimensional feature vector of the first initial match and a correlation vector between the first initial match and the K correlation matches;
performing dimensionality reduction processing on the first initially matched ultrahigh-dimensional feature vector by using the first feature dimensionality reduction module to obtain a first initially matched low-dimensional feature vector;
calculating, by the first local consistency score calculation module, a local consistency score of the first initial match at the first local dynamic graph based on the low-dimensional feature vectors of the first initial match.
6. The method of claim 5, wherein the first feature dimensionality reduction module comprises a first circular convolution module and a second circular convolution module; the performing dimensionality reduction processing on the first initially matched ultrahigh-dimensional feature vector by using the first feature dimensionality reduction module to obtain a first initially matched low-dimensional feature vector includes:
grouping the first initially matched ultrahigh-dimensional feature vectors according to the correlation degree through the first cyclic convolution module, and performing first feature aggregation processing on each group of feature vectors to obtain initially aggregated feature vectors;
and carrying out secondary feature aggregation processing on the preliminarily aggregated feature vectors through the second annular convolution module to obtain the first initially matched low-dimensional feature vector.
7. The method according to any one of claims 4 to 6, wherein the determining a composite consistency score of the first initial match according to the local consistency score of the first initial match in the first local dynamic graph and the first global dynamic graph comprises:
calculating a global consistency score of the first initial match at the first global dynamic graph;
determining a composite consistency score for the first initial match based on the local consistency score and the global consistency score.
8. The method according to claim 5 or 6, wherein the constructing of the first global dynamic graph by the first global consistency learning module comprises:
constructing a first global dynamic graph according to the local consistency score of each initial match in the first matching set in the corresponding local dynamic graph through the first global consistency learning module;
the determining a composite consistency score of the first initial match according to the local consistency score of the first initial match at the first local dynamic graph and the first global dynamic graph comprises:
and calculating the composite consistency score of the first initial match according to the first global dynamic graph and the low-dimensional feature vector of the first initial match.
9. The method of claim 8, wherein the first global dynamic graph is represented by an adjacency matrix, and wherein the calculating the composite consistency score of the first initial match according to the first global dynamic graph and the low-dimensional feature vector of the first initial match comprises:
calculating a composite low-dimensional feature vector of the first initial match using a graph convolution network based on the low-dimensional feature vector of the first initial match and the adjacency matrix;
calculating a composite consistency score of the first initial match based on the composite low-dimensional feature vector of the first initial match.
10. The method according to any one of claims 4 to 6 and 9, wherein the determining, by the first clipping sub-module, whether the first initial match is included in the subset of matches according to the composite consistency score of the first initial match comprises:
determining, by the first clipping sub-module, whether a composite consistency score of the first initial match is greater than a first threshold, and if so, determining that the first initial match falls within the subset of matches;
or, ranking, by the first clipping sub-module, the composite consistency scores of the first matching set from largest to smallest, and determining that the first initial match falls within the matching subset if the rank of the first initial match is greater than a second threshold.
11. The method of claim 7, wherein the determining, by the first clipping sub-module, whether the first initial match falls within the matching subset according to the composite consistency score of the first initial match comprises:
determining, by the first clipping sub-module, whether the composite consistency score of the first initial match is greater than a first threshold, and if so, determining that the first initial match falls within the matching subset;
or, ranking, by the first clipping sub-module, the composite consistency scores of the first matching set from largest to smallest, and determining that the first initial match falls within the matching subset if the rank of the first initial match is greater than a second threshold.
12. The method of claim 8, wherein the determining, by the first clipping sub-module, whether the first initial match falls within the matching subset according to the composite consistency score of the first initial match comprises:
determining, by the first clipping sub-module, whether the composite consistency score of the first initial match is greater than a first threshold, and if so, determining that the first initial match falls within the matching subset;
or, ranking, by the first clipping sub-module, the composite consistency scores of the first matching set from largest to smallest, and determining that the first initial match falls within the matching subset if the rank of the first initial match is greater than a second threshold.
13. The method of claim 1 or 2, wherein before the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
training the cutting module by using a supervised data set to obtain a training result;
and evaluating the training result through a binary classification loss function of the self-adaptive temperature, and updating the parameters of the cutting module according to a method for minimizing the binary classification loss function.
14. The method of any one of claims 1 to 6, 9 and 11 to 12, wherein before the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
determining, according to the image task related to the image pair, a constraint relationship used by a parameterized transformation model, wherein the constraint relationship comprises an epipolar geometric constraint or a reprojection error;
and after the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
calculating model parameters of the parameterized transformation model using the matching subset in the case that the parameterized transformation model uses the constraint relationship.
15. The method of claim 7, wherein before the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
determining, according to the image task related to the image pair, a constraint relationship used by a parameterized transformation model, wherein the constraint relationship comprises an epipolar geometric constraint or a reprojection error;
and after the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
calculating model parameters of the parameterized transformation model using the matching subset in the case that the parameterized transformation model uses the constraint relationship.
16. The method of claim 8, wherein before the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
determining, according to the image task related to the image pair, a constraint relationship used by a parameterized transformation model, wherein the constraint relationship comprises an epipolar geometric constraint or a reprojection error;
and after the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
calculating model parameters of the parameterized transformation model using the matching subset in the case that the parameterized transformation model uses the constraint relationship.
17. The method of claim 10, wherein before the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
determining, according to the image task related to the image pair, a constraint relationship used by a parameterized transformation model, wherein the constraint relationship comprises an epipolar geometric constraint or a reprojection error;
and after the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
calculating model parameters of the parameterized transformation model using the matching subset in the case that the parameterized transformation model uses the constraint relationship.
18. The method of claim 13, wherein before the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
determining, according to the image task related to the image pair, a constraint relationship used by a parameterized transformation model, wherein the constraint relationship comprises an epipolar geometric constraint or a reprojection error;
and after the screening of the matching subset from the initial matching set by the at least one clipping module, the method further comprises:
calculating model parameters of the parameterized transformation model using the matching subset in the case that the parameterized transformation model uses the constraint relationship.
19. The method according to any one of claims 1 to 6, 9 and 11 to 12, wherein the image task comprises any one of a straight line fitting task, a wide baseline image matching task, an image positioning task, an image stitching task, a three-dimensional reconstruction task and a camera pose estimation task.
20. The method of claim 7, wherein the image task comprises any one of a straight line fitting task, a wide baseline image matching task, an image positioning task, an image stitching task, a three-dimensional reconstruction task, and a camera pose estimation task.
21. The method of claim 8, wherein the image task comprises any one of a straight line fitting task, a wide baseline image matching task, an image positioning task, an image stitching task, a three-dimensional reconstruction task, and a camera pose estimation task.
22. The method of claim 10, wherein the image task comprises any one of a straight line fitting task, a wide baseline image matching task, an image positioning task, an image stitching task, a three-dimensional reconstruction task, and a camera pose estimation task.
23. The method of claim 13, wherein the image task comprises any one of a straight line fitting task, a wide baseline image matching task, an image positioning task, an image stitching task, a three-dimensional reconstruction task, and a camera pose estimation task.
24. A matching screening apparatus, comprising:
an obtaining unit, configured to obtain an initial matching set, where the initial matching set is derived from an initial matching result between image pairs;
a screening unit, configured to screen a matching subset from the initial matching set through at least one clipping module, where a correct matching proportion in the matching subset is higher than a correct matching proportion in the initial matching set, and the at least one clipping module is configured to obtain consistency information of each initial match in the initial matching set;
wherein the matching subset is used to process an image task related to the image pair;
the screening unit screens out a matching subset from the initial matching set through at least one clipping module, and comprises:
screening the first matching set through a first cutting module to obtain a matching subset;
in the case that the at least one clipping module comprises one clipping module, the first matching set is the initial matching set;
in the case that the at least one clipping module includes at least two clipping modules, the first matching set is obtained by screening by the clipping module preceding the first clipping module;
the screening unit screens the first matching set through the first clipping module to obtain a matching subset, and the screening unit comprises: determining, by the first clipping module, local consistency information or global consistency information of a first initial match, and determining whether the first initial match is included in the matching subset according to the local consistency information or global consistency information of the first initial match; the first initial match is any one of the first set of matches.
25. An electronic device comprising a processor and a memory, the memory for storing a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1 to 23.
26. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 23.
CN202011641201.1A 2020-12-31 2020-12-31 Matching screening method and device, electronic equipment and computer-readable storage medium Active CN112712123B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202011641201.1A CN112712123B (en) 2020-12-31 2020-12-31 Matching screening method and device, electronic equipment and computer-readable storage medium
PCT/CN2021/095170 WO2022142084A1 (en) 2020-12-31 2021-05-21 Match screening method and apparatus, and electronic device, storage medium and computer program
TW110139723A TWI776718B (en) 2020-12-31 2021-10-26 Matching filtering method, electronic equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011641201.1A CN112712123B (en) 2020-12-31 2020-12-31 Matching screening method and device, electronic equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN112712123A CN112712123A (en) 2021-04-27
CN112712123B true CN112712123B (en) 2022-02-22

Family

ID=75547994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011641201.1A Active CN112712123B (en) 2020-12-31 2020-12-31 Matching screening method and device, electronic equipment and computer-readable storage medium

Country Status (3)

Country Link
CN (1) CN112712123B (en)
TW (1) TWI776718B (en)
WO (1) WO2022142084A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712123B (en) * 2020-12-31 2022-02-22 上海商汤科技开发有限公司 Matching screening method and device, electronic equipment and computer-readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544732A (en) * 2013-09-29 2014-01-29 北京空间飞行器总体设计部 Three-dimensional reconstruction method for lunar vehicle
CN110728296A (en) * 2019-09-03 2020-01-24 华东师范大学 Two-step random sampling consistency method and system for accelerating feature point matching

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101833953B1 (en) * 2012-01-02 2018-03-02 텔레콤 이탈리아 소시에떼 퍼 아찌오니 Method and system for comparing images
CN112102475B (en) * 2020-09-04 2023-03-07 西北工业大学 Space target three-dimensional sparse reconstruction method based on image sequence trajectory tracking
CN112712123B (en) * 2020-12-31 2022-02-22 上海商汤科技开发有限公司 Matching screening method and device, electronic equipment and computer-readable storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544732A (en) * 2013-09-29 2014-01-29 北京空间飞行器总体设计部 Three-dimensional reconstruction method for lunar vehicle
CN110728296A (en) * 2019-09-03 2020-01-24 华东师范大学 Two-step random sampling consistency method and system for accelerating feature point matching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SCRAMSAC: Improving RANSAC's Efficiency with a Spatial Consistency Filter; Torsten Sattler et al.; 2009 IEEE 12th International Conference on Computer Vision; 2010-05-06; pp. 1-8 *
Feature matching screening algorithm based on dynamic-window motion statistics; Xiang Hengyong et al.; Journal of South China University of Technology (Natural Science Edition); 2020-06-30; Vol. 48, No. 6, pp. 114-122 *

Also Published As

Publication number Publication date
WO2022142084A1 (en) 2022-07-07
TW202228018A (en) 2022-07-16
TWI776718B (en) 2022-09-01
CN112712123A (en) 2021-04-27

CN114841887A (en) Image restoration quality evaluation method based on multi-level difference learning
CN114863132A (en) Method, system, equipment and storage medium for modeling and capturing image spatial domain information
Kalampokas et al. Performance benchmark of deep learning human pose estimation for UAVs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40041888
Country of ref document: HK

GR01 Patent grant