CN112465876A - Stereo matching method and equipment - Google Patents
Stereo matching method and equipment
- Publication number
- CN112465876A CN112465876A CN202011458843.8A CN202011458843A CN112465876A CN 112465876 A CN112465876 A CN 112465876A CN 202011458843 A CN202011458843 A CN 202011458843A CN 112465876 A CN112465876 A CN 112465876A
- Authority
- CN
- China
- Prior art keywords
- matching
- target image
- pair
- image pair
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/269—Analysis of motion using gradient-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G06T2207/10021—Stereoscopic video; Stereoscopic image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
Abstract
The method extracts feature points of a target image pair on the basis of the AKAZE algorithm and describes them with the M-LDB descriptor; it then computes a matching area with the LK optical flow method as a conditional constraint; finally, it divides the target image pair into a plurality of grids, matches the feature points through the FLANN algorithm, and eliminates wrong matching pairs. This improves both the accuracy and the real-time performance of stereo matching: the improved AKAZE algorithm raises stereo matching accuracy and reduces time cost, particularly when processing target image pairs with blurred edges or changes in brightness and rotation.
Description
Technical Field
The present application relates to the field of computers, and in particular, to a stereo matching method and device.
Background
In the prior art, stereo matching is a key link in binocular vision positioning and is widely applied in the field of part assembly, where quickly and accurately locating a detected part and solving its pose information remains a major problem. Many computer vision tasks, such as motion recognition, motion tracking, robot navigation, and visual localization, rely on extracting local features from different views of a target image to achieve matching.
Current research on matching algorithms falls mainly into two categories: improving the performance of feature descriptors, and adding constraint conditions to eliminate mismatched points. In 2004, Lowe proposed the SIFT algorithm, which constructs a scale space by convolving the original image with a Gaussian kernel and then extracts scale-invariant feature points on a difference-of-Gaussians pyramid; the algorithm is invariant to affine transformation, viewing angle, rotation, and illumination, and is therefore widely applied to feature matching. In 2006, Bay et al. proposed the SURF algorithm to address the slow speed and computational complexity of SIFT: it extracts feature points with an approximate Haar wavelet algorithm and uses integral images to obtain approximate Haar wavelet responses in different scale spaces, reducing the construction of second-order differential templates and thereby improving feature matching efficiency. The Gaussian scale spaces constructed by these two methods, however, tend to lose the edge detail of the image. In 2013, Adrien Bartoli et al. retained image edge information by describing features in a nonlinear scale space built with nonlinear diffusion, providing a new 2D feature extraction method, the AKAZE descriptor. In 2015, Zhenfeng Shao et al. used the multiresolution region detector MSER and an illumination-robust shape descriptor to extract local regions from the input images for matching, proposing a feature matching algorithm suited to images with illumination changes. In 2016, Chen Kesheng et al. proposed a mismatched-point rejection algorithm for ORB feature matching by reducing the total number of sampling points.
In the same year, Lin W Y et al. proposed RepMatch, an epipolar-guided feature matcher aimed at poor object reconstruction, which can adapt to wide baselines and repeated structures; however, images of glass-encapsulated electrical connectors do not fit the wide-baseline case. Tang P et al. solved the logo recognition problem by enhancing the descriptive ability of local features with unique topological constraints, but those constraints have not yet been extended to general objects. In 2017, Bian J et al. transformed the motion-smoothness constraint into a statistical measure to eliminate mismatches and proposed an effective grid-based score estimator, GMS; the GMS algorithm, however, has the serious drawback of lacking rotation invariance. In 2018, Amin Sedaghat et al. proposed a uniform-competency-based algorithm for local feature extraction from remote sensing images; the method uses empirical parameters in the extraction process and is therefore somewhat limited. Prakash C S et al. proposed a keypoint-based copy-move forgery detection technique using a combination of accelerated KAZE and scale-invariant feature transform features.
Because the iterative solution the AKAZE algorithm obtains from its nonlinear equations is not unique, feature-point mismatching still occurs in conventional methods, leading to low image matching accuracy. In 2020, researchers extracted features of musculoskeletal images with the FAST-SIFT algorithm to solve the musculoskeletal image stitching problem, but the real-time performance and practicality of the algorithm remain to be improved. In the same year, Xu Hao et al. combined the edge and color information of irregular paper scraps to realize keypoint detection, but the method is unsuitable for images with more complex edges (such as glass-packaged electrical connectors).
Therefore, improving the accuracy of stereo matching and effectively eliminating mismatches, and in particular reducing time consumption while preserving accuracy for images with complex edges, remains a direction that technicians in the field need to keep researching.
Disclosure of Invention
An object of the present application is to provide a stereo matching method and device, so as to solve the problem in the prior art of how to improve the accuracy of stereo matching while reducing time consumption.
According to an aspect of the present application, there is provided a stereo matching method, including:
acquiring a target image pair, obtaining feature points of the target image pair based on the AKAZE algorithm, and describing all the feature points through an M-LDB descriptor;
tracking and detecting the feature points by an LK optical flow method and obtaining a matching area;
and performing feature point matching on the target image pair through a FLANN algorithm based on the matching area, and eliminating wrong matching pairs through grid motion statistics to obtain an optimal matching pair set.
Further, in the above stereo matching method, the acquiring of the target image pair, obtaining feature points of the target image pair based on the AKAZE algorithm, and describing all the feature points by an M-LDB descriptor includes:
constructing a nonlinear scale space of the target image pair based on an AKAZE algorithm;
positioning all the feature points of the target image pair in the nonlinear scale space of the target image pair to obtain feature point position coordinates;
determining, for each feature point, its corresponding dominant orientation based on the position coordinates of that feature point, with the feature point as center;
and describing all the feature points through an M-LDB descriptor based on the position coordinates of the feature points and the corresponding directions of the feature points.
Further, in the above stereo matching method, the tracking and detecting of the feature points by the LK optical flow method to obtain the matching area includes:
tracking and detecting the characteristic points to obtain characteristic point pairs with the same characteristic point coordinates;
and obtaining optical flows of two pixel points corresponding to the characteristic point pair based on the gray value of the characteristic point pair and the pixel points corresponding to the characteristic point pair, and determining the matching areas of all the characteristic point pairs.
Further, in the stereo matching method, the performing feature point matching on the target image pair by a FLANN algorithm based on the matching area, and removing an incorrect matching pair by grid motion statistics to obtain an optimal matching pair set includes:
performing feature point matching on the target image pair through a FLANN algorithm based on the matching area to obtain a matching pair set;
and acquiring the number of feature points in the neighborhood of the matching points in the matching pair set based on grid motion statistics, and rejecting the wrong matching pairs according to a preset probability evaluation standard function value, to obtain an optimal matching pair set.
According to another aspect of the present application, there is also provided a computer readable medium having stored thereon computer readable instructions, which, when executed by a processor, cause the processor to implement the stereo matching method as described above.
According to another aspect of the present application, there is also provided an apparatus comprising:
one or more processors;
a computer-readable medium for storing one or more computer-readable instructions,
when executed by the one or more processors, cause the one or more processors to implement the stereo matching method described above.
Compared with the prior art, the present application acquires a target image pair, obtains its feature points based on the AKAZE algorithm, and describes all the feature points through the M-LDB descriptor; tracks and detects the feature points by the LK optical flow method to obtain a matching area; and performs feature-point matching on the target image pair through the FLANN algorithm based on the matching area, eliminating wrong matching pairs through grid motion statistics to obtain an optimal matching pair set. That is, feature points of the target image pair are extracted on the basis of the AKAZE algorithm and described with the M-LDB descriptor; a matching area is then calculated with the LK optical flow method as a conditional constraint; finally, the target image pair is divided into a plurality of grids, the feature points are matched through the FLANN algorithm, and wrong matching pairs are eliminated. This improves the accuracy and real-time performance of stereo matching: the improved AKAZE algorithm raises stereo matching accuracy and reduces time cost, particularly when processing target image pairs with blurred edges or changes in brightness and rotation.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a flow diagram of a stereo matching method in accordance with an aspect of the subject application;
FIG. 2 illustrates a feature distribution diagram of a glass-packaged electrical connector image in an embodiment of a stereo matching method according to an aspect of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units (CPUs)), input/output interfaces, network interfaces, and memory.
The Memory may include volatile Memory in a computer readable medium, Random Access Memory (RAM), and/or nonvolatile Memory such as Read Only Memory (ROM) or flash Memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
Fig. 1 is a schematic flow chart of a stereo matching method according to an aspect of the present application, the method is suitable for a binocular vision guided robot to implement a process of assembling parts (for example, image matching of a glass-enclosed electrical connector), and the method includes steps S11, S12, and S13, where the method specifically includes:
step S11, acquiring a target image pair, obtaining characteristic points of the target image pair based on an AKAZE algorithm, and describing all the characteristic points through an M-LDB descriptor; here, the AKAZE algorithm can detect detailed feature point information, and the problem of low image matching accuracy caused by neglecting the feature point information is avoided.
And step S12, tracking and detecting the feature points through an LK optical flow method and obtaining a matching area.
And step S13, performing feature point matching on the target image pair through a FLANN algorithm based on the matching area, and removing wrong matching pairs through grid motion statistics to obtain an optimal matching pair set.
In steps S11 to S13, a target image pair is acquired, its feature points are obtained based on the AKAZE algorithm, and all the feature points are described by an M-LDB descriptor; the feature points are tracked and detected by the LK optical flow method to obtain a matching area; and feature-point matching is performed on the target image pair through the FLANN algorithm based on the matching area, with wrong matching pairs eliminated through grid motion statistics to obtain an optimal matching pair set. In other words, feature points of the target image pair are extracted on the basis of the AKAZE algorithm and described with the M-LDB descriptor; a matching area is then calculated with the LK optical flow method as a conditional constraint; finally, the target image pair is divided into a plurality of grids, the feature points are matched through the FLANN algorithm, and wrong matching pairs are eliminated. This improves the accuracy and real-time performance of stereo matching, particularly when processing target image pairs with blurred edges or changes in brightness and rotation.
For example, two glass-encapsulated electrical connector images are stereo-matched as follows. First, the feature points of the target image pair are obtained based on the AKAZE algorithm: feature points a1, a2, a3, ..., an of the first target image P1 and feature points b1, b2, b3, ..., bn of the second target image P2, and all the feature points a1, a2, a3, ..., an and b1, b2, b3, ..., bn are described through the M-LDB descriptor. Then, the feature points are tracked and detected by the LK optical flow method to obtain the matching area. Finally, feature-point matching is performed on the target image pair through the FLANN algorithm based on the matching area, and wrong matching pairs are eliminated through grid motion statistics to obtain an optimal matching pair set S, improving the accuracy of stereo matching and reducing the time needed to complete matching.
Following the above embodiment of the present application, step S11, acquiring a target image pair, obtaining feature points of the target image pair based on the AKAZE algorithm, and describing all the feature points by an M-LDB descriptor, includes:
step S111, constructing a nonlinear scale space of the target image pair based on an AKAZE algorithm; in this case, constructing the non-linear scale space is advantageous to preserve edge information of the target image pair, so that image local operations are performed on the target image.
For example, the nonlinear filtering principle can be expressed by a nonlinear diffusion equation:

∂L/∂t = div(c(x, y, t)·∇L) (1)

where L denotes the luminance matrix of the glass-packaged connector image, div and ∇ denote the divergence and gradient operators, t is a scale factor (the larger its value, the simpler the image representation), and c is a conduction function that adapts the diffusion to the local structure of the image.
The scale space of the AKAZE method is a pyramid with O octaves (groups) and S sublevels (layers); the scale space is built through the diffusion function at evolution times t_i. The relationship between the scale parameter and the octave number o and sublevel number s is:

σ_i(o, s) = σ_0 · 2^(o + s/S) (2)

where the variables in formula (2) range over o ∈ [0, 1, ..., O−1], s ∈ [0, 1, ..., S−1], and i ∈ [0, 1, ..., M−1]; σ_0 is the initial scale parameter, and M is the total number of images in the scale space, M = O × S.
Since nonlinear diffusion filtering is defined over a time series, the scale parameter σ_i (in pixels) must be converted to an evolution time t_i:

t_i = σ_i² / 2 (3)
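The scale relationship of formula (2) and the σ-to-t conversion above can be checked numerically. The sketch below assumes σ_0 = 1.6 and O = S = 4, which are common KAZE/AKAZE defaults, not values fixed by this application.

```python
# Sketch of the scale-space parameters: sigma_i(o, s) and the evolution time t_i.
# sigma0 = 1.6 and O = S = 4 are common KAZE/AKAZE defaults (assumptions here).
O, S, sigma0 = 4, 4, 1.6

def sigma_i(o, s, S=S, sigma0=sigma0):
    """Scale of sublevel s in octave o: sigma_i = sigma0 * 2^(o + s/S)."""
    return sigma0 * 2.0 ** (o + s / S)

def t_i(sigma):
    """Evolution time for nonlinear diffusion: t_i = sigma^2 / 2."""
    return 0.5 * sigma ** 2

scales = [sigma_i(o, s) for o in range(O) for s in range(S)]
assert len(scales) == O * S  # M = O x S images in the pyramid
print(round(scales[0], 3), round(t_i(scales[0]), 3))  # 1.6 1.28
```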
The solution of the nonlinear diffusion equation (1) is obtained with the FED (fast explicit diffusion) algorithm, giving the nonlinear scale space of the glass-packaged connector image as:

L_{i+1} = (I + τ A(L_i)) L_i (4)

In formula (4), i ∈ [0, M−1], I is the identity matrix, A(L_i) is the conduction matrix of the glass-packaged connector image at scale i, and τ is the step size, with τ = t_{i+1} − t_i. Constructing the nonlinear scale space helps preserve the edge information of the target image pair, so that local image operations can be performed on the target image.
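A single explicit nonlinear-diffusion step can be sketched as follows. This is a simplified stand-in for the FED scheme of formula (4): one explicit Euler step with a Perona-Malik conductivity (an assumption, since the application does not specify the conduction function), illustrating how diffusion is suppressed across strong gradients so that edges are preserved.

```python
import numpy as np

def diffusion_step(L, tau=0.2, k=10.0):
    """One explicit nonlinear-diffusion step (simplified stand-in for Eq. (4)).
    Uses the Perona-Malik conductivity g = 1 / (1 + |grad L|^2 / k^2),
    which shrinks the diffusion across strong edges and so preserves them."""
    Ly, Lx = np.gradient(L.astype(float))
    g = 1.0 / (1.0 + (Lx**2 + Ly**2) / k**2)   # conductivity c(x, y, t)
    # divergence of (g * grad L), using the same central differences
    fy, fx = g * Ly, g * Lx
    div = np.gradient(fy, axis=0) + np.gradient(fx, axis=1)
    return L + tau * div

L0 = np.zeros((32, 32))
L0[:, 16:] = 100.0            # a vertical step edge
L1 = diffusion_step(L0)
print(L1.shape)
```

Flat regions are smoothed freely while the conductivity collapses near the step edge, which is the behavior formula (1) is chosen for.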
Step S112, positioning all the feature points of the target image pair in the nonlinear scale space of the target image pair to obtain feature point position coordinates;
for example, comparing a certain point with other points in the neighborhood of the certain point under different scale spaces, and positioning the key point of the glass package electric connector image when the Hessian matrix takes the maximum value. The calculation formula is as follows:
sigma is a scale factor sigmaiThen, the accurate position of the sub-pixel point is solved according to the Taylor expansion:
x is the position coordinate of the characteristic point, and the sub-pixel coordinate of the characteristic point is calculated as follows:
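The Taylor-based refinement above can be sketched with finite differences: fit a local quadratic to the response map around the detected integer extremum and solve for the offset −H⁻¹g. The synthetic response surface below is an illustrative assumption.

```python
import numpy as np

def refine_subpixel(R, x, y):
    """Sub-pixel refinement of an extremum of response map R at integer (x, y):
    fit a local quadratic (Taylor expansion) and solve offset = -H^{-1} g."""
    # finite-difference gradient and Hessian at (x, y)
    gx = (R[y, x + 1] - R[y, x - 1]) / 2.0
    gy = (R[y + 1, x] - R[y - 1, x]) / 2.0
    hxx = R[y, x + 1] - 2 * R[y, x] + R[y, x - 1]
    hyy = R[y + 1, x] - 2 * R[y, x] + R[y - 1, x]
    hxy = (R[y + 1, x + 1] - R[y + 1, x - 1]
           - R[y - 1, x + 1] + R[y - 1, x - 1]) / 4.0
    H = np.array([[hxx, hxy], [hxy, hyy]])
    g = np.array([gx, gy])
    dx, dy = -np.linalg.solve(H, g)
    return x + dx, y + dy

# Synthetic response peaked at (10.3, 7.6); the nearest integer maximum is (10, 8).
ys, xs = np.mgrid[0:16, 0:16]
R = -((xs - 10.3) ** 2 + (ys - 7.6) ** 2)
print(refine_subpixel(R, 10, 8))  # recovers approximately (10.3, 7.6)
```

Because the synthetic surface is exactly quadratic, the refinement recovers the true sub-pixel peak from the integer-grid samples.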
step S113, determining the direction of the feature point corresponding to each feature point based on the position coordinates of each feature point by taking the feature point as the center;
for example, a first-order differential L of a circle which is centered at a feature point and has a radius of 6 sigma and which has all neighborhoods inside is searched for on a gradient imagexAnd LyAnd performing Gaussian weighting operation. And then, rotating the 60-degree sector area around the origin to calculate the vector sum of the areas, wherein the main direction of the characteristic point is the longest direction of the vector sum.
And step S114, describing all the feature points through an M-LDB descriptor based on the feature point position coordinates and the corresponding feature point directions. The binary descriptor can be calculated in a parallelization mode, is high in efficiency, and is widely applied to the target identification and tracking process. And the M-LDB descriptor samples the pixels and averages the pixels to realize scale self-adaptation and ensure real-time performance. Therefore, the descriptor is suitable for image matching of glass-packaged electrical connectors with high matching requirements.
Following the above embodiment of the present application, step S12, tracking and detecting the feature points by the LK optical flow method and obtaining the matching area, includes:
Step S121, tracking and detecting the feature points, and acquiring feature point pairs with the same feature point coordinates;
Step S122, obtaining the optical flows of the two pixel points corresponding to each feature point pair based on the gray values of the feature point pair and their corresponding pixel points, and determining the matching areas of all the feature point pairs. This realizes accurate tracking of the feature points and in turn facilitates their matching.
For example, assume two glass-enclosed electrical connector grayscale images I and J, whose gray values at point [x, y] are I(x, y) and J(x, y), respectively. For a pixel point u = [u_x, u_y]^T on image I there is a matching pixel point v = u + d = [u_x + d_x, u_y + d_y]^T on image J such that the error between I(u_x, u_y) and J(u_x + d_x, u_y + d_y) is smallest. The displacement d = [d_x, d_y]^T is the optical flow of pixel points u and v. Over an image window of size [2ω_x + 1, 2ω_y + 1] centered at point u, the value of d is solved by minimizing the sum of squared matching errors. This loss function can be expressed as:

ε(d) = Σ_{x = u_x − ω_x}^{u_x + ω_x} Σ_{y = u_y − ω_y}^{u_y + ω_y} ( I(x, y) − J(x + d_x, y + d_y) )² (8)
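Minimizing this windowed loss over a single window leads to the familiar 2×2 least-squares (structure tensor) system, which the sketch below solves directly with NumPy on a synthetic, smoothly varying image pair with a known sub-pixel shift. The image content and window size are illustrative assumptions; in practice a pyramidal implementation such as OpenCV's calcOpticalFlowPyrLK would be used.

```python
import numpy as np

def lk_flow(I, J, cx, cy, w=7):
    """Solve the least-squares LK system for one (2w+1) x (2w+1) window:
    [sum Ix^2   sum IxIy] [dx]     [sum Ix*It]
    [sum IxIy   sum Iy^2] [dy] = - [sum Iy*It]"""
    Iy, Ix = np.gradient(I)
    It = J - I
    sl = (slice(cy - w, cy + w + 1), slice(cx - w, cx + w + 1))
    ix, iy, it = Ix[sl].ravel(), Iy[sl].ravel(), It[sl].ravel()
    A = np.array([[ix @ ix, ix @ iy], [ix @ iy, iy @ iy]])
    b = -np.array([ix @ it, iy @ it])
    return np.linalg.solve(A, b)  # optical flow d = [dx, dy]

# Smooth synthetic pair: J is I shifted by d = (0.3, -0.2) pixels.
ys, xs = np.mgrid[0:64, 0:64].astype(float)
f = lambda x, y: np.sin(0.2 * x) + np.cos(0.15 * y)
I, J = f(xs, ys), f(xs - 0.3, ys + 0.2)
dx, dy = lk_flow(I, J, 32, 32)
print(round(dx, 2), round(dy, 2))  # close to 0.3, -0.2
```

The low-frequency image keeps the linearization of the brightness-constancy assumption accurate, so a single window recovers the shift without pyramids.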
next to the above embodiments of the present application, the step S13 performs feature point matching on the target image pair through the FLANN algorithm based on the matching area, and eliminates an incorrect matching pair through grid motion statistics to obtain an optimal matching pair set, including:
step S131, performing feature point matching on the target image pair through a FLANN algorithm based on the matching area to obtain a matching pair set; here, the FLANN algorithm is an open source library for performing nearest neighbor search, optimizes fast nearest neighbor search and high-dimensional features of a large data set, effectively avoids a mismatching problem, and improves stereo matching accuracy.
Step S132, acquiring the number of feature points in the neighborhood of the matching points in the matching pair set based on grid motion statistics, and rejecting the wrong matching pairs according to a preset probability evaluation standard function value, to obtain an optimal matching pair set.
For example, after the glass-enclosed electrical connector images are initially matched, a corresponding matching pair set A is obtained, as shown in equation (9). Correct matching points have many feature points in their neighborhoods, so mismatches can be eliminated according to this difference.
A_{i, L−R} = {a1, a2, ..., an} (9)
The distribution of the matching point pairs can be expressed as:

S_i ~ B(Kn, p_1) if a_i is a correct match; S_i ~ B(Kn, p_2) if a_i is a false match (10)

In formula (10), a_i denotes a matching point pair between the matching region and the region to be matched, K is the number of small neighborhood cells, n is the number of matching pairs, and p_1 and p_2 are the correct and false match rates, respectively. FIG. 2 shows the feature distribution of the glass-packaged electrical connector image.
The preset probability evaluation standard function value is p_a, derived from the distribution in formula (10); whether matching points are eliminated is determined according to the magnitude of this preset probability evaluation standard function value.
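The grid-motion-statistics idea can be sketched as follows: bin each match by the grid cells of its two endpoints and keep only matches whose cell pair gathers enough support, since correct matches cluster in consistent cell pairs while false ones scatter. The grid size, threshold form τ = α·sqrt(mean support), and test data below are illustrative assumptions, not the application's exact evaluation function.

```python
import numpy as np

def gms_filter(pts1, pts2, size1, size2, grid=4, alpha=6.0):
    """Grid-motion-statistics style rejection (simplified sketch).
    Matches are binned by the grid cells of their two endpoints; a cell pair
    is kept only if its support count exceeds tau = alpha * sqrt(mean count),
    because correct matches concentrate in consistent cell pairs."""
    def cell(pts, size):
        cx = np.clip((pts[:, 0] * grid / size[0]).astype(int), 0, grid - 1)
        cy = np.clip((pts[:, 1] * grid / size[1]).astype(int), 0, grid - 1)
        return cy * grid + cx
    c1, c2 = cell(pts1, size1), cell(pts2, size2)
    pair = c1 * grid * grid + c2   # joint cell-pair index
    counts = np.bincount(pair, minlength=(grid * grid) ** 2)
    tau = alpha * np.sqrt(counts[counts > 0].mean())
    return counts[pair] > tau      # boolean mask over the matches

rng = np.random.default_rng(1)
# 80 correct matches: a coherent cluster mapping one region to another...
p1 = rng.uniform(10, 50, size=(80, 2)); p2 = p1 + 100.0
# ...plus 20 outliers scattered uniformly over both images.
o1 = rng.uniform(0, 200, size=(20, 2)); o2 = rng.uniform(0, 200, size=(20, 2))
pts1, pts2 = np.vstack([p1, o1]), np.vstack([p2, o2])
keep = gms_filter(pts1, pts2, (200, 200), (200, 200))
print(keep[:80].mean(), keep[80:].mean())  # inliers mostly kept, outliers dropped
```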
the method realizes the elimination of the wrong matching pairs, improves the accuracy of stereo matching and reduces time consumption at the same time.
According to another aspect of the present application, there is also provided a computer readable medium having stored thereon computer readable instructions, which, when executed by a processor, cause the processor to implement the stereo matching method as described above.
According to another aspect of the present application, there is also provided a stereo matching apparatus, including:
one or more processors;
a computer-readable medium for storing one or more computer-readable instructions,
when executed by the one or more processors, cause the one or more processors to implement the stereo matching method described above.
Here, for details of each embodiment of the device, reference may be made to the corresponding parts of the embodiments of the stereo matching method above, which are not repeated here.
In summary, the present application acquires a target image pair, obtains its feature points based on the AKAZE algorithm, and describes all the feature points through the M-LDB descriptor; tracks and detects the feature points by the LK optical flow method to obtain a matching area; and performs feature-point matching on the target image pair through the FLANN algorithm based on the matching area, eliminating wrong matching pairs through grid motion statistics to obtain an optimal matching pair set. The improved AKAZE algorithm thereby improves the accuracy and real-time performance of stereo matching and reduces time cost, particularly when processing target image pairs with blurred edges or changes in brightness and rotation.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Claims (6)
1. A stereo matching method, characterized in that the method comprises:
acquiring a target image pair, obtaining characteristic points of the target image pair based on an AKAZE algorithm, and describing all the characteristic points through an M-LDB descriptor;
tracking and detecting the feature points by an LK optical flow method and obtaining a matching area;
and performing feature point matching on the target image pair through a FLANN algorithm based on the matching area, and eliminating wrong matching pairs through grid motion statistics to obtain an optimal matching pair set.
2. The method of claim 1, wherein the obtaining a target image pair, obtaining feature points of the target image pair based on an AKAZE algorithm, and describing all the feature points through an M-LDB descriptor comprises:
constructing a nonlinear scale space of the target image pair based on an AKAZE algorithm;
positioning all the feature points of the target image pair in the nonlinear scale space of the target image pair to obtain feature point position coordinates;
determining, for each feature point, the corresponding feature point direction based on the feature point position coordinates, with the feature point as the center;
and describing all the feature points through an M-LDB descriptor based on the position coordinates of the feature points and the corresponding directions of the feature points.
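In the AKAZE literature, the nonlinear scale space recited in claim 2 is built by solving a Perona-Malik-type nonlinear diffusion equation (AKAZE itself solves it efficiently with Fast Explicit Diffusion). The NumPy fragment below sketches a single explicit diffusion step to show the key property; the conductivity constant k and the step size dt are illustrative choices, not values from this application.

```python
import numpy as np

def diffusion_step(img, k=0.05, dt=0.2):
    """One explicit step of Perona-Malik nonlinear diffusion.

    Edges (large gradients) get a small conductivity g, so they are
    smoothed less than flat regions -- the property that lets a
    nonlinear scale space keep edge detail that Gaussian pyramids blur.
    """
    # Central-difference gradients (one-sided at the border).
    gy, gx = np.gradient(img.astype(np.float64))
    # Conductivity g2 = 1 / (1 + |grad|^2 / k^2), one of the standard choices.
    g = 1.0 / (1.0 + (gx ** 2 + gy ** 2) / k ** 2)
    # Divergence of (g * grad): explicit finite-difference update.
    flux_y, flux_x = g * gy, g * gx
    div = np.gradient(flux_y, axis=0) + np.gradient(flux_x, axis=1)
    return img + dt * div
```

Unlike uniform Gaussian blurring, the conductivity term suppresses smoothing across strong edges, which is consistent with the summary's claim of better behavior on image pairs with blurred or changing edges.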
3. The method of claim 2, wherein the tracking and detecting the feature points by an LK optical flow method and obtaining a matching area comprises:
tracking and detecting the feature points to obtain feature point pairs with the same feature point coordinates;
and obtaining the optical flows of the two pixel points corresponding to each feature point pair based on the gray values of the feature point pair and of its corresponding pixel points, and determining the matching areas of all the feature point pairs.
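The constraint in claim 3, that corresponding pixel points share the same gray value, is the brightness-constancy assumption behind the Lucas-Kanade solve. A minimal single-window NumPy version follows; the window size and the test pattern are illustrative assumptions, not the claimed implementation.

```python
import numpy as np

def lk_flow(img1, img2, x, y, win=15):
    """Single-window Lucas-Kanade: least-squares solve of Ix*dx + Iy*dy = -It.

    Assumes brightness constancy (the same gray value for corresponding
    pixels, as in claim 3) and small motion within the window.
    """
    h = win // 2
    p1 = img1[y - h:y + h + 1, x - h:x + h + 1].astype(np.float64)
    p2 = img2[y - h:y + h + 1, x - h:x + h + 1].astype(np.float64)
    iy, ix = np.gradient(p1)          # spatial gradients over the window
    it = p2 - p1                      # temporal gradient between the two frames
    A = np.stack([ix.ravel(), iy.ravel()], axis=1)
    b = it.ravel()
    d, *_ = np.linalg.lstsq(A, -b, rcond=None)
    return d                          # estimated flow (dx, dy) at (x, y)
```

Every pixel in the window contributes one equation, so the over-determined system averages out noise; this is the local flow estimate from which a matching area around the predicted position can be defined.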
4. The method according to any one of claims 1 to 3, wherein the performing feature point matching on the target image pair by a FLANN algorithm based on the matching region and removing wrong matching pairs by grid motion statistics to obtain an optimal matching pair set comprises:
performing feature point matching on the target image pair through a FLANN algorithm based on the matching area to obtain a matching pair set;
and acquiring the number of feature points in the neighborhood of the matching points in the matching pair set based on grid motion statistics, and rejecting the wrong matching pairs according to a preset probability evaluation standard function value to obtain an optimal matching pair set.
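The grid-motion-statistics idea of claim 4 (divide both images into grids and score each match by how many other matches agree with its cell-to-cell motion) can be caricatured in a few lines of NumPy. The grid size, the single-cell support count (the published GMS scheme uses 3x3 cell neighborhoods and multiple grid offsets), and the alpha*sqrt(n) threshold constant are all simplifying assumptions, not the preset probability evaluation standard function of this application.

```python
import numpy as np

def gms_filter(pts_l, pts_r, size_l, size_r, grid=4, alpha=2.0):
    """Toy grid-motion-statistics filter: keep matches whose (left cell,
    right cell) pair is shared by many other matches, on the premise that
    correct matches of one scene move coherently while wrong ones scatter."""
    def cell_index(pts, size):
        row = np.clip(pts[:, 1] * grid // size[0], 0, grid - 1)
        col = np.clip(pts[:, 0] * grid // size[1], 0, grid - 1)
        return (row * grid + col).astype(int)

    cl = cell_index(pts_l, size_l)
    cr = cell_index(pts_r, size_r)
    pair_id = cl * grid * grid + cr            # one id per (left cell, right cell)
    ids, counts = np.unique(pair_id, return_counts=True)
    support = counts[np.searchsorted(ids, pair_id)]   # neighbors sharing the motion
    mean_per_cell = len(pts_l) / (grid * grid)
    return support > alpha * np.sqrt(mean_per_cell)   # boolean keep-mask
```

Because coherent matches concentrate into a few cell-pairs while random wrong matches spread over all of them, a simple count threshold already separates the two populations well.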
5. A computer readable medium having computer readable instructions stored thereon, which, when executed by a processor, cause the processor to implement the method of any one of claims 1 to 4.
6. A stereo matching apparatus, characterized in that the apparatus comprises:
one or more processors;
a computer-readable medium storing one or more computer-readable instructions which,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011458843.8A CN112465876A (en) | 2020-12-11 | 2020-12-11 | Stereo matching method and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011458843.8A CN112465876A (en) | 2020-12-11 | 2020-12-11 | Stereo matching method and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112465876A true CN112465876A (en) | 2021-03-09 |
Family
ID=74803676
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011458843.8A Pending CN112465876A (en) | 2020-12-11 | 2020-12-11 | Stereo matching method and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112465876A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104751465A (en) * | 2015-03-31 | 2015-07-01 | 中国科学技术大学 | ORB (oriented brief) image feature registration method based on LK (Lucas-Kanade) optical flow constraint |
CN106296742A (en) * | 2016-08-19 | 2017-01-04 | 华侨大学 | A kind of online method for tracking target of combination Feature Points Matching |
CN107798691A (en) * | 2017-08-30 | 2018-03-13 | 西北工业大学 | A kind of unmanned plane independent landing terrestrial reference real-time detecting and tracking method of view-based access control model |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906705A (en) * | 2021-03-26 | 2021-06-04 | 北京邮电大学 | Image feature matching algorithm based on G-AKAZE |
CN112906705B (en) * | 2021-03-26 | 2023-01-13 | 北京邮电大学 | Image feature matching algorithm based on G-AKAZE |
CN112967319A (en) * | 2021-03-31 | 2021-06-15 | 交通运输部天津水运工程科学研究所 | Block motion real-time detection method based on feature point identification |
CN113542588A (en) * | 2021-05-28 | 2021-10-22 | 上海第二工业大学 | Anti-interference electronic image stabilization method based on visual saliency |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lin et al. | Color-, depth-, and shape-based 3D fruit detection | |
JP6430064B2 (en) | Method and system for aligning data | |
US8798357B2 (en) | Image-based localization | |
Aldoma et al. | Multimodal cue integration through hypotheses verification for rgb-d object recognition and 6dof pose estimation | |
JP6216508B2 (en) | Method for recognition and pose determination of 3D objects in 3D scenes | |
CN112465876A (en) | Stereo matching method and equipment | |
US20140226906A1 (en) | Image matching method and apparatus | |
EP2385483A1 (en) | Recognition and pose determination of 3D objects in 3D scenes using geometric point pair descriptors and the generalized Hough Transform | |
Xia et al. | Loop closure detection for visual SLAM using PCANet features | |
EP2064652A1 (en) | Method of image processing | |
CN108550165A (en) | A kind of image matching method based on local invariant feature | |
US20200226392A1 (en) | Computer vision-based thin object detection | |
Pujol-Miro et al. | Correspondence matching in unorganized 3D point clouds using Convolutional Neural Networks | |
Sahin et al. | A learning-based variable size part extraction architecture for 6D object pose recovery in depth images | |
CN111199558A (en) | Image matching method based on deep learning | |
Elashry et al. | Feature matching enhancement using the graph neural network (gnn-ransac) | |
Han et al. | Accurate and robust vanishing point detection method in unstructured road scenes | |
CN107229935B (en) | Binary description method of triangle features | |
US20230360262A1 (en) | Object pose recognition method based on triangulation and probability weighted ransac algorithm | |
Tang et al. | A GMS-guided approach for 2D feature correspondence selection | |
Clark et al. | Perspective correction for improved visual registration using natural features. | |
Cai et al. | GlcMatch: global and local constraints for reliable feature matching | |
TWI776668B (en) | Image processing method and image processing system | |
Huang et al. | BALG: An alternative for fast and robust feature matching | |
Taha et al. | Towards a Fair Evaluation of Feature Extraction Algorithms Robustness in Structure from Motion |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210309 |