Background
After years of development, binocular stereo vision plays an important role in fields such as three-dimensional reconstruction, industrial measurement, and autonomous driving. Stereo matching is both the core research problem of binocular vision and one of its main difficulties. Conventional binocular stereo matching methods fall into three categories: global matching, local matching, and semi-global matching. Global matching generally comprises matching cost computation, disparity computation, and disparity optimization; its core is to construct and minimize a global energy function so as to obtain an optimal disparity map. Global matching produces good results, but its running time is generally long, which makes it unsuitable for real-time operation.
In recent years, more and more stereo matching methods have been built on convolutional neural networks (CNNs). CNNs were first applied to the matching correspondence problem through similarity computation: the network computes the similarity of a pair of image patches to decide whether they match. Although CNN-based stereo matching improves on conventional binocular stereo matching in both speed and accuracy, its performance in ill-posed regions (e.g., occluded regions, disparity-discontinuous regions, weakly textured regions, and reflective surfaces) is still not ideal.
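For orientation, the global energy function mentioned above is commonly written in the following general form. The symbols are not taken from this document and are introduced here only for illustration: $D$ is the disparity map, $C(p, D_p)$ is the matching cost of assigning disparity $D_p$ to pixel $p$, $V$ is a smoothness penalty over neighboring pixel pairs $\mathcal{N}$, and $\lambda$ is a weighting factor.

```latex
E(D) = \sum_{p} C\bigl(p, D_p\bigr) + \lambda \sum_{(p,q) \in \mathcal{N}} V\bigl(D_p, D_q\bigr)
```

Minimizing $E(D)$ over all pixels at once is what makes global methods accurate but slow.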
Disclosure of Invention
The invention provides a binocular vision stereo matching method based on an improved PSMAT to solve the problems described in the background.
The invention provides a binocular vision stereo matching method based on an improved PSMAT, which comprises the following steps:
S1: extracting features from the left image and the right image with a dimension-reduction Inception module to obtain a feature map for each image;
S2: inputting the obtained feature maps into an SPP module, which compresses the feature maps, up-samples them, and merges the feature maps of the different levels into a final SPP feature map, wherein the SPP feature map is generated by the following steps:
A. selecting one piece of feature information as the base and extracting a feature connection value from it;
B. searching the acquired feature information for feature information that matches the feature connection value of the base feature information, and connecting the two to generate larger base feature information;
C. extracting a new feature connection value from the newly generated base feature information, searching the acquired feature information for feature information that matches the new value, connecting it, and repeating the search and matching in sequence;
D. finally forming the final SPP feature map;
S3: for each disparity value between the left and right images, combining the feature map corresponding to that disparity value with the SPP feature map to form a four-dimensional matching cost volume;
S4: aggregating context information with a three-dimensional convolution module and obtaining the final disparity map through up-sampling and disparity regression, wherein the probability of each disparity is computed from the predicted cost $C_d$ by the normalized exponential (softmax) function, an identity mapping is used for optimization, and the predicted disparity value is obtained by summing each disparity value weighted by its probability.
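The disparity regression in step S4 can be sketched for a single pixel as follows. This is a minimal illustration, not the implementation of the invention: the function names are hypothetical, and the code assumes that a lower predicted cost means a better match, so the normalized exponential function is applied to the negated costs.

```python
import math


def softmax(values):
    """Numerically stable normalized exponential function."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def regress_disparity(costs):
    """Return the expected disparity for one pixel.

    costs[d] is the predicted cost for candidate disparity d.
    Assumption: lower cost = better match, hence the negation.
    """
    probs = softmax([-c for c in costs])
    # Sum of each disparity value weighted by its probability.
    return sum(d * p for d, p in enumerate(probs))


# Hypothetical costs: disparities 2 and 3 are almost equally good.
costs = [9.0, 7.5, 1.0, 1.2, 8.0]
d_hat = regress_disparity(costs)
```

Because the result is a probability-weighted sum rather than an argmin, `d_hat` falls between the two best candidates, which gives sub-pixel disparity estimates and keeps the operation differentiable.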
Preferably, in step S1, the Inception module acquires the image by scanning it to obtain image data, and extracts feature data according to the characteristics of the feature data in the image data.
Preferably, in step S2, the up-sampling uses a method based on the edges of the original low-resolution image: the edges of the low-resolution image are detected first, and the pixels are classified according to the detected edges; the low-resolution image is then interpolated by a conventional method, the edges of the resulting high-resolution image are detected, and finally the edges and their neighboring pixels are specially processed to remove blur and enhance the image edges.
Preferably, in step S2, the SPP module samples the feature maps and is connected to each computing module, so that once the SPP module has collected the feature maps, each computing module can extract their feature information from the SPP module.
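The compress, up-sample, and merge behavior of the SPP module can be sketched on a single-channel feature map as follows. This is an illustrative assumption rather than the module of the invention: the bin sizes are hypothetical, average pooling stands in for the compression, and nearest-neighbor up-sampling stands in for the edge-based up-sampling described above.

```python
def avg_pool(fmap, k):
    """Non-overlapping k x k average pooling (H and W must be divisible by k)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            sum(fmap[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w, k)
        ]
        for i in range(0, h, k)
    ]


def upsample_nearest(fmap, k):
    """Repeat every value k times along both axes."""
    return [
        [value for value in row for _ in range(k)]
        for row in fmap
        for _ in range(k)
    ]


def spp(fmap, bin_sizes=(2, 4)):
    """Return the input plus one pooled-and-restored map per pyramid level."""
    branches = [fmap]
    for k in bin_sizes:
        branches.append(upsample_nearest(avg_pool(fmap, k), k))
    return branches  # stacked as channels: [levels + 1][H][W]


fmap = [[float(4 * i + j) for j in range(4)] for i in range(4)]
pyramid = spp(fmap)
```

Each level keeps the original spatial size, so the levels can be concatenated channel-wise; coarser levels carry wider context at lower detail.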
Preferably, in step S3, the feature map of each disparity value is matched with the SPP feature map through the feature data of the feature values; direction feature data are set on the SPP feature map, and the direction features are matched with one another to form the multi-dimensional cost volume.
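One common way to build a four-dimensional cost volume is to pair, for every candidate disparity, the left feature at each pixel with the right feature shifted by that disparity. The sketch below assumes this concatenation scheme, which the text does not spell out; the function name, the `[C][H][W]` layout, and the zero padding are illustrative choices.

```python
def build_cost_volume(left, right, max_disp):
    """Concatenation-based cost volume.

    left, right: feature maps laid out as [C][H][W].
    Returns a volume laid out as [2C][D][H][W], zero where x < d.
    """
    channels, height, width = len(left), len(left[0]), len(left[0][0])
    volume = [
        [[[0.0] * width for _ in range(height)] for _ in range(max_disp)]
        for _ in range(2 * channels)
    ]
    for d in range(max_disp):
        for c in range(channels):
            for y in range(height):
                for x in range(d, width):
                    # Left feature at (y, x), right feature at (y, x - d).
                    volume[c][d][y][x] = left[c][y][x]
                    volume[channels + c][d][y][x] = right[c][y][x - d]
    return volume


# One channel, one row; the right view is the left view shifted by one pixel.
left = [[[1.0, 2.0, 3.0, 4.0]]]
right = [[[2.0, 3.0, 4.0, 0.0]]]
volume = build_cost_volume(left, right, max_disp=2)
```

At the true disparity (here `d = 1`) the two halves of the volume agree wherever both are defined, which is the signal the subsequent three-dimensional convolutions can learn to detect.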
Preferably, in step S4, the convolution module is a 1×1 convolution module, which effectively reduces the depth (number of channels) of the feature map, so that the width of the network and its adaptability to multiple scales can be increased without increasing the network parameters, thereby improving the matching accuracy.
Preferably, in step S4, the normalized exponential function accelerates the convergence of network training; the normalization also allows training to use a higher learning rate without extensive initialization and, combined with other network optimizations, reduces the test time per image.
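A 1×1 convolution is a per-pixel linear map across channels, which is why it can shrink the channel depth of a feature map without touching its spatial size. The following sketch shows this on nested lists; the layout `[C][H][W]` and the example weights are assumptions made only for illustration.

```python
def conv1x1(fmap, weights):
    """Apply a 1x1 convolution without bias.

    fmap: input laid out as [C_in][H][W].
    weights: matrix laid out as [C_out][C_in].
    Returns a feature map laid out as [C_out][H][W].
    """
    c_in, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [
            [sum(w_row[c] * fmap[c][y][x] for c in range(c_in)) for x in range(width)]
            for y in range(height)
        ]
        for w_row in weights
    ]


# Four input channels (channel c is filled with the value c) reduced to two.
fmap = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
weights = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
reduced = conv1x1(fmap, weights)
```

Here the layer has 2 × 4 = 8 weights, whereas a 3×3 convolution with the same channel counts would need 72, which illustrates how channel reduction keeps the parameter count low.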
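Preferably, in step S4, the summation formula is:
$\hat{d} = \sum_{d} d \cdot p_d$, where $p_d$ is the probability of disparity value $d$ computed from the predicted cost $C_d$ by the normalized exponential function.
The network is trained with the Focal loss function, which is defined as follows:
$FL(x) = -\alpha x^{\gamma} \log(1 - x)$, where $d$ is the ground-truth disparity value and $\hat{d}$ is the predicted disparity value.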
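The loss formula above can be evaluated directly, as in the following sketch. How $x$ is derived from the ground-truth and predicted disparity values is not stated in the text, so the code simply treats `x` as a value in [0, 1); the default values of `alpha` and `gamma` are hypothetical.

```python
import math


def focal_loss(x, alpha=1.0, gamma=2.0):
    """Evaluate FL(x) = -alpha * x**gamma * log(1 - x) for x in [0, 1)."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    return -alpha * (x ** gamma) * math.log(1.0 - x)


small = focal_loss(0.1)  # small x contributes very little
large = focal_loss(0.9)  # large x dominates the loss
```

The factor `x ** gamma` down-weights small values of `x`, so training concentrates on the samples with large `x`.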
The binocular vision stereo matching method based on the improved PSMAT has the following beneficial effects:
1. The dimension-reduction Inception module enables better feature extraction.
2. Adding a corresponding normalization layer to each layer allows training to use a larger learning rate and accelerates the convergence of network training.
3. The improved loss function ensures matching accuracy while improving matching speed.
Detailed Description
The invention will be further illustrated with reference to specific examples.
The invention provides a binocular vision stereo matching method based on an improved PSMAT, which comprises the following steps:
S1: a dimension-reduction Inception module is used to extract features from the left and right images to obtain a feature map for each image; the Inception module acquires the image by scanning it to obtain image data, and extracts feature data according to the characteristics of the feature data in the image data;
S2: the obtained feature maps are input into an SPP module, which compresses the feature maps and then up-samples them. The up-sampling uses a method based on the edges of the original low-resolution image: the edges of the low-resolution image are detected first, and the pixels are classified according to the detected edges; the low-resolution image is then interpolated by a conventional method, the edges of the resulting high-resolution image are detected, and finally the edges and their neighboring pixels are specially processed to remove blur and enhance the image edges. The feature maps of the different levels are merged into a final SPP feature map. The SPP module samples the feature maps and is connected to each computing module, so that once the SPP module has collected the feature maps, each computing module can extract their feature information from it. The SPP feature map is generated by the following steps:
A. selecting one piece of feature information as the base and extracting a feature connection value from it;
B. searching the acquired feature information for feature information that matches the feature connection value of the base feature information, and connecting the two to generate larger base feature information;
C. extracting a new feature connection value from the newly generated base feature information, searching the acquired feature information for feature information that matches the new value, connecting it, and repeating the search and matching in sequence;
D. finally forming the final SPP feature map;
S3: for each disparity value between the left and right images, the feature map corresponding to that disparity value is combined with the SPP feature map to form a four-dimensional matching cost volume; the feature map of each disparity value is matched with the SPP feature map through the feature data of the feature values, direction feature data are set on the SPP feature map, and the direction features are matched with one another to form the multi-dimensional cost volume;
S4: a three-dimensional convolution module aggregates context information. The convolution module is a 1×1 convolution module, which effectively reduces the depth (number of channels) of the feature map, so that the width of the network and its adaptability to multiple scales can be increased without increasing the network parameters, thereby improving the matching accuracy. The final disparity map is obtained through up-sampling and disparity regression. The probability of each disparity is computed from the predicted cost $C_d$ by the normalized exponential (softmax) function; the normalized exponential function accelerates the convergence of network training, and the normalization also allows training to use a higher learning rate without extensive initialization and, combined with other network optimizations, reduces the test time per image. An identity mapping is used for optimization, which optimizes well and greatly increases the computation speed. The predicted disparity value is obtained by summing each disparity value weighted by its probability. The summation formula is:
$\hat{d} = \sum_{d} d \cdot p_d$, where $p_d$ is the probability of disparity value $d$.
The network is trained with the Focal loss function, which is defined as follows:
$FL(x) = -\alpha x^{\gamma} \log(1 - x)$, where $d$ is the ground-truth disparity value and $\hat{d}$ is the predicted disparity value.
The foregoing is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.