CN113421210A

CN113421210A - Surface point cloud reconstruction method based on binocular stereo vision

Info

Publication number: CN113421210A
Application number: CN202110821716.8A
Authority: CN
Inventors: 李岩; 李国文; 吴孟男
Original assignee: Dongguan Zhongke Sanwei Fish Intelligent Technology Co ltd; Changchun University of Technology
Current assignee: Changchun University of Technology
Priority date: 2021-07-21
Filing date: 2021-07-21
Publication date: 2021-09-21
Anticipated expiration: 2041-07-21
Also published as: CN113421210B

Abstract

The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereo vision. The method comprises the following steps: the method comprises the following steps that firstly, images shot by a binocular camera are subjected to three-dimensional correction, and left and right images are located at the same pole at the same roll name; secondly, preprocessing the corrected image; step three, performing complex background removal on the region of interest through a minimum cut-maximum flow image segmentation algorithm; recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map; and step five, reconstructing the surface point cloud according to the disparity map obtained in the step four. The method solves the problems of low reconstruction precision, low speed, poor migratability and the like through the processes of stereo correction, image preprocessing, background removal of the region of interest, stereo matching, point cloud reconstruction and the like.

Description

Surface point cloud reconstruction method based on binocular stereo vision

Technical Field

The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereo vision.

Background

In recent years, with the increasing level of automation in manufacturing industry and the increasing level of technology transformation in enterprises, machine vision technology is increasingly applied in industrial production, and binocular stereo vision technology is used as a passive, non-contact measuring means, and is favored by the market with a wide range of use conditions, a faster measuring speed and a reasonable price, and is not enjoyed.

The surface point cloud reconstruction technology based on binocular stereo vision can be applied to the fields of part identification and positioning, unmanned aerial vehicle autonomous navigation, satellite remote sensing surveying and mapping, 3D model reconstruction and the like, is a research hotspot and difficulty of artificial intelligence direction at the present stage, and has quite wide application prospect.

After the results of the existing research are summarized, although the existing surface point cloud reconstruction method based on binocular stereo vision is gradually improved, the method is not satisfactory when the following key problems are solved:

1) the existing preprocessing method can not give consideration to the denoising effect and the image characteristic detail retention during image filtering and enhancing, and is easy to cause image blurring and edge deletion, and point cloud defect;

2) the existing surface point cloud reconstruction method carries out point cloud recovery on a global image, lacks directivity, easily causes resource waste, reduces calculation efficiency and causes mismatching;

3) existing neural network-based stereo matching methods mostly calculate matching costs through a single scale without a disparity refinement step or use a traditional old disparity optimization method. The disparity map is likely to be discontinuous.

Disclosure of Invention

The invention provides a surface point cloud reconstruction method based on binocular stereo vision, which solves the problems of low reconstruction precision, low speed, poor migration and the like through the processes of stereo correction, image preprocessing, background removal of an interested region, stereo matching, point cloud reconstruction and the like.

The technical scheme of the invention is described as follows by combining the attached drawings:

a surface point cloud reconstruction method based on binocular stereo vision comprises the following steps:

the method comprises the following steps that firstly, images shot by a binocular camera are subjected to three-dimensional correction, and left and right images are located at the same pole at the same roll name;

secondly, preprocessing the corrected image, wherein the preprocessing comprises weighted median filtering with bilateral filtering as weight, adaptive histogram equalization and Laplace image sharpening;

step three, performing complex background removal on the region of interest through a minimum cut-maximum flow image segmentation algorithm;

recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map;

and step five, reconstructing the surface point cloud according to the disparity map obtained in the step four.

The specific method of the second step is as follows:

21) weighted median filtering with bilateral filtering as weight;

performing weighted median filtering with bilateral filtering as weight on the corrected image; the bilateral filter weights are expressed as:

wherein the content of the first and second substances,

to adjust the space;

is the color similarity; k is a radical of_iIs a regularization factor; i-j non-woven²And | i_i-j_j|²Is the spatial similarity between the central pixel and the neighboring pixels; i is the abscissa of the central pixel; j is the ordinate of the central pixel; i.e. i_iIs the abscissa of the adjacent pixel; j is a function of_jIs the ordinate of the adjacent pixel;

when selecting window R_iWhen the size is (2R +1) × (2R +1), wherein R is the window radius, the number of pixels contained in the window is n, and the window R is calculated_iOne pair of random sequences { I (i), w_i,jThe pixel values and weights of the z, then the weights are sorted in turn until the cumulative weight is greater than half the weight, at which point the corresponding i^*Is the new pixel value at the center point of the local window, as shown in the following equation:

wherein i^*Is a filtered disparity value; l is the pixel value of the center point of the window; w is a_ijIs the filtering weight; n is the total number of pixels in the window; i is the current accumulated pixel number;

22) adaptive histogram equalization to define contrast;

carrying out adaptive histogram equalization of limited contrast on the filtered image; dividing the filtered and denoised image, namely M pixels multiplied by N pixels into a plurality of subregions with the same size, respectively calculating the histogram of each subregion, recording the number of the gray levels of the histograms which possibly appear as K, and the gray level of each subregion as r, wherein the histogram function corresponding to the region (M, N) is as follows:

H_m,n(r)，0≤r≤K-1；

wherein r is the gray level of each sub-region; k is the number of the grey levels of the histogram;

confirmation of shear clipping magnitude β:

wherein M is the number of pixels in the horizontal direction of the image; n is the number of pixels in the vertical direction of the image; k is the number of the grey levels of the histogram; α is a truncation coefficient representing the maximum percentage of pixels in each gray level;

performing histogram equalization on all the divided subregions, processing each pixel by using a bilinear interpolation method, and calculating a processed gray value;

23) sharpening the Laplace image;

performing Laplace enhancement on the image after the histogram equalization, multiplying and summing selected pixel points in the image and 8 points in the neighborhood of the pixel points by a mask, and replacing the pixel value of the central point in the original Sudoku by the obtained new pixel value, so that for the point (i, j), the image processed by a Laplace operator is obtained:

wherein k (m, n) is a laplacian mask of 3 × 3; p (i, j) is the gray value of the original image, and L (i, j) is the image processed by the Laplace operator; m is the horizontal coordinate of the central pixel of the squared figure; n is the longitudinal coordinate of the central pixel of the squared figure; i is the abscissa of the selected point; j is the ordinate of the selected point.

The concrete method of the third step is as follows:

31) the interested area is selected by user interaction, and the pixel in the frame is defined as T_uThe other pixels are defined as background pixels T_B；

32) For T_BThe background pixel n in (1) is initialized, and the label of n is alpha_n0; for T_uInitializing a pixel n in the target pixel, wherein the label of the n is alpha_N＝1；

33) Through steps 31) and 32), preliminarily classifying target pixels and background pixels, then establishing Gaussian mixture models for the target pixels and the background pixels, clustering the target pixels into K classes through a K-means algorithm, ensuring that each Gaussian model in the Gaussian mixture models has a certain pixel sample, estimating parameter mean and covariance through RGB values of the pixels, and further determining the weight through the ratio of the number of the pixels of the Gaussian components to the total number of the pixels; the initialization process is ended;

34) assigning a Gaussian in a Gaussian mixture model to each pixelSubstituting the RGB value of the target pixel n into each Gaussian component in the Gaussian mixture model, and calculating the component with the highest probability as k_n：

Wherein D is_nEnergy data corresponding to the pixel n; alpha is alpha_nLabel value for opacity corresponding to pixel n; theta is a gray level histogram of a target or background region of the image; z_nThe gray value corresponding to the pixel n;

35) and further performing learning optimization on the Gaussian mixture model according to given image data z:

wherein, U is the sum of energy data items corresponding to each pixel;αlabel values for opacity; k is a Gaussian mixture model parameter; z is a gray value array;θa gray level histogram of a target or background region of the image;

36) gibbs energy term D analyzed by step 34)_nCalculating Gibbs energy weight 1/k_nThe segmentation is then estimated by the min-segmentation-max flow algorithm:

wherein E: (A)α,k,θZ) is the gibbs energy of the graph partitioning algorithm;αlabel values for opacity; k is a Gaussian mixture model parameter; z is a gray value array;θa gray level histogram of a target or background region of the image;

37) repeating the steps 34) -36), continuously optimizing the Gaussian mixture model, and ensuring that the iteration process can be converged to the minimum value, thereby obtaining a segmentation result;

38) and performing smooth post-processing on the segmentation result by adopting a boundary extinction mechanism.

The concrete method of the fourth step is as follows:

41) performing feature detection on the left camera image and the right camera image through a first layer and a last layer of a shared feature extraction module to obtain a multi-scale matching cost value; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolutional layer, with a step length of 1, for calculating the reconstruction error; compressing features of the first layer using a 1 × 1 convolutional layer with a step of 1, which will be used to compute dependencies in the disparity optimized network, i.e. DRS-net; the features generated by the shared feature extraction module can be simultaneously applied to a disparity estimation network (DES-net) and a disparity optimization network (DRS-net);

42) the input of the disparity estimation network, DES-net, comprises two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, the output of which is the matching cost value of the left and right images, which stores the cost of all possible differences in image coordinates (x, y); the second part is defined as a feature map of the left image, which provides the necessary semantic information for disparity estimation; the disparity estimation network DES-net is used for directly regressing the initial disparity;

43) the parallax optimization network, namely the DRS-net, uses the shared features and the initial parallax to calculate a reconstruction error re, which can reflect the correctness of the estimated parallax, and the reconstruction error is calculated as:

wherein, I_LIs a left image; i is_RIs a right image;

is the estimated disparity at location (i, j); i is the abscissa of the selected position pixel; j is the ordinate of the selected position pixel; the concatenation of the reconstruction error, the initial disparity and the left feature is fed to a third codec structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate a refined disparity.

The invention has the beneficial effects that:

1) the method has strong robustness to illumination change, and the obtained point cloud model is complete: a three-step image preprocessing method is disclosed, which adopts weighted median filtering with bilateral filtering as weight, adaptive histogram equalization for limiting contrast, and Laplacian image sharpening, and retains edge and characteristic information while ensuring denoising effect

2) The method has high reconstruction speed and high precision, and only takes the reconstructed object as the region of interest to remove the complex background of the region of interest, thereby saving the calculation resources and reducing the mismatching probability caused by similar pixels in the background region;

3) the invention has accurate matching effect and smooth parallax image. The defect that only the matching cost under a single scale is calculated and a parallax optimization link is not provided in a conventional neural network method is overcome by the improved Convolutional Neural Networks (CNNs) which are composed of a shared feature extraction network, a parallax estimation network (DES-net) parallax optimization network (DRS-net) and a parallax optimization network (DRS-net).

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flow chart of step two of the present invention;

FIG. 3 is a block diagram of a convolutional neural network of the present invention;

fig. 4 is a flow chart of multi-scale feature extraction.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, a surface point cloud reconstruction method based on binocular stereo vision includes the following steps:

secondly, preprocessing the corrected image, wherein the preprocessing comprises weighted median filtering with bilateral filtering as weight, adaptive histogram equalization and Laplace image sharpening; the method comprises the following specific steps:

with reference to figure 2 of the drawings,

21) weighted median filtering with bilateral filtering as weight;

wherein the content of the first and second substances,

to adjust the space;

when selecting window R_iWhen the size is (2R +1) × (2R +1), wherein R is the window radius, the number of pixels contained in the window is n, and the window R is calculated_iOne pair of random sequences { I (i), w_i,jPixel value of } andweights are then sequentially ordered until the cumulative weight is greater than half the weight, at which point the corresponding i^*Is the new pixel value at the center point of the local window, as shown in the following equation:

22) self-adaptive histogram equalization;

carrying out adaptive histogram equalization of limited contrast on the filtered image; dividing the filtered and dehumidified image, namely M pixels multiplied by N pixels, into a plurality of subregions with the same size, respectively calculating the histogram of each subregion, recording the number of possible histogram gray levels as K, and the gray level of each subregion as r, and then the histogram function corresponding to the region (M, N) is as follows:

H_m,n(r)，0≤r≤K-1；

confirmation of shear clipping magnitude β:

setting the clipping limiting value beta can clip the pixels beyond the limited part, thereby achieving the purpose of limiting the contrast.

23) Sharpening the Laplace image;

wherein k (m, n) is a laplacian mask of 3 × 3; p (i, j) is the gray value of the original image, and L (i, j) is the image processed by the Laplace operator; m is the horizontal coordinate of the central pixel of the squared figure; n is the longitudinal coordinate of the central pixel of the squared figure; i is the abscissa of the selected point; j is the ordinate of the selected point;

The region of interest is defined at the discretion of the user.

34) assigning a Gaussian component in the Gaussian mixture model to each pixel to obtain a targetSubstituting the RGB value of the target pixel n into each Gaussian component in the Gaussian mixture model, and calculating the component with the maximum probability as k_n：

Fourthly, with reference to fig. 3, recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map; the system comprises a shared feature extraction module, a disparity estimation network (DES-net) and a disparity optimization network (DRS-net). The shared feature extraction network uses a connected network of shallow codec structures to extract common multi-scale features from left and right images. Some of these features are used to compute the matching cost values (i.e., correlations) of the disparity estimation network (DES-net) and the disparity optimization network (DRS-net). The features of the first layer are further compressed to produce c _ conv1a and c _ conv1b using a 1 × 1 convolution. These shared features are also used to calculate the reconstruction error of the disparity optimization network (DRS-net);

41) performing feature detection on the left camera image and the right camera image through a first layer and a last layer of a shared feature extraction module to obtain a multi-scale matching cost value; referring to fig. 4, the features of the first two layers are upsampled to the original resolution and fused by 1 × 1 convolutional layer with step size of 1, and features with relatively large receptive field and different abstraction levels are obtained by the last deconvolution layer and the first convolutional layer for calculating the reconstruction error. "Conv 2 a" represents the second convolutional layer that shares the feature extraction module. The features of the first layer are compressed using a 1 x 1 convolutional layer with a step size of 1, which will be used to compute the correlation in the disparity optimized network (DRS-net). The features generated by the shared feature extraction module can be applied in a disparity estimation network (DES-net) and a disparity optimization network (DRS-net) at the same time;

42) the input of the disparity estimation network (DES-net) comprises two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, the output of which is the matching cost value of the left and right images, which stores the cost of all possible differences in image coordinates (x, y); the second part is defined as a feature map of the left image, which provides the necessary semantic information for disparity estimation; a disparity estimation network (DES-net) for directly regressing the initial disparity;

43) the disparity optimization network (DRS-net) uses the shared features and the initial disparity to calculate a reconstruction error re, which may reflect the correctness of the estimated disparity, which is calculated as:

wherein, I_LIs a left image; i is_RIs a right image;

is the estimated disparity at location (i, j); i is the abscissa of the selected position pixel, and j is the ordinate of the selected position pixel; the concatenation of the reconstruction error, the initial disparity and the left feature is fed to a third codec structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate a refined disparity.

Although the preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings, the scope of the present invention is not limited to the specific details of the above embodiments, and any person skilled in the art can substitute or change the technical solution of the present invention and its inventive concept within the technical scope of the present invention, and these simple modifications belong to the scope of the present invention.

It should be noted that the various technical features described in the above embodiments can be combined in any suitable manner without contradiction, and the invention is not described in any way for the possible combinations in order to avoid unnecessary repetition.

In addition, any combination of the various embodiments of the present invention is also possible, and the same should be considered as the disclosure of the present invention as long as it does not depart from the spirit of the present invention.

Claims

1. A surface point cloud reconstruction method based on binocular stereo vision is characterized by comprising the following steps:

2. The binocular stereo vision-based surface point cloud reconstruction method according to claim 1, wherein the specific method of the second step is as follows:

21) weighted median filtering with bilateral filtering as weight;

wherein the content of the first and second substances,

to adjust the space;

when selecting window R_iSize and breadthIs (2R +1) × (2R +1), wherein R is the window radius, the number of pixels contained in the window is n, and the window R is calculated_iOne pair of random sequences { I (i), w_i,jThe pixel values and weights of the z, then the weights are sorted in turn until the cumulative weight is greater than half the weight, at which point the corresponding i^*Is the new pixel value of the local window center point; as shown in the following formula:

22) adaptive histogram equalization to define contrast;

H_m,n(r)，0≤r≤K-1；

confirmation of shear clipping magnitude β:

23) sharpening the Laplace image;

3. The binocular stereo vision-based surface point cloud reconstruction method according to claim 1, wherein the specific method of the third step is as follows:

34) distributing Gaussian components in the Gaussian mixture model to each pixel, substituting the RGB value of the target pixel n into each Gaussian component in the Gaussian mixture model, and calculating the component with the maximum probability to be recorded as k_n：

4. The binocular stereo vision-based surface point cloud reconstruction method according to claim 1, wherein the specific method of the fourth step is as follows:

41) performing feature detection on the left camera image and the right camera image through a first layer and a last layer of a shared feature extraction module to obtain a multi-scale matching cost value; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolutional layer, with a step length of 1, for calculating the reconstruction error; compressing features of the first layer using a 1 × 1 convolutional layer with a step of 1, which will be used to compute the correlation in the disparity-optimized network, i.e. DRS-net; the features generated by the shared feature extraction module can be simultaneously applied to a disparity estimation network (DES-net) and a disparity optimization network (DRS-net);

wherein, I_LIs a left image; i is_RIs a right image;

is the estimated disparity at location (i, j)(ii) a i is the abscissa of the selected position pixel; j is the ordinate of the selected position pixel; the concatenation of the reconstruction error, the initial disparity and the left feature is fed to a third codec structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate a refined disparity.