CN113421210B - Surface point cloud reconstruction method based on binocular stereoscopic vision - Google Patents
Surface point cloud reconstruction method based on binocular stereoscopic vision
- Publication number
- CN113421210B (application CN202110821716.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- pixel
- pixels
- value
- disparity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/40—Image enhancement or restoration by the use of histogram techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G06T5/73—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
- G06T2207/20028—Bilateral filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
- G06T2207/20032—Median filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereoscopic vision, comprising the following steps: step one, performing stereo rectification on the images shot by a binocular camera so that corresponding rows of the left and right images are aligned along the same epipolar lines; step two, preprocessing the rectified images; step three, removing the complex background of the region of interest through a minimum-cut/maximum-flow image segmentation algorithm; step four, recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map; and step five, reconstructing the surface point cloud from the disparity map obtained in step four. Through the stages of stereo rectification, image preprocessing, region-of-interest background removal, stereo matching and point cloud reconstruction, the method solves the problems of low reconstruction precision, low speed and poor transferability.
Description
Technical Field
The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereoscopic vision.
Background
In recent years, with the steady improvement of automation in manufacturing and the continuous technological upgrading of enterprises, machine vision technology has been increasingly applied in industrial production. Binocular stereo vision, as a passive, non-contact measurement means, is favored by the market for its fast measurement speed and reasonable price.
The surface point cloud reconstruction technology based on binocular stereoscopic vision can be applied to the fields of part identification and positioning, unmanned aerial vehicle autonomous navigation, satellite remote sensing mapping, 3D model reconstruction and the like, is a research hotspot and difficulty in the artificial intelligence direction at the present stage, and has quite wide application prospects.
A review of prior research shows that existing surface point cloud reconstruction methods based on binocular stereoscopic vision have gradually matured, but the following key problems remain to be solved:
1) During image filtering and enhancement, existing preprocessing methods cannot balance the denoising effect with the retention of image feature details; they easily cause image blurring and edge loss, which in turn produces defects in the point cloud;
2) Existing surface point cloud reconstruction methods recover the point cloud over the whole image without any directivity, which easily wastes resources, reduces computational efficiency and causes mismatching;
3) Most existing neural-network-based stereo matching methods compute the matching cost at a single scale, and either lack a disparity refinement step or rely on traditional disparity optimization methods, which tends to cause discontinuities in the disparity map.
Disclosure of Invention
The invention provides a surface point cloud reconstruction method based on binocular stereo vision, which solves the problems of low reconstruction precision, low speed and poor transferability through the stages of stereo rectification, image preprocessing, region-of-interest background removal, stereo matching and point cloud reconstruction.
The technical scheme of the invention is as follows in combination with the accompanying drawings:
a surface point cloud reconstruction method based on binocular stereoscopic vision comprises the following steps:
step one, performing stereo rectification on the images shot by a binocular camera so that corresponding rows of the left and right images are aligned along the same epipolar lines;
step two, preprocessing the rectified images, wherein the preprocessing comprises weighted median filtering with bilateral filtering as the weight, contrast-limited adaptive histogram equalization, and Laplacian image sharpening;
step three, removing the complex background of the region of interest through a minimum-cut/maximum-flow image segmentation algorithm;
step four, recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map;
and step five, reconstructing the surface point cloud from the disparity map obtained in step four.
The specific method of the second step is as follows:
21) weighted median filtering with bilateral filtering as the weight;
performing weighted median filtering with bilateral filtering as the weight on the rectified image; the bilateral filter weight between a center pixel p = (i, j) and a neighboring pixel q = (i', j') is expressed as:
w(p, q) = (1/k_p) · exp(−((i − i')² + (j − j')²) / (2σ_s²)) · exp(−|I(p) − I(q)|² / (2σ_r²))
wherein the first exponential term adjusts the spatial proximity; the second exponential term is the color similarity; k_p is a regularization factor; i and j are the abscissa and ordinate of the center pixel; i' and j' are the abscissa and ordinate of the neighboring pixel; I(·) denotes the pixel intensity; σ_s and σ_r are the spatial and range standard deviations;
selecting a window R_p of size (2r + 1) × (2r + 1), wherein r is the window radius and n is the number of pixels contained in the window; the pixel values in the window R_p and their weights form a sequence of pairs {(I_t, w_t)}, which is sorted in ascending order of pixel value; the weights are then accumulated in order until the cumulative weight exceeds half of the total weight, at which point the corresponding pixel value I* becomes the new pixel value of the center point of the local window, as shown in the following equation:
I* = I_l, l = min{ l : Σ_{t=1}^{l} w_t ≥ (1/2) Σ_{t=1}^{n} w_t }
wherein I* is the filtered pixel value; I_t and w_t are the t-th pixel value and filtering weight after sorting; n is the total number of pixels within the window; l is the number of pixels accumulated so far;
22) contrast-limited adaptive histogram equalization;
performing contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size, and the histogram of each sub-region is computed separately; the number of gray levels of the histogram is denoted as K and the gray level as r, so that the histogram function corresponding to the sub-region (m, n) is:
H_{m,n}(r), 0 ≤ r ≤ K − 1;
where r is the gray level of each sub-region; K is the number of gray levels of the histogram;
the clipping limit value β is determined as:
β = (M · N / K) · (1 + α / 100)
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of gray levels of the histogram; α is a clipping coefficient representing the maximum percentage of pixels allowed in each gray level;
histogram equalization is performed on all the divided sub-regions, each pixel is processed by bilinear interpolation, and the processed gray value is computed;
23) Laplacian image sharpening;
performing Laplacian enhancement on the histogram-equalized image: the selected pixel and the 8 points in its neighborhood are multiplied element-wise with a mask and summed, and the resulting new pixel value replaces the value of the center point of the original 3 × 3 grid; for a point (i, j), the image processed by the Laplacian operator is:
L(i, j) = Σ_{m=−1}^{1} Σ_{n=−1}^{1} k(m, n) · p(i + m, j + n)
wherein k(m, n) is a 3 × 3 Laplacian mask; p(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m and n are the row and column offsets within the 3 × 3 neighborhood; i is the abscissa of the selected point; j is the ordinate of the selected point.
The specific method of the third step is as follows:
31) frame-selecting the region of interest through user interaction; the pixels within the frame are defined as target pixels T_U, and the other pixels are defined as background pixels T_B;
32) for each background pixel n in T_B, its label is initialized as α_n = 0; for each pixel n among the target pixels T_U, its label is initialized as α_n = 1;
33) after the preliminary division of target and background pixels in steps 31) and 32), a Gaussian mixture model is built for the target pixels and for the background pixels; the target pixels are clustered into K classes by the K-means algorithm, which guarantees that each Gaussian component of the mixture model has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each Gaussian component is determined by the ratio of the number of pixels of that component to the total number of pixels; the initialization process ends here;
34) a Gaussian component of the mixture model is assigned to each pixel: the RGB value of the target pixel n is substituted into each Gaussian component of the mixture model, and the component with the highest probability, i.e. the lowest data energy, is selected:
k_n = argmin_{k_n} D_n(α_n, k_n, θ, z_n)
wherein D_n is the data energy term corresponding to pixel n; α_n is the opacity index value corresponding to pixel n; θ is the gray histogram of the target or background region of the image; z_n is the gray value corresponding to pixel n;
35) the Gaussian mixture model is further learned and optimized from the given image data z:
U(α, k, θ, z) = Σ_n D_n(α_n, k_n, θ, z_n)
wherein U is the sum of the data energy terms corresponding to all pixels; α is the array of opacity index values; k is the array of Gaussian mixture model component assignments; z is the array of gray values; θ is the gray histogram of the target or background region of the image;
36) using the Gibbs data energy terms D_n analyzed in step 34) and the corresponding energy weights, the segmentation is then estimated by a minimum-cut/maximum-flow algorithm:
min_α min_k E(α, k, θ, z)
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation algorithm; α is the array of opacity index values; k is the array of Gaussian mixture model component assignments; z is the array of gray values; θ is the gray histogram of the target or background region of the image;
37) steps 34)-36) are repeated to continuously optimize the Gaussian mixture model, ensuring that the iterative process converges to a minimum, thereby obtaining the segmentation result;
38) the segmentation result is smoothed with a border matting mechanism.
The specific method of the fourth step is as follows:
41) feature detection is performed on the left and right images through the first and last layers of the shared feature extraction module, yielding multi-scale matching cost values; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolution layer with stride 1, and are used to compute the reconstruction error; the features of the first layer are compressed using a 1 × 1 convolution layer with stride 1, and are used to compute the correlation in the disparity optimization network, i.e. DRS-net; the features generated by the shared feature extraction module can be applied in both the disparity estimation network, i.e. DES-net, and the disparity optimization network, i.e. DRS-net;
42) the input to the disparity estimation network, i.e. DES-net, comprises two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, whose output is the matching cost value of the left and right images and stores the cost of all possible disparities at image coordinates (x, y); the second part is the feature map of the left image, which provides the necessary semantic information for disparity estimation; the disparity estimation network, i.e. DES-net, is used to directly regress the initial disparity;
43) the disparity optimization network, i.e. DRS-net, uses the shared features and the initial disparity to compute a reconstruction error r_e, which reflects the correctness of the estimated disparity; the reconstruction error is computed as:
r_e(i, j) = | I_L(i, j) − I_R(i − d̂_{i,j}, j) |
wherein I_L is the left image; I_R is the right image; d̂_{i,j} is the estimated disparity at position (i, j); i is the abscissa of the pixel at the selected position; j is the ordinate of the pixel at the selected position; the concatenation of the reconstruction error, the initial disparity and the left features is fed into a third encoder-decoder structure to compute a residual with respect to the initial disparity; the sum of the initial disparity and the residual produces the refined disparity.
The beneficial effects of the invention are as follows:
1) The method is robust to illumination changes, and the obtained point cloud model is complete: a three-step image preprocessing method is disclosed, which uses weighted median filtering with bilateral filtering as the weight, contrast-limited adaptive histogram equalization, and Laplacian image sharpening, preserving edge and feature information while ensuring the denoising effect;
2) The invention reconstructs quickly and with high precision; it treats only the region of interest as the reconstruction object and removes the complex background, which saves computational resources and reduces the probability of mismatching caused by similar pixels in the background region;
3) The invention matches accurately and produces a smooth disparity map. The improved convolutional neural network (CNN) consists of a shared feature extraction network, a disparity estimation network (DES-net) and a disparity optimization network (DRS-net), and overcomes the shortcomings of conventional neural network methods, which compute the matching cost at only a single scale and lack a disparity refinement stage.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of a second step of the present invention;
FIG. 3 is a block diagram of a convolutional neural network of the present invention;
FIG. 4 is a flow chart of multi-scale feature extraction.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a surface point cloud reconstruction method based on binocular stereoscopic vision includes the following steps:
step one, performing stereo rectification on the images shot by a binocular camera so that corresponding rows of the left and right images are aligned along the same epipolar lines;
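As a minimal illustration of what rectification establishes (not taken from the patent), the sketch below projects one 3D point through two ideal rectified pinhole cameras: the point lands on the same image row in both views, so correspondence search becomes one-dimensional along the row, and the column difference equals the disparity d = f·B/Z. The focal length, principal point and baseline are hypothetical values.

```python
import numpy as np

def project(point, f, cx, cy, cam_x):
    """Project a 3D point (X, Y, Z) with an ideal pinhole camera whose
    optical center sits at (cam_x, 0, 0) and looks down the +Z axis."""
    X, Y, Z = point
    x = f * (X - cam_x) / Z + cx
    y = f * Y / Z + cy
    return x, y

f, cx, cy = 700.0, 320.0, 240.0     # assumed intrinsics (hypothetical)
B = 0.12                            # assumed baseline in meters
P = np.array([0.3, -0.1, 2.5])      # a 3D point in the left-camera frame

xl, yl = project(P, f, cx, cy, cam_x=0.0)   # left view
xr, yr = project(P, f, cx, cy, cam_x=B)     # right view

assert abs(yl - yr) < 1e-6                  # same row: the epipolar constraint
d = xl - xr                                 # disparity
assert abs(d - f * B / P[2]) < 1e-6         # d = f * B / Z
```

Because both projections share the same row, the stereo matcher of step four only needs to search horizontally.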
step two, preprocessing the rectified image, wherein the preprocessing comprises weighted median filtering with bilateral filtering as the weight, contrast-limited adaptive histogram equalization, and Laplacian image sharpening; the method is as follows:
with reference to figure 2 of the drawings,
21) weighted median filtering with bilateral filtering as the weight;
performing weighted median filtering with bilateral filtering as the weight on the rectified image; the bilateral filter weight between a center pixel p = (i, j) and a neighboring pixel q = (i', j') is expressed as:
w(p, q) = (1/k_p) · exp(−((i − i')² + (j − j')²) / (2σ_s²)) · exp(−|I(p) − I(q)|² / (2σ_r²))
wherein the first exponential term adjusts the spatial proximity; the second exponential term is the color similarity; k_p is a regularization factor; i and j are the abscissa and ordinate of the center pixel; i' and j' are the abscissa and ordinate of the neighboring pixel; I(·) denotes the pixel intensity; σ_s and σ_r are the spatial and range standard deviations;
selecting a window R_p of size (2r + 1) × (2r + 1), wherein r is the window radius and n is the number of pixels contained in the window; the pixel values in the window R_p and their weights form a sequence of pairs {(I_t, w_t)}, which is sorted in ascending order of pixel value; the weights are then accumulated in order until the cumulative weight exceeds half of the total weight, at which point the corresponding pixel value I* becomes the new pixel value of the center point of the local window, as shown in the following equation:
I* = I_l, l = min{ l : Σ_{t=1}^{l} w_t ≥ (1/2) Σ_{t=1}^{n} w_t }
wherein I* is the filtered pixel value; I_t and w_t are the t-th pixel value and filtering weight after sorting; n is the total number of pixels within the window; l is the number of pixels accumulated so far;
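The weighted median of step 21) can be sketched as follows. This is a naive reference implementation, not the patent's optimized code; the values of σ_s and σ_r are assumed tuning parameters. The window's pixel values are sorted, their bilateral weights accumulated, and the first value whose cumulative weight reaches half the total becomes the output:

```python
import numpy as np

def bilateral_weighted_median(img, r=1, sigma_s=1.0, sigma_r=25.0):
    """Weighted median filter whose weights are bilateral-filter weights;
    border pixels are copied unchanged for brevity."""
    img = img.astype(np.float64)
    h, w = img.shape
    out = img.copy()
    for y in range(r, h - r):
        for x in range(r, w - r):
            vals, wts = [], []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    spatial = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                    rng = np.exp(-(img[y + dy, x + dx] - img[y, x]) ** 2
                                 / (2 * sigma_r ** 2))
                    vals.append(img[y + dy, x + dx])
                    wts.append(spatial * rng)
            order = np.argsort(vals)
            vals, wts = np.array(vals)[order], np.array(wts)[order]
            cum = np.cumsum(wts)
            # first value whose cumulative weight reaches half the total
            out[y, x] = vals[np.searchsorted(cum, cum[-1] / 2.0)]
    return out

# a step edge: the range weight suppresses cross-edge pixels,
# so the edge survives filtering unchanged
edge = np.zeros((5, 6))
edge[:, :3] = 50.0
edge[:, 3:] = 200.0
filtered = bilateral_weighted_median(edge)
assert np.array_equal(filtered, edge)
```

The edge-preservation check above is exactly the property the patent targets: denoising without blurring feature boundaries.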
22) contrast-limited adaptive histogram equalization;
performing contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size, and the histogram of each sub-region is computed separately; the number of gray levels of the histogram is denoted as K and the gray level as r, so that the histogram function corresponding to the sub-region (m, n) is:
H_{m,n}(r), 0 ≤ r ≤ K − 1;
where r is the gray level of each sub-region; K is the number of gray levels of the histogram;
the clipping limit value β is determined as:
β = (M · N / K) · (1 + α / 100)
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of gray levels of the histogram; α is a clipping coefficient representing the maximum percentage of pixels allowed in each gray level;
histogram equalization is performed on all the divided sub-regions, each pixel is processed by bilinear interpolation, and the processed gray value is computed;
the clipping limiting value beta is set to clip the pixels beyond the limiting part, so that the purpose of limiting the contrast is achieved.
23) Laplacian image sharpening;
performing Laplacian enhancement on the histogram-equalized image: the selected pixel and the 8 points in its neighborhood are multiplied element-wise with a mask and summed, and the resulting new pixel value replaces the value of the center point of the original 3 × 3 grid; for a point (i, j), the image processed by the Laplacian operator is:
L(i, j) = Σ_{m=−1}^{1} Σ_{n=−1}^{1} k(m, n) · p(i + m, j + n)
wherein k(m, n) is a 3 × 3 Laplacian mask; p(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m and n are the row and column offsets within the 3 × 3 neighborhood; i is the abscissa of the selected point; j is the ordinate of the selected point;
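A sketch of the multiply-and-sum of step 23). The 8-neighbor mask shown is one common Laplacian choice and is an assumption here, since the patent does not list its mask entries:

```python
import numpy as np

def laplacian_filter(p, mask):
    """Apply a 3x3 mask by multiply-and-sum over each pixel's 8-neighborhood;
    border pixels are left at zero for brevity."""
    h, w = p.shape
    out = np.zeros_like(p, dtype=np.float64)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            acc = 0.0
            for m in (-1, 0, 1):
                for n in (-1, 0, 1):
                    acc += mask[m + 1, n + 1] * p[i + m, j + n]
            out[i, j] = acc
    return out

# assumed 8-neighbor Laplacian mask (one common choice)
mask = np.array([[1.0, 1.0, 1.0],
                 [1.0, -8.0, 1.0],
                 [1.0, 1.0, 1.0]])

flat = np.full((5, 5), 7.0)
assert np.all(laplacian_filter(flat, mask) == 0.0)  # no edges -> no response

spike = np.zeros((5, 5))
spike[2, 2] = 1.0
lap = laplacian_filter(spike, mask)
assert lap[2, 2] == -8.0        # strong response at an isolated bright point

sharpened = flat - laplacian_filter(flat, mask)     # classic sharpening step
assert np.all(sharpened == flat)                    # flat regions unchanged
```

Subtracting (or adding, depending on the mask's sign) the Laplacian response back onto the image is what boosts the edges while leaving flat regions untouched.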
step three, removing the complex background of the region of interest through a minimum-cut/maximum-flow image segmentation algorithm;
31) frame-selecting the region of interest through user interaction; the pixels within the frame are defined as target pixels T_U, and the other pixels are defined as background pixels T_B;
The region of interest is framed by the user.
32) for each background pixel n in T_B, its label is initialized as α_n = 0; for each pixel n among the target pixels T_U, its label is initialized as α_n = 1;
33) after the preliminary division of target and background pixels in steps 31) and 32), a Gaussian mixture model is built for the target pixels and for the background pixels; the target pixels are clustered into K classes by the K-means algorithm, which guarantees that each Gaussian component of the mixture model has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each Gaussian component is determined by the ratio of the number of pixels of that component to the total number of pixels; the initialization process ends here;
34) a Gaussian component of the mixture model is assigned to each pixel: the RGB value of the target pixel n is substituted into each Gaussian component of the mixture model, and the component with the highest probability, i.e. the lowest data energy, is selected:
k_n = argmin_{k_n} D_n(α_n, k_n, θ, z_n)
wherein D_n is the data energy term corresponding to pixel n; α_n is the opacity index value corresponding to pixel n; θ is the gray histogram of the target or background region of the image; z_n is the gray value corresponding to pixel n;
35) the Gaussian mixture model is further learned and optimized from the given image data z:
U(α, k, θ, z) = Σ_n D_n(α_n, k_n, θ, z_n)
wherein U is the sum of the data energy terms corresponding to all pixels; α is the array of opacity index values; k is the array of Gaussian mixture model component assignments; z is the array of gray values; θ is the gray histogram of the target or background region of the image;
36) using the Gibbs data energy terms D_n analyzed in step 34) and the corresponding energy weights, the segmentation is then estimated by a minimum-cut/maximum-flow algorithm:
min_α min_k E(α, k, θ, z)
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation algorithm; α is the array of opacity index values; k is the array of Gaussian mixture model component assignments; z is the array of gray values; θ is the gray histogram of the target or background region of the image;
37) steps 34)-36) are repeated to continuously optimize the Gaussian mixture model, ensuring that the iterative process converges to a minimum, thereby obtaining the segmentation result;
38) the segmentation result is smoothed with a border matting mechanism.
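The component assignment of step 34) can be sketched with scalar gray values in place of RGB. The mixture weights, means and variances below are toy values, and the energy form D_n = −log(π_k · N(z | μ_k, σ_k²)) is the standard GrabCut-style choice assumed here:

```python
import numpy as np

def assign_component(z, weights, means, variances):
    """Pick, for each sample, the Gaussian component k_n minimizing the data
    energy D_n = -log( pi_k * N(z | mu_k, sigma_k^2) )."""
    z = np.asarray(z, dtype=np.float64)
    # energy of each component for each sample: shape (samples, components)
    d = (0.5 * np.log(2 * np.pi * variances)
         + (z[:, None] - means) ** 2 / (2 * variances)
         - np.log(weights))
    return np.argmin(d, axis=1)

weights = np.array([0.5, 0.5])       # toy two-component mixture
means = np.array([30.0, 200.0])      # dark and bright clusters
variances = np.array([100.0, 100.0])

k = assign_component([25, 40, 190, 210], weights, means, variances)
assert list(k) == [0, 0, 1, 1]       # each pixel joins the nearer cluster
```

Iterating this assignment with the parameter re-estimation of step 35) and the min-cut of step 36) is what drives the energy downward.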
Step four, referring to fig. 3, recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map; the network comprises a shared feature extraction module, a disparity estimation network (DES-net) and a disparity optimization network (DRS-net). The shared feature extraction network uses a connected network of shallow encoder-decoder structures to extract common multi-scale features from the left and right images. Part of these features is used to compute the matching cost values (i.e., correlations) for the disparity estimation network (DES-net) and the disparity optimization network (DRS-net). The features of the first layer are further compressed by a 1 × 1 convolution to produce c_conv1a and c_conv1b. These shared features are also used to compute the reconstruction error of the disparity optimization network (DRS-net);
41) feature detection is performed on the left and right images through the first and last layers of the shared feature extraction module, yielding multi-scale matching cost values; referring to fig. 4, the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolution layer with stride 1; features with a relatively large receptive field and different levels of abstraction, obtained by the last deconvolution layer and the first convolution layer, are used to compute the reconstruction error, wherein "Conv2a" denotes the second convolution layer of the shared feature extraction module. The features of the first layer are compressed using a 1 × 1 convolution layer with stride 1, and are used to compute the correlation in the disparity optimization network (DRS-net). The features generated by the shared feature extraction module can be applied in both the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
42) the input to the disparity estimation network (DES-net) comprises two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, whose output is the matching cost value of the left and right images and stores the cost of all possible disparities at image coordinates (x, y); the second part is the feature map of the left image, which provides the necessary semantic information for disparity estimation; the disparity estimation network (DES-net) is used to directly regress the initial disparity;
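The dot-product matching cost of step 42) can be sketched as a correlation cost volume over candidate disparities. The one-hot toy features below stand in for the network's learned features and are purely illustrative:

```python
import numpy as np

def correlation_cost_volume(fl, fr, max_disp):
    """Matching cost by dot product of left/right feature maps of shape
    (H, W, C) over candidate disparities; out-of-range positions keep
    zero cost."""
    h, w, _ = fl.shape
    cost = np.zeros((h, w, max_disp + 1))
    for d in range(max_disp + 1):
        # correlate left pixel (y, x) with right pixel (y, x - d)
        cost[:, d:, d] = np.sum(fl[:, d:, :] * fr[:, :w - d, :], axis=2)
    return cost

# toy features: each column carries a distinct one-hot descriptor
fr = np.zeros((4, 10, 8))
for x in range(10):
    fr[:, x, x % 8] = 1.0
fl = np.roll(fr, shift=3, axis=1)    # left = right shifted by disparity 3

cost = correlation_cost_volume(fl, fr, max_disp=5)
# for a valid pixel, the cost peaks at the true disparity d = 3
assert int(np.argmax(cost[2, 6])) == 3
```

In the network this volume is concatenated with the left feature map and regressed to the initial disparity.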
43) the disparity optimization network (DRS-net) uses the shared features and the initial disparity to compute a reconstruction error r_e, which reflects the correctness of the estimated disparity; the reconstruction error is computed as:
r_e(i, j) = | I_L(i, j) − I_R(i − d̂_{i,j}, j) |
wherein I_L is the left image; I_R is the right image; d̂_{i,j} is the estimated disparity at position (i, j); i is the abscissa of the pixel at the selected position; j is the ordinate of the pixel at the selected position; the concatenation of the reconstruction error, the initial disparity and the left features is fed into a third encoder-decoder structure to compute a residual with respect to the initial disparity; the sum of the initial disparity and the residual produces the refined disparity.
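The reconstruction error of step 43) can be sketched directly from the formula; restricting to integer disparities and assigning zero error to out-of-range columns are simplifying assumptions of this sketch:

```python
import numpy as np

def reconstruction_error(left, right, disp):
    """r_e(i, j) = |I_L(i, j) - I_R(i - d, j)| with integer disparities;
    columns whose warped position falls outside the right image get 0."""
    h, w = left.shape
    err = np.zeros((h, w))
    for j in range(h):          # j: row (ordinate)
        for i in range(w):      # i: column (abscissa)
            d = int(disp[j, i])
            if 0 <= i - d < w:
                err[j, i] = abs(left[j, i] - right[j, i - d])
    return err

right = np.tile(np.arange(8.0), (3, 1))   # toy right image, ramp rows
left = np.roll(right, shift=2, axis=1)    # left shifted by disparity 2
disp = np.full((3, 8), 2)

err = reconstruction_error(left, right, disp)
assert np.all(err == 0.0)   # a correct disparity reconstructs the left image
```

A nonzero r_e flags pixels whose initial disparity is wrong, which is exactly the signal the residual branch of DRS-net learns to correct.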
And step five, reconstructing the surface point cloud from the disparity map obtained in step four.
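Step five can be sketched with the standard rectified-stereo back-projection Z = f·B/d, X = (x − cx)·Z/f, Y = (y − cy)·Z/f; the patent does not spell out these formulas, so they are assumed here, and the intrinsics are hypothetical:

```python
import numpy as np

def disparity_to_points(disp, f, b, cx, cy):
    """Back-project a disparity map of shape (H, W) to 3D points under the
    rectified-stereo model; invalid (non-positive) disparities map to Z = 0."""
    h, w = disp.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    z = np.where(disp > 0, f * b / np.maximum(disp, 1e-9), 0.0)
    x = (xs - cx) * z / f
    y = (ys - cy) * z / f
    return np.stack([x, y, z], axis=-1)   # shape (H, W, 3)

f, b, cx, cy = 700.0, 0.12, 4.0, 3.0      # hypothetical intrinsics/baseline
disp = np.full((6, 8), 33.6)              # constant toy disparity
pts = disparity_to_points(disp, f, b, cx, cy)

# Z = f*B/d = 700 * 0.12 / 33.6 = 2.5 everywhere
assert np.allclose(pts[..., 2], 2.5)
# the pixel at the principal point lies on the optical axis (X = Y = 0)
assert np.allclose(pts[3, 4, :2], 0.0)
```

Flattening the (H, W, 3) array to (H·W, 3) and dropping Z = 0 entries yields the surface point cloud.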
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the scope of the present invention is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present invention, any person skilled in the art may apply equivalent substitutions or alterations to the technical solution and the inventive concept, and such simple modifications all fall within the scope of the present invention.
In addition, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described further.
Moreover, any combination of the various embodiments of the invention can be made without departing from the spirit of the invention, which should also be considered as disclosed herein.
Claims (3)
1. The surface point cloud reconstruction method based on binocular stereoscopic vision is characterized by comprising the following steps of:
step one, performing stereo rectification on the images captured by the binocular camera, so that corresponding points in the left and right images lie on the same epipolar line;
step two, preprocessing the rectified images, the preprocessing comprising weighted median filtering with bilateral filter weights, adaptive histogram equalization, and Laplacian image sharpening;
step three, removing the complex background from the region of interest through a minimum cut/maximum flow image segmentation algorithm;
step four, recovering depth information through a convolutional-neural-network stereo matching algorithm to obtain a disparity map;
step five, reconstructing the surface point cloud according to the disparity map obtained in step four;
the specific method of the second step is as follows:
21 Weighted median filtering with bilateral filtering as weight;
performing weighted median filtering taking bilateral filtering as weight on the corrected image; the bilateral filter weights are expressed as:
w_(i,j),(i_i,j_j) = (1/k_i) · exp(−((i − i_i)^2 + (j − j_j)^2)/σ_s^2) · exp(−|I(i, j) − I(i_i, j_j)|^2/σ_c^2);
wherein σ_s adjusts the spatial scale; σ_c adjusts the color similarity; k_i is a regularization factor; (i − i_i)^2 and (j − j_j)^2 measure the spatial similarity between the center pixel and the neighboring pixel; i is the abscissa of the center pixel; j is the ordinate of the center pixel; i_i is the abscissa of the neighboring pixel; j_j is the ordinate of the neighboring pixel;
selecting a window R_i of size (2r+1) × (2r+1), wherein r is the radius of the window and s is the number of pixels contained in the window; the pixel values and weights in window R_i form a sequence of pairs {I(i), w_(i,j)}; the pixel values are sorted in ascending order and their weights are accumulated in that order until the cumulative weight exceeds half of the total weight, at which point the corresponding value becomes the new pixel value of the center point of the local window, as the following formula shows:
i* = min{ k : Σ_{i=1}^{k} w_(i,j) ≥ (1/2) Σ_{i=1}^{s} w_(i,j) }, and I(i*) replaces the pixel value of the window center point;
wherein i* is the index of the filtered value in the sorted sequence; I(i*) is the filtered value that replaces the pixel value of the window center point; w_(i,j) is the filtering weight; s is the total number of pixels in the window; i is the number of pixels accumulated so far;
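The bilateral-weighted median of step 21) can be sketched in NumPy as below. The window radius, the Gaussian parameters sigma_s and sigma_c, and the edge-padding policy are illustrative assumptions rather than values fixed by the patent.

```python
import numpy as np

def bilateral_weighted_median(img, r=1, sigma_s=2.0, sigma_c=25.0):
    """Weighted median filter whose weights are bilateral: within each
    (2r+1) x (2r+1) window, pixel values are sorted and the value at
    which the cumulative weight first reaches half of the total weight
    replaces the window center."""
    h, w = img.shape
    out = np.empty((h, w), dtype=float)
    pad = np.pad(img.astype(float), r, mode='edge')
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    spatial = np.exp(-(dy ** 2 + dx ** 2) / sigma_s ** 2)   # spatial similarity term
    for y in range(h):
        for x in range(w):
            win = pad[y:y + 2 * r + 1, x:x + 2 * r + 1]
            color = np.exp(-(win - img[y, x]) ** 2 / sigma_c ** 2)  # color similarity term
            wgt = (spatial * color).ravel()
            vals = win.ravel()
            order = np.argsort(vals)                # sort pixel values ascending
            csum = np.cumsum(wgt[order])            # accumulate weights in that order
            k = np.searchsorted(csum, 0.5 * csum[-1])  # first index past half the total weight
            out[y, x] = vals[order][k]
    return out

flat = bilateral_weighted_median(np.full((5, 5), 3.0))  # a constant image is unchanged
```

The double loop keeps the sketch readable; a production implementation would vectorize or use a joint-histogram formulation.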
22) contrast-limited adaptive histogram equalization;
performing contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size, and the histogram of each sub-region is calculated separately; denoting the number of possible gray levels of the histogram as K and the gray level of each sub-region as r, the histogram function corresponding to region (m, n) is:
H_(m,n)(r), 0 ≤ r ≤ K − 1;
where r is the gray level of each sub-region; k is the number of gray levels of the histogram;
determining the clipping limit β:
β = (M × N / K) × (1 + α/100);
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of gray levels of the histogram; α is the clipping coefficient, representing the maximum percentage of pixels allowed in each gray level;
performing histogram equalization on all the divided subareas, processing each pixel by using a bilinear interpolation method, and calculating the gray value after processing;
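Step 22) can be sketched for a single sub-region as follows. The clip-limit form β = (M·N/K)(1 + α/100) and the even redistribution of the clipped excess are common choices assumed here; the patent's exact clipping formula is not reproduced verbatim, so treat both as illustrative.

```python
import numpy as np

def clipped_equalize_tile(tile, K=256, alpha=40.0):
    """Contrast-limited histogram equalization of one sub-region:
    clip the histogram at beta, redistribute the clipped excess evenly
    over all K gray levels, then equalize through the cumulative
    distribution."""
    M, N = tile.shape
    beta = (M * N / K) * (1.0 + alpha / 100.0)     # clip limit
    hist, _ = np.histogram(tile, bins=K, range=(0, K))
    excess = np.clip(hist - beta, 0, None).sum()   # mass removed by clipping
    hist = np.minimum(hist, beta) + excess / K     # even redistribution
    cdf = np.cumsum(hist) / hist.sum()
    lut = np.round((K - 1) * cdf).astype(np.uint8) # equalization lookup table
    return lut[tile]

uniform = clipped_equalize_tile(np.zeros((8, 8), dtype=np.uint8))
ramp = clipped_equalize_tile(np.arange(64, dtype=np.uint8).reshape(8, 8))
```

In the full method each tile's lookup table would then be blended across tiles with the bilinear interpolation mentioned in the claim.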
23) Laplacian image sharpening;
carrying out Laplacian enhancement on the histogram-equalized image: the selected pixel point and the 8 points in its neighborhood are multiplied by the corresponding mask coefficients and summed, and the resulting new pixel value replaces the pixel value of the center point of the original nine-grid; for a point (u, v), the Laplacian-operator processing of the image is:
L(u, v) = Σ_{mz=−1}^{1} Σ_{nz=−1}^{1} k(mz, nz) · P(u + mz, v + nz);
wherein k(mz, nz) is the 3 × 3 Laplacian mask; P(u, v) is the gray value of the original image; L(u, v) is the image processed by the Laplacian operator; mz is the horizontal offset from the nine-grid center pixel; nz is the vertical offset from the nine-grid center pixel; u is the abscissa of the selected point; v is the ordinate of the selected point.
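Step 23) can be sketched as a direct 3 × 3 mask convolution. The specific 8-neighbor sharpening mask (center coefficient 9 = 8 + 1, i.e. Laplacian response plus the original pixel) is one standard choice and is an assumption here; the patent does not spell out its mask values.

```python
import numpy as np

def laplacian_sharpen(img):
    """Replace every pixel by the sum of its 3x3 neighborhood
    multiplied element-wise by a Laplacian sharpening mask."""
    k = np.array([[-1, -1, -1],
                  [-1,  9, -1],     # 9 = 8 + 1: Laplacian plus original
                  [-1, -1, -1]], dtype=float)
    pad = np.pad(img.astype(float), 1, mode='edge')
    out = np.empty_like(img, dtype=float)
    for u in range(img.shape[0]):
        for v in range(img.shape[1]):
            out[u, v] = np.sum(pad[u:u + 3, v:v + 3] * k)
    return np.clip(out, 0, 255)   # keep gray values in range

flat_out = laplacian_sharpen(np.full((4, 4), 7.0))  # flat regions are unchanged
```

Because the mask coefficients sum to 1, flat regions pass through unchanged while intensity steps are amplified.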
2. The surface point cloud reconstruction method based on binocular stereoscopic vision according to claim 1, wherein the specific method of the third step is as follows:
31) frame-selecting the region of interest through user interaction, defining the pixels within the frame as possible target pixels T_U and the other pixels as background pixels T_B;
32) for T_B, initializing each background pixel n with the label α_n = 0; for T_U, initializing each target pixel nt with the label α_nt = 1;
33) after the target pixels and background pixels have been preliminarily classified by steps 31) and 32), a Gaussian mixture model is built for the target pixels and the background pixels; the target pixels are clustered into K classes by the K-means algorithm, ensuring that each Gaussian component of the Gaussian mixture model has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each Gaussian component is determined by the ratio of the number of its pixels to the total number of pixels; the initialization process ends here;
34) assigning a Gaussian component in the Gaussian mixture model to each pixel: the RGB value of the target pixel nt is substituted into each Gaussian component of the Gaussian mixture model, and the component with the highest probability is determined as k_nt:
k_nt = arg min_{k_nt} D_nt(α_nt, k_nt, θ, z_nt);
wherein D_nt is the data energy term corresponding to pixel nt; α_nt is the opacity index value corresponding to pixel nt; θ is the gray histogram of the target or background region of the image; z_nt is the gray value corresponding to pixel nt;
35) further learning and optimizing the Gaussian mixture model from the given image data z:
θ = arg min_θ U(α, k, θ, z);
wherein U is the sum of the data energy terms corresponding to all pixels; α is the opacity index value; k is a Gaussian mixture model parameter; z is the gray value array; θ is the gray histogram of the target or background region of the image;
36) from the Gibbs energy data term D_n analyzed in step 34), the Gibbs energy weight 1/k_n is obtained, and the segmentation is then estimated by the minimum cut/maximum flow algorithm:
min_{α_n : n ∈ T_U} min_{k} E(α, k, θ, z);
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation algorithm; α is the opacity index value; k is a Gaussian mixture model parameter; z is the gray value array; θ is the gray histogram of the target or background region of the image;
37 Repeating the steps 34) -36), continuously optimizing the Gaussian mixture model, and ensuring that the iterative process can be converged to a minimum value so as to obtain a segmentation result;
38) smoothing the segmentation boundary by a border matting mechanism.
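The component-assignment step 34), choosing for each pixel the Gaussian mixture component that minimizes the data energy −log p, can be sketched as below. The array shapes and the explicit energy expression are illustrative assumptions; the claim states only that the most probable component is selected.

```python
import numpy as np

def assign_components(pixels, means, covs, weights):
    """For each RGB pixel, pick the Gaussian mixture component with the
    highest likelihood, i.e. the lowest energy -log(w_k * N(z; mu_k, S_k))."""
    n_comp = len(weights)
    energy = np.empty((pixels.shape[0], n_comp))
    for k in range(n_comp):
        diff = pixels - means[k]
        inv = np.linalg.inv(covs[k])
        maha = np.einsum('ij,jk,ik->i', diff, inv, diff)  # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(covs[k])
        energy[:, k] = -np.log(weights[k]) + 0.5 * logdet + 0.5 * maha
    return np.argmin(energy, axis=1)

# Two well-separated components: dark pixels go to 0, bright pixels to 1
means = np.array([[0.0, 0.0, 0.0], [100.0, 100.0, 100.0]])
covs = np.stack([np.eye(3), np.eye(3)])
weights = np.array([0.5, 0.5])
labels = assign_components(
    np.array([[1.0, 0.0, 0.0], [99.0, 100.0, 101.0]]), means, covs, weights)
```

Steps 35) and 36) would then re-estimate the mixture parameters from these assignments and run a min-cut on the resulting Gibbs energy.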
3. The surface point cloud reconstruction method based on binocular stereoscopic vision according to claim 1, wherein the specific method of the fourth step is as follows:
41) the left and right images are passed through the shared feature extraction module, from its first layer to its last layer, for feature detection, thereby obtaining multi-scale matching cost values; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolution layer with stride 1, and are used for calculating the reconstruction error; the features of the first layer are compressed using a 1 × 1 convolution layer with stride 1 and are used to calculate the correlation in the disparity optimization network (DRS-net); the features generated by the shared feature extraction module are applied to both the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
42) the input of the disparity estimation network (DES-net) comprises two parts: the first part is the dot product of the left and right features from the last layer of the shared feature extraction module; its output is the matching cost volume of the left and right images, which stores the cost of every candidate disparity at each image coordinate (x, y); the second part is the feature map of the left image, which provides the semantic information needed for disparity estimation; the disparity estimation network (DES-net) is used to directly regress the initial disparity;
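The first DES-net input of step 42), a cost volume formed by dot products of left and right feature maps over candidate disparities, can be sketched as follows. The (C, H, W) layout and the zero cost assigned to out-of-range positions are assumptions for illustration.

```python
import numpy as np

def dot_product_cost_volume(feat_l, feat_r, max_disp):
    """For every pixel (x, y) and candidate disparity d, the matching
    cost is the dot product of the left feature at column x with the
    right feature at column x - d; out-of-range positions keep cost 0."""
    C, H, W = feat_l.shape
    cost = np.zeros((max_disp, H, W))
    for d in range(max_disp):
        if d == 0:
            cost[d] = np.sum(feat_l * feat_r, axis=0)   # per-pixel channel dot product
        else:
            cost[d, :, d:] = np.sum(feat_l[:, :, d:] * feat_r[:, :, :-d], axis=0)
    return cost

cost = dot_product_cost_volume(np.ones((2, 3, 4)), np.ones((2, 3, 4)), max_disp=2)
```

In the network this volume, concatenated with the left feature map, is what DES-net regresses the initial disparity from.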
43) the disparity optimization network (DRS-net) uses the shared features and the initial disparity to calculate a reconstruction error re, which reflects the correctness of the estimated disparity; the reconstruction error is calculated as:
re(i, j) = |I_L(i, j) − I_R(i + d_ij, j)|;
wherein I_L is the left image; I_R is the right image; d_ij is the estimated disparity at position (i, j); i is the abscissa of the selected pixel; j is the ordinate of the selected pixel; the concatenation of the reconstruction error, the initial disparity, and the left feature is fed to a third encoder-decoder structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate the refined disparity.
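The reconstruction error above can be computed directly from the two images and the disparity map. Following the claim, i is treated as the horizontal coordinate; clipping source columns that fall outside the image is a simplifying assumption, not something the claim specifies.

```python
import numpy as np

def reconstruction_error(img_l, img_r, disp):
    """re(i, j) = |I_L(i, j) - I_R(i + d_ij, j)|, with i the horizontal
    coordinate; source columns falling outside the image are clipped."""
    H, W = img_l.shape
    rows, cols = np.mgrid[0:H, 0:W]
    src = np.clip(cols + np.rint(disp).astype(int), 0, W - 1)
    return np.abs(img_l - img_r[rows, src])

left = np.arange(12, dtype=float).reshape(3, 4)
err = reconstruction_error(left, left, np.zeros_like(left))  # identical images, zero disparity
```

A correctly estimated disparity drives re toward zero, which is why DRS-net can use it as a per-pixel confidence signal.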
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110821716.8A CN113421210B (en) | 2021-07-21 | 2021-07-21 | Surface point cloud reconstruction method based on binocular stereoscopic vision
Publications (2)
Publication Number | Publication Date |
---|---|
CN113421210A CN113421210A (en) | 2021-09-21 |
CN113421210B true CN113421210B (en) | 2024-04-12 |
Family
ID=77721554
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20080052363A (en) * | 2006-12-05 | 2008-06-11 | 한국전자통신연구원 | Apparatus and method of matching binocular/multi-view stereo using foreground/background separation and image segmentation |
CN104867135A (en) * | 2015-05-04 | 2015-08-26 | 中国科学院上海微系统与信息技术研究所 | High-precision stereo matching method based on guiding image guidance |
CN104978722A (en) * | 2015-07-06 | 2015-10-14 | 天津大学 | Multi-exposure image fusion ghosting removing method based on background modeling |
CN112288689A (en) * | 2020-10-09 | 2021-01-29 | 浙江未来技术研究院(嘉兴) | Three-dimensional reconstruction method and system for operation area in microscopic operation imaging process |
Non-Patent Citations (1)
Title |
---|
"Research on Key Technologies of Three-Dimensional Face Reconstruction Based on Binocular Stereoscopic Vision"; Qi Leyang; Excellent Master's Theses; full text *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| TA01 | Transfer of patent application right | Effective date of registration: 20231017; Address after: No. 2055, Yan'an Street, Changchun City, Jilin Province; Applicant after: Changchun University of Technology; Address before: Room 222, Building 1, No. 1, Kehui Road, Dongguan City, Guangdong Province, 523000; Applicant before: Dongguan Zhongke Sanwei fish Intelligent Technology Co.,Ltd.; Changchun University of Technology |
| GR01 | Patent grant | |