CN113421210B - Surface point cloud reconstruction method based on binocular stereoscopic vision - Google Patents
Surface point cloud reconstruction method based on binocular stereoscopic vision
- Publication number
- CN113421210B (application CN202110821716.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- pixel
- pixels
- value
- disparity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/40—Image enhancement or restoration by the use of histogram techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G06T5/73—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
- G06T2207/20028—Bilateral filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
- G06T2207/20032—Median filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereoscopic vision, comprising the following steps: step one, performing stereo rectification on the images shot by a binocular camera so that corresponding rows of the left and right images are aligned along the same epipolar lines; step two, preprocessing the rectified images; step three, removing the complex background of the region of interest through a minimum-cut/maximum-flow image segmentation algorithm; step four, recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map; and step five, reconstructing the surface point cloud from the disparity map obtained in step four. Through the stages of stereo rectification, image preprocessing, region-of-interest background removal, stereo matching and point cloud reconstruction, the method solves the problems of low reconstruction precision, low speed and poor transferability.
Description
Technical Field
The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereoscopic vision.
Background
In recent years, with the steady improvement of automation in manufacturing and the continuous technological upgrading of enterprises, machine vision technology has been increasingly applied in industrial production. Binocular stereo vision, as a passive, non-contact measurement means, is favored by the market for its fast measurement speed and reasonable price.
The surface point cloud reconstruction technology based on binocular stereoscopic vision can be applied to the fields of part identification and positioning, unmanned aerial vehicle autonomous navigation, satellite remote sensing mapping, 3D model reconstruction and the like, is a research hotspot and difficulty in the artificial intelligence direction at the present stage, and has quite wide application prospects.
A review of prior research shows that existing surface point cloud reconstruction methods based on binocular stereoscopic vision have gradually matured, but the following key problems remain to be solved:
1) During image filtering and enhancement, existing preprocessing methods cannot balance the denoising effect with the retention of image feature details; they easily cause image blurring and edge loss, which in turn produces defects in the point cloud;
2) Existing surface point cloud reconstruction methods recover the point cloud over the whole image without any directivity, which easily wastes resources, reduces computational efficiency and causes mismatching;
3) Most existing neural-network-based stereo matching methods compute the matching cost at a single scale, and either lack a disparity refinement step or rely on traditional disparity optimization methods, which tends to cause discontinuities in the disparity map.
Disclosure of Invention
The invention provides a surface point cloud reconstruction method based on binocular stereo vision, which solves the problems of low reconstruction precision, low speed and poor transferability through the stages of stereo rectification, image preprocessing, region-of-interest background removal, stereo matching and point cloud reconstruction.
The technical scheme of the invention is as follows in combination with the accompanying drawings:
a surface point cloud reconstruction method based on binocular stereoscopic vision comprises the following steps:
step one, performing stereo rectification on the images shot by a binocular camera so that corresponding rows of the left and right images are aligned along the same epipolar lines;
step two, preprocessing the rectified images, wherein the preprocessing comprises weighted median filtering with bilateral filtering as the weight, contrast-limited adaptive histogram equalization, and Laplacian image sharpening;
step three, removing the complex background of the region of interest through a minimum-cut/maximum-flow image segmentation algorithm;
step four, recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map;
and step five, reconstructing the surface point cloud from the disparity map obtained in step four.
The specific method of the second step is as follows:
21) weighted median filtering with bilateral filtering as the weight;
performing weighted median filtering with bilateral filtering as the weight on the rectified image; the bilateral filter weight between a center pixel p = (i, j) and a neighboring pixel q = (i', j') is expressed as:
w(p, q) = (1/k_p) · exp(−((i − i')² + (j − j')²) / (2σ_s²)) · exp(−|I(p) − I(q)|² / (2σ_r²))
wherein the first exponential term adjusts the spatial proximity; the second exponential term is the color similarity; k_p is a regularization factor; i and j are the abscissa and ordinate of the center pixel; i' and j' are the abscissa and ordinate of the neighboring pixel; I(·) denotes the pixel intensity; σ_s and σ_r are the spatial and range standard deviations;
selecting a window R_p of size (2r + 1) × (2r + 1), wherein r is the window radius and n is the number of pixels contained in the window; the pixel values in the window R_p and their weights form a sequence of pairs {(I_t, w_t)}, which is sorted in ascending order of pixel value; the weights are then accumulated in order until the cumulative weight exceeds half of the total weight, at which point the corresponding pixel value I* becomes the new pixel value of the center point of the local window, as shown in the following equation:
I* = I_l, l = min{ l : Σ_{t=1}^{l} w_t ≥ (1/2) Σ_{t=1}^{n} w_t }
wherein I* is the filtered pixel value; I_t and w_t are the t-th pixel value and filtering weight after sorting; n is the total number of pixels within the window; l is the number of pixels accumulated so far;
22) contrast-limited adaptive histogram equalization;
performing contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size, and the histogram of each sub-region is computed separately; the number of gray levels of the histogram is denoted as K and the gray level as r, so that the histogram function corresponding to the sub-region (m, n) is:
H_{m,n}(r), 0 ≤ r ≤ K − 1;
where r is the gray level of each sub-region; K is the number of gray levels of the histogram;
the clipping limit value β is determined as:
β = (M · N / K) · (1 + α / 100)
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of gray levels of the histogram; α is a clipping coefficient representing the maximum percentage of pixels allowed in each gray level;
histogram equalization is performed on all the divided sub-regions, each pixel is processed by bilinear interpolation, and the processed gray value is computed;
23) Laplacian image sharpening;
performing Laplacian enhancement on the histogram-equalized image: the selected pixel and the 8 points in its neighborhood are multiplied element-wise with a mask and summed, and the resulting new pixel value replaces the value of the center point of the original 3 × 3 grid; for a point (i, j), the image processed by the Laplacian operator is:
L(i, j) = Σ_{m=−1}^{1} Σ_{n=−1}^{1} k(m, n) · p(i + m, j + n)
wherein k(m, n) is a 3 × 3 Laplacian mask; p(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m and n are the row and column offsets within the 3 × 3 neighborhood; i is the abscissa of the selected point; j is the ordinate of the selected point.
The specific method of the third step is as follows:
31) frame-selecting the region of interest through user interaction; the pixels within the frame are defined as target pixels T_U, and the other pixels are defined as background pixels T_B;
32) for each background pixel n in T_B, its label is initialized as α_n = 0; for each pixel n among the target pixels T_U, its label is initialized as α_n = 1;
33) after the preliminary division of target and background pixels in steps 31) and 32), a Gaussian mixture model is built for the target pixels and for the background pixels; the target pixels are clustered into K classes by the K-means algorithm, which guarantees that each Gaussian component of the mixture model has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each Gaussian component is determined by the ratio of the number of pixels of that component to the total number of pixels; the initialization process ends here;
34) a Gaussian component of the mixture model is assigned to each pixel: the RGB value of the target pixel n is substituted into each Gaussian component of the mixture model, and the component with the highest probability, i.e. the lowest data energy, is selected:
k_n = argmin_{k_n} D_n(α_n, k_n, θ, z_n)
wherein D_n is the data energy term corresponding to pixel n; α_n is the opacity index value corresponding to pixel n; θ is the gray histogram of the target or background region of the image; z_n is the gray value corresponding to pixel n;
35) the Gaussian mixture model is further learned and optimized from the given image data z:
U(α, k, θ, z) = Σ_n D_n(α_n, k_n, θ, z_n)
wherein U is the sum of the data energy terms corresponding to all pixels; α is the array of opacity index values; k is the array of Gaussian mixture model component assignments; z is the array of gray values; θ is the gray histogram of the target or background region of the image;
36) using the Gibbs data energy terms D_n analyzed in step 34) and the corresponding energy weights, the segmentation is then estimated by a minimum-cut/maximum-flow algorithm:
min_α min_k E(α, k, θ, z)
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation algorithm; α is the array of opacity index values; k is the array of Gaussian mixture model component assignments; z is the array of gray values; θ is the gray histogram of the target or background region of the image;
37) steps 34)-36) are repeated to continuously optimize the Gaussian mixture model, ensuring that the iterative process converges to a minimum, thereby obtaining the segmentation result;
38) the segmentation result is smoothed with a border matting mechanism.
The specific method of the fourth step is as follows:
41) feature detection is performed on the left and right images through the first and last layers of the shared feature extraction module, yielding multi-scale matching cost values; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolution layer with stride 1, and are used to compute the reconstruction error; the features of the first layer are compressed using a 1 × 1 convolution layer with stride 1, and are used to compute the correlation in the disparity optimization network, i.e. DRS-net; the features generated by the shared feature extraction module can be applied in both the disparity estimation network, i.e. DES-net, and the disparity optimization network, i.e. DRS-net;
42) the input to the disparity estimation network, i.e. DES-net, comprises two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, whose output is the matching cost value of the left and right images and stores the cost of all possible disparities at image coordinates (x, y); the second part is the feature map of the left image, which provides the necessary semantic information for disparity estimation; the disparity estimation network, i.e. DES-net, is used to directly regress the initial disparity;
43) the disparity optimization network, i.e. DRS-net, uses the shared features and the initial disparity to compute a reconstruction error r_e, which reflects the correctness of the estimated disparity; the reconstruction error is computed as:
r_e(i, j) = | I_L(i, j) − I_R(i − d̂_{i,j}, j) |
wherein I_L is the left image; I_R is the right image; d̂_{i,j} is the estimated disparity at position (i, j); i is the abscissa of the pixel at the selected position; j is the ordinate of the pixel at the selected position; the concatenation of the reconstruction error, the initial disparity and the left features is fed into a third encoder-decoder structure to compute a residual with respect to the initial disparity; the sum of the initial disparity and the residual produces the refined disparity.
The beneficial effects of the invention are as follows:
1) The method is robust to illumination changes, and the obtained point cloud model is complete: a three-step image preprocessing method is disclosed, which uses weighted median filtering with bilateral filtering as the weight, contrast-limited adaptive histogram equalization, and Laplacian image sharpening, preserving edge and feature information while ensuring the denoising effect;
2) The invention reconstructs quickly and with high precision; it treats only the region of interest as the reconstruction object and removes the complex background, which saves computational resources and reduces the probability of mismatching caused by similar pixels in the background region;
3) The invention matches accurately and produces a smooth disparity map. The improved convolutional neural network (CNN) consists of a shared feature extraction network, a disparity estimation network (DES-net) and a disparity optimization network (DRS-net), and overcomes the shortcomings of conventional neural network methods, which compute the matching cost at only a single scale and lack a disparity refinement stage.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of a second step of the present invention;
FIG. 3 is a block diagram of a convolutional neural network of the present invention;
FIG. 4 is a flow chart of multi-scale feature extraction.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a surface point cloud reconstruction method based on binocular stereoscopic vision includes the following steps:
step one, performing stereo rectification on the images shot by a binocular camera so that corresponding rows of the left and right images are aligned along the same epipolar lines;
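As a minimal illustration of what rectification establishes (not taken from the patent), the sketch below projects one 3D point through two ideal rectified pinhole cameras: the point lands on the same image row in both views, so correspondence search becomes one-dimensional along the row, and the column difference equals the disparity d = f·B/Z. The focal length, principal point and baseline are hypothetical values.

```python
import numpy as np

def project(point, f, cx, cy, cam_x):
    """Project a 3D point (X, Y, Z) with an ideal pinhole camera whose
    optical center sits at (cam_x, 0, 0) and looks down the +Z axis."""
    X, Y, Z = point
    x = f * (X - cam_x) / Z + cx
    y = f * Y / Z + cy
    return x, y

f, cx, cy = 700.0, 320.0, 240.0     # assumed intrinsics (hypothetical)
B = 0.12                            # assumed baseline in meters
P = np.array([0.3, -0.1, 2.5])      # a 3D point in the left-camera frame

xl, yl = project(P, f, cx, cy, cam_x=0.0)   # left view
xr, yr = project(P, f, cx, cy, cam_x=B)     # right view

assert abs(yl - yr) < 1e-6                  # same row: the epipolar constraint
d = xl - xr                                 # disparity
assert abs(d - f * B / P[2]) < 1e-6         # d = f * B / Z
```

Because both projections share the same row, the stereo matcher of step four only needs to search horizontally.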
step two, preprocessing the rectified image, wherein the preprocessing comprises weighted median filtering with bilateral filtering as the weight, contrast-limited adaptive histogram equalization, and Laplacian image sharpening; the method is as follows:
with reference to figure 2 of the drawings,
21) weighted median filtering with bilateral filtering as the weight;
performing weighted median filtering with bilateral filtering as the weight on the rectified image; the bilateral filter weight between a center pixel p = (i, j) and a neighboring pixel q = (i', j') is expressed as:
w(p, q) = (1/k_p) · exp(−((i − i')² + (j − j')²) / (2σ_s²)) · exp(−|I(p) − I(q)|² / (2σ_r²))
wherein the first exponential term adjusts the spatial proximity; the second exponential term is the color similarity; k_p is a regularization factor; i and j are the abscissa and ordinate of the center pixel; i' and j' are the abscissa and ordinate of the neighboring pixel; I(·) denotes the pixel intensity; σ_s and σ_r are the spatial and range standard deviations;
selecting a window R_p of size (2r + 1) × (2r + 1), wherein r is the window radius and n is the number of pixels contained in the window; the pixel values in the window R_p and their weights form a sequence of pairs {(I_t, w_t)}, which is sorted in ascending order of pixel value; the weights are then accumulated in order until the cumulative weight exceeds half of the total weight, at which point the corresponding pixel value I* becomes the new pixel value of the center point of the local window, as shown in the following equation:
I* = I_l, l = min{ l : Σ_{t=1}^{l} w_t ≥ (1/2) Σ_{t=1}^{n} w_t }
wherein I* is the filtered pixel value; I_t and w_t are the t-th pixel value and filtering weight after sorting; n is the total number of pixels within the window; l is the number of pixels accumulated so far;
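The weighted median of step 21) can be sketched as follows. This is a naive reference implementation, not the patent's optimized code; the values of σ_s and σ_r are assumed tuning parameters. The window's pixel values are sorted, their bilateral weights accumulated, and the first value whose cumulative weight reaches half the total becomes the output:

```python
import numpy as np

def bilateral_weighted_median(img, r=1, sigma_s=1.0, sigma_r=25.0):
    """Weighted median filter whose weights are bilateral-filter weights;
    border pixels are copied unchanged for brevity."""
    img = img.astype(np.float64)
    h, w = img.shape
    out = img.copy()
    for y in range(r, h - r):
        for x in range(r, w - r):
            vals, wts = [], []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    spatial = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                    rng = np.exp(-(img[y + dy, x + dx] - img[y, x]) ** 2
                                 / (2 * sigma_r ** 2))
                    vals.append(img[y + dy, x + dx])
                    wts.append(spatial * rng)
            order = np.argsort(vals)
            vals, wts = np.array(vals)[order], np.array(wts)[order]
            cum = np.cumsum(wts)
            # first value whose cumulative weight reaches half the total
            out[y, x] = vals[np.searchsorted(cum, cum[-1] / 2.0)]
    return out

# a step edge: the range weight suppresses cross-edge pixels,
# so the edge survives filtering unchanged
edge = np.zeros((5, 6))
edge[:, :3] = 50.0
edge[:, 3:] = 200.0
filtered = bilateral_weighted_median(edge)
assert np.array_equal(filtered, edge)
```

The edge-preservation check above is exactly the property the patent targets: denoising without blurring feature boundaries.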
22) contrast-limited adaptive histogram equalization;
performing contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size, and the histogram of each sub-region is computed separately; the number of gray levels of the histogram is denoted as K and the gray level as r, so that the histogram function corresponding to the sub-region (m, n) is:
H_{m,n}(r), 0 ≤ r ≤ K − 1;
where r is the gray level of each sub-region; K is the number of gray levels of the histogram;
the clipping limit value β is determined as:
β = (M · N / K) · (1 + α / 100)
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of gray levels of the histogram; α is a clipping coefficient representing the maximum percentage of pixels allowed in each gray level;
histogram equalization is performed on all the divided sub-regions, each pixel is processed by bilinear interpolation, and the processed gray value is computed;
the clipping limiting value beta is set to clip the pixels beyond the limiting part, so that the purpose of limiting the contrast is achieved.
23) Laplacian image sharpening;
performing Laplacian enhancement on the histogram-equalized image: the selected pixel and the 8 points in its neighborhood are multiplied element-wise with a mask and summed, and the resulting new pixel value replaces the value of the center point of the original 3 × 3 grid; for a point (i, j), the image processed by the Laplacian operator is:
L(i, j) = Σ_{m=−1}^{1} Σ_{n=−1}^{1} k(m, n) · p(i + m, j + n)
wherein k(m, n) is a 3 × 3 Laplacian mask; p(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m and n are the row and column offsets within the 3 × 3 neighborhood; i is the abscissa of the selected point; j is the ordinate of the selected point;
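A sketch of the multiply-and-sum of step 23). The 8-neighbor mask shown is one common Laplacian choice and is an assumption here, since the patent does not list its mask entries:

```python
import numpy as np

def laplacian_filter(p, mask):
    """Apply a 3x3 mask by multiply-and-sum over each pixel's 8-neighborhood;
    border pixels are left at zero for brevity."""
    h, w = p.shape
    out = np.zeros_like(p, dtype=np.float64)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            acc = 0.0
            for m in (-1, 0, 1):
                for n in (-1, 0, 1):
                    acc += mask[m + 1, n + 1] * p[i + m, j + n]
            out[i, j] = acc
    return out

# assumed 8-neighbor Laplacian mask (one common choice)
mask = np.array([[1.0, 1.0, 1.0],
                 [1.0, -8.0, 1.0],
                 [1.0, 1.0, 1.0]])

flat = np.full((5, 5), 7.0)
assert np.all(laplacian_filter(flat, mask) == 0.0)  # no edges -> no response

spike = np.zeros((5, 5))
spike[2, 2] = 1.0
lap = laplacian_filter(spike, mask)
assert lap[2, 2] == -8.0        # strong response at an isolated bright point

sharpened = flat - laplacian_filter(flat, mask)     # classic sharpening step
assert np.all(sharpened == flat)                    # flat regions unchanged
```

Subtracting (or adding, depending on the mask's sign) the Laplacian response back onto the image is what boosts the edges while leaving flat regions untouched.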
step three, removing the complex background of the region of interest through a minimum-cut/maximum-flow image segmentation algorithm;
31) frame-selecting the region of interest through user interaction; the pixels within the frame are defined as target pixels T_U, and the other pixels are defined as background pixels T_B;
The region of interest is framed by the user.
32) for each background pixel n in T_B, its label is initialized as α_n = 0; for each pixel n among the target pixels T_U, its label is initialized as α_n = 1;
33) after the preliminary division of target and background pixels in steps 31) and 32), a Gaussian mixture model is built for the target pixels and for the background pixels; the target pixels are clustered into K classes by the K-means algorithm, which guarantees that each Gaussian component of the mixture model has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each Gaussian component is determined by the ratio of the number of pixels of that component to the total number of pixels; the initialization process ends here;
34) a Gaussian component of the mixture model is assigned to each pixel: the RGB value of the target pixel n is substituted into each Gaussian component of the mixture model, and the component with the highest probability, i.e. the lowest data energy, is selected:
k_n = argmin_{k_n} D_n(α_n, k_n, θ, z_n)
wherein D_n is the data energy term corresponding to pixel n; α_n is the opacity index value corresponding to pixel n; θ is the gray histogram of the target or background region of the image; z_n is the gray value corresponding to pixel n;
35) the Gaussian mixture model is further learned and optimized from the given image data z:
U(α, k, θ, z) = Σ_n D_n(α_n, k_n, θ, z_n)
wherein U is the sum of the data energy terms corresponding to all pixels; α is the array of opacity index values; k is the array of Gaussian mixture model component assignments; z is the array of gray values; θ is the gray histogram of the target or background region of the image;
36) using the Gibbs data energy terms D_n analyzed in step 34) and the corresponding energy weights, the segmentation is then estimated by a minimum-cut/maximum-flow algorithm:
min_α min_k E(α, k, θ, z)
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation algorithm; α is the array of opacity index values; k is the array of Gaussian mixture model component assignments; z is the array of gray values; θ is the gray histogram of the target or background region of the image;
37) steps 34)-36) are repeated to continuously optimize the Gaussian mixture model, ensuring that the iterative process converges to a minimum, thereby obtaining the segmentation result;
38) the segmentation result is smoothed with a border matting mechanism.
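The component assignment of step 34) can be sketched with scalar gray values in place of RGB. The mixture weights, means and variances below are toy values, and the energy form D_n = −log(π_k · N(z | μ_k, σ_k²)) is the standard GrabCut-style choice assumed here:

```python
import numpy as np

def assign_component(z, weights, means, variances):
    """Pick, for each sample, the Gaussian component k_n minimizing the data
    energy D_n = -log( pi_k * N(z | mu_k, sigma_k^2) )."""
    z = np.asarray(z, dtype=np.float64)
    # energy of each component for each sample: shape (samples, components)
    d = (0.5 * np.log(2 * np.pi * variances)
         + (z[:, None] - means) ** 2 / (2 * variances)
         - np.log(weights))
    return np.argmin(d, axis=1)

weights = np.array([0.5, 0.5])       # toy two-component mixture
means = np.array([30.0, 200.0])      # dark and bright clusters
variances = np.array([100.0, 100.0])

k = assign_component([25, 40, 190, 210], weights, means, variances)
assert list(k) == [0, 0, 1, 1]       # each pixel joins the nearer cluster
```

Iterating this assignment with the parameter re-estimation of step 35) and the min-cut of step 36) is what drives the energy downward.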
Step four, referring to fig. 3, recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map; the network comprises a shared feature extraction module, a disparity estimation network (DES-net) and a disparity optimization network (DRS-net). The shared feature extraction network uses a connected network of shallow encoder-decoder structures to extract common multi-scale features from the left and right images. Part of these features is used to compute the matching cost values (i.e., correlations) for the disparity estimation network (DES-net) and the disparity optimization network (DRS-net). The features of the first layer are further compressed by a 1 × 1 convolution to produce c_conv1a and c_conv1b. These shared features are also used to compute the reconstruction error of the disparity optimization network (DRS-net);
41) feature detection is performed on the left and right images through the first and last layers of the shared feature extraction module, yielding multi-scale matching cost values; referring to fig. 4, the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolution layer with stride 1; features with a relatively large receptive field and different levels of abstraction, obtained by the last deconvolution layer and the first convolution layer, are used to compute the reconstruction error, wherein "Conv2a" denotes the second convolution layer of the shared feature extraction module. The features of the first layer are compressed using a 1 × 1 convolution layer with stride 1, and are used to compute the correlation in the disparity optimization network (DRS-net). The features generated by the shared feature extraction module can be applied in both the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
42) the input to the disparity estimation network (DES-net) comprises two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, whose output is the matching cost value of the left and right images and stores the cost of all possible disparities at image coordinates (x, y); the second part is the feature map of the left image, which provides the necessary semantic information for disparity estimation; the disparity estimation network (DES-net) is used to directly regress the initial disparity;
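The dot-product matching cost of step 42) can be sketched as a correlation cost volume over candidate disparities. The one-hot toy features below stand in for the network's learned features and are purely illustrative:

```python
import numpy as np

def correlation_cost_volume(fl, fr, max_disp):
    """Matching cost by dot product of left/right feature maps of shape
    (H, W, C) over candidate disparities; out-of-range positions keep
    zero cost."""
    h, w, _ = fl.shape
    cost = np.zeros((h, w, max_disp + 1))
    for d in range(max_disp + 1):
        # correlate left pixel (y, x) with right pixel (y, x - d)
        cost[:, d:, d] = np.sum(fl[:, d:, :] * fr[:, :w - d, :], axis=2)
    return cost

# toy features: each column carries a distinct one-hot descriptor
fr = np.zeros((4, 10, 8))
for x in range(10):
    fr[:, x, x % 8] = 1.0
fl = np.roll(fr, shift=3, axis=1)    # left = right shifted by disparity 3

cost = correlation_cost_volume(fl, fr, max_disp=5)
# for a valid pixel, the cost peaks at the true disparity d = 3
assert int(np.argmax(cost[2, 6])) == 3
```

In the network this volume is concatenated with the left feature map and regressed to the initial disparity.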
43) the disparity optimization network (DRS-net) uses the shared features and the initial disparity to compute a reconstruction error r_e, which reflects the correctness of the estimated disparity; the reconstruction error is computed as:
r_e(i, j) = | I_L(i, j) − I_R(i − d̂_{i,j}, j) |
wherein I_L is the left image; I_R is the right image; d̂_{i,j} is the estimated disparity at position (i, j); i is the abscissa of the pixel at the selected position; j is the ordinate of the pixel at the selected position; the concatenation of the reconstruction error, the initial disparity and the left features is fed into a third encoder-decoder structure to compute a residual with respect to the initial disparity; the sum of the initial disparity and the residual produces the refined disparity.
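The reconstruction error of step 43) can be sketched directly from the formula; restricting to integer disparities and assigning zero error to out-of-range columns are simplifying assumptions of this sketch:

```python
import numpy as np

def reconstruction_error(left, right, disp):
    """r_e(i, j) = |I_L(i, j) - I_R(i - d, j)| with integer disparities;
    columns whose warped position falls outside the right image get 0."""
    h, w = left.shape
    err = np.zeros((h, w))
    for j in range(h):          # j: row (ordinate)
        for i in range(w):      # i: column (abscissa)
            d = int(disp[j, i])
            if 0 <= i - d < w:
                err[j, i] = abs(left[j, i] - right[j, i - d])
    return err

right = np.tile(np.arange(8.0), (3, 1))   # toy right image, ramp rows
left = np.roll(right, shift=2, axis=1)    # left shifted by disparity 2
disp = np.full((3, 8), 2)

err = reconstruction_error(left, right, disp)
assert np.all(err == 0.0)   # a correct disparity reconstructs the left image
```

A nonzero r_e flags pixels whose initial disparity is wrong, which is exactly the signal the residual branch of DRS-net learns to correct.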
And step five, reconstructing the surface point cloud from the disparity map obtained in step four.
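Step five can be sketched with the standard rectified-stereo back-projection Z = f·B/d, X = (x − cx)·Z/f, Y = (y − cy)·Z/f; the patent does not spell out these formulas, so they are assumed here, and the intrinsics are hypothetical:

```python
import numpy as np

def disparity_to_points(disp, f, b, cx, cy):
    """Back-project a disparity map of shape (H, W) to 3D points under the
    rectified-stereo model; invalid (non-positive) disparities map to Z = 0."""
    h, w = disp.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    z = np.where(disp > 0, f * b / np.maximum(disp, 1e-9), 0.0)
    x = (xs - cx) * z / f
    y = (ys - cy) * z / f
    return np.stack([x, y, z], axis=-1)   # shape (H, W, 3)

f, b, cx, cy = 700.0, 0.12, 4.0, 3.0      # hypothetical intrinsics/baseline
disp = np.full((6, 8), 33.6)              # constant toy disparity
pts = disparity_to_points(disp, f, b, cx, cy)

# Z = f*B/d = 700 * 0.12 / 33.6 = 2.5 everywhere
assert np.allclose(pts[..., 2], 2.5)
# the pixel at the principal point lies on the optical axis (X = Y = 0)
assert np.allclose(pts[3, 4, :2], 0.0)
```

Flattening the (H, W, 3) array to (H·W, 3) and dropping Z = 0 entries yields the surface point cloud.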
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the scope of the present invention is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present invention, any person skilled in the art may apply equivalent substitutions or alterations to the technical solution and the inventive concept, and such simple modifications all fall within the scope of the present invention.
In addition, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described further.
Moreover, any combination of the various embodiments of the invention can be made without departing from the spirit of the invention, which should also be considered as disclosed herein.
Claims (3)
1. The surface point cloud reconstruction method based on binocular stereoscopic vision is characterized by comprising the following steps of:
step one, performing stereo rectification on the images captured by the binocular camera, so that corresponding points in the left and right images lie on the same epipolar line;
step two, preprocessing the rectified images, the preprocessing comprising weighted median filtering with bilateral filter weights, adaptive histogram equalization, and Laplacian image sharpening;
step three, removing the complex background from the region of interest through a minimum cut/maximum flow image segmentation algorithm;
step four, recovering depth information through a convolutional-neural-network stereo matching algorithm to obtain a disparity map;
step five, reconstructing the surface point cloud according to the disparity map obtained in step four;
the specific method of the second step is as follows:
21 Weighted median filtering with bilateral filtering as weight;
performing weighted median filtering taking bilateral filtering as weight on the corrected image; the bilateral filter weights are expressed as:
w_(i,j),(i_i,j_j) = (1/k_i) · exp(−((i − i_i)^2 + (j − j_j)^2)/σ_s^2) · exp(−|I(i, j) − I(i_i, j_j)|^2/σ_c^2);
wherein σ_s adjusts the spatial scale; σ_c adjusts the color similarity; k_i is a regularization factor; (i − i_i)^2 and (j − j_j)^2 measure the spatial similarity between the center pixel and the neighboring pixel; i is the abscissa of the center pixel; j is the ordinate of the center pixel; i_i is the abscissa of the neighboring pixel; j_j is the ordinate of the neighboring pixel;
selecting a window R_i of size (2r+1) × (2r+1), wherein r is the radius of the window and s is the number of pixels contained in the window; the pixel values and weights in window R_i form a sequence of pairs {I(i), w_(i,j)}; the pixel values are sorted in ascending order and their weights are accumulated in that order until the cumulative weight exceeds half of the total weight, at which point the corresponding value becomes the new pixel value of the center point of the local window, as the following formula shows:
i* = min{ k : Σ_{i=1}^{k} w_(i,j) ≥ (1/2) Σ_{i=1}^{s} w_(i,j) }, and I(i*) replaces the pixel value of the window center point;
wherein i* is the index of the filtered value in the sorted sequence; I(i*) is the filtered value that replaces the pixel value of the window center point; w_(i,j) is the filtering weight; s is the total number of pixels in the window; i is the number of pixels accumulated so far;
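The bilateral-weighted median of step 21) can be sketched in NumPy as below. The window radius, the Gaussian parameters sigma_s and sigma_c, and the edge-padding policy are illustrative assumptions rather than values fixed by the patent.

```python
import numpy as np

def bilateral_weighted_median(img, r=1, sigma_s=2.0, sigma_c=25.0):
    """Weighted median filter whose weights are bilateral: within each
    (2r+1) x (2r+1) window, pixel values are sorted and the value at
    which the cumulative weight first reaches half of the total weight
    replaces the window center."""
    h, w = img.shape
    out = np.empty((h, w), dtype=float)
    pad = np.pad(img.astype(float), r, mode='edge')
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    spatial = np.exp(-(dy ** 2 + dx ** 2) / sigma_s ** 2)   # spatial similarity term
    for y in range(h):
        for x in range(w):
            win = pad[y:y + 2 * r + 1, x:x + 2 * r + 1]
            color = np.exp(-(win - img[y, x]) ** 2 / sigma_c ** 2)  # color similarity term
            wgt = (spatial * color).ravel()
            vals = win.ravel()
            order = np.argsort(vals)                # sort pixel values ascending
            csum = np.cumsum(wgt[order])            # accumulate weights in that order
            k = np.searchsorted(csum, 0.5 * csum[-1])  # first index past half the total weight
            out[y, x] = vals[order][k]
    return out

flat = bilateral_weighted_median(np.full((5, 5), 3.0))  # a constant image is unchanged
```

The double loop keeps the sketch readable; a production implementation would vectorize or use a joint-histogram formulation.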
22) contrast-limited adaptive histogram equalization;
performing contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size, and the histogram of each sub-region is calculated separately; denoting the number of possible gray levels of the histogram as K and the gray level of each sub-region as r, the histogram function corresponding to region (m, n) is:
H_(m,n)(r), 0 ≤ r ≤ K − 1;
where r is the gray level of each sub-region; k is the number of gray levels of the histogram;
determining the clipping limit β:
β = (M × N / K) × (1 + α/100);
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of gray levels of the histogram; α is the clipping coefficient, representing the maximum percentage of pixels allowed in each gray level;
performing histogram equalization on all the divided subareas, processing each pixel by using a bilinear interpolation method, and calculating the gray value after processing;
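Step 22) can be sketched for a single sub-region as follows. The clip-limit form β = (M·N/K)(1 + α/100) and the even redistribution of the clipped excess are common choices assumed here; the patent's exact clipping formula is not reproduced verbatim, so treat both as illustrative.

```python
import numpy as np

def clipped_equalize_tile(tile, K=256, alpha=40.0):
    """Contrast-limited histogram equalization of one sub-region:
    clip the histogram at beta, redistribute the clipped excess evenly
    over all K gray levels, then equalize through the cumulative
    distribution."""
    M, N = tile.shape
    beta = (M * N / K) * (1.0 + alpha / 100.0)     # clip limit
    hist, _ = np.histogram(tile, bins=K, range=(0, K))
    excess = np.clip(hist - beta, 0, None).sum()   # mass removed by clipping
    hist = np.minimum(hist, beta) + excess / K     # even redistribution
    cdf = np.cumsum(hist) / hist.sum()
    lut = np.round((K - 1) * cdf).astype(np.uint8) # equalization lookup table
    return lut[tile]

uniform = clipped_equalize_tile(np.zeros((8, 8), dtype=np.uint8))
ramp = clipped_equalize_tile(np.arange(64, dtype=np.uint8).reshape(8, 8))
```

In the full method each tile's lookup table would then be blended across tiles with the bilinear interpolation mentioned in the claim.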
23) Laplacian image sharpening;
carrying out Laplacian enhancement on the histogram-equalized image: the selected pixel point and the 8 points in its neighborhood are multiplied by the corresponding mask coefficients and summed, and the resulting new pixel value replaces the pixel value of the center point of the original nine-grid; for a point (u, v), the Laplacian-operator processing of the image is:
L(u, v) = Σ_{mz=−1}^{1} Σ_{nz=−1}^{1} k(mz, nz) · P(u + mz, v + nz);
wherein k(mz, nz) is the 3 × 3 Laplacian mask; P(u, v) is the gray value of the original image; L(u, v) is the image processed by the Laplacian operator; mz is the horizontal offset from the nine-grid center pixel; nz is the vertical offset from the nine-grid center pixel; u is the abscissa of the selected point; v is the ordinate of the selected point.
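Step 23) can be sketched as a direct 3 × 3 mask convolution. The specific 8-neighbor sharpening mask (center coefficient 9 = 8 + 1, i.e. Laplacian response plus the original pixel) is one standard choice and is an assumption here; the patent does not spell out its mask values.

```python
import numpy as np

def laplacian_sharpen(img):
    """Replace every pixel by the sum of its 3x3 neighborhood
    multiplied element-wise by a Laplacian sharpening mask."""
    k = np.array([[-1, -1, -1],
                  [-1,  9, -1],     # 9 = 8 + 1: Laplacian plus original
                  [-1, -1, -1]], dtype=float)
    pad = np.pad(img.astype(float), 1, mode='edge')
    out = np.empty_like(img, dtype=float)
    for u in range(img.shape[0]):
        for v in range(img.shape[1]):
            out[u, v] = np.sum(pad[u:u + 3, v:v + 3] * k)
    return np.clip(out, 0, 255)   # keep gray values in range

flat_out = laplacian_sharpen(np.full((4, 4), 7.0))  # flat regions are unchanged
```

Because the mask coefficients sum to 1, flat regions pass through unchanged while intensity steps are amplified.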
2. The surface point cloud reconstruction method based on binocular stereoscopic vision according to claim 1, wherein the specific method of the third step is as follows:
31) frame-selecting the region of interest through user interaction, defining the pixels within the frame as possible target pixels T_U and the other pixels as background pixels T_B;
32) for T_B, initializing each background pixel n with the label α_n = 0; for T_U, initializing each target pixel nt with the label α_nt = 1;
33) after the target pixels and background pixels have been preliminarily classified by steps 31) and 32), a Gaussian mixture model is built for the target pixels and the background pixels; the target pixels are clustered into K classes by the K-means algorithm, ensuring that each Gaussian component of the Gaussian mixture model has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each Gaussian component is determined by the ratio of the number of its pixels to the total number of pixels; the initialization process ends here;
34) assigning a Gaussian component in the Gaussian mixture model to each pixel: the RGB value of the target pixel nt is substituted into each Gaussian component of the Gaussian mixture model, and the component with the highest probability is determined as k_nt:
k_nt = arg min_{k_nt} D_nt(α_nt, k_nt, θ, z_nt);
wherein D_nt is the data energy term corresponding to pixel nt; α_nt is the opacity index value corresponding to pixel nt; θ is the gray histogram of the target or background region of the image; z_nt is the gray value corresponding to pixel nt;
35) further learning and optimizing the Gaussian mixture model from the given image data z:
θ = arg min_θ U(α, k, θ, z);
wherein U is the sum of the data energy terms corresponding to all pixels; α is the opacity index value; k is a Gaussian mixture model parameter; z is the gray value array; θ is the gray histogram of the target or background region of the image;
36) from the Gibbs energy data term D_n analyzed in step 34), the Gibbs energy weight 1/k_n is obtained, and the segmentation is then estimated by the minimum cut/maximum flow algorithm:
min_{α_n : n ∈ T_U} min_{k} E(α, k, θ, z);
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation algorithm; α is the opacity index value; k is a Gaussian mixture model parameter; z is the gray value array; θ is the gray histogram of the target or background region of the image;
37 Repeating the steps 34) -36), continuously optimizing the Gaussian mixture model, and ensuring that the iterative process can be converged to a minimum value so as to obtain a segmentation result;
38) smoothing the segmentation boundary by a border matting mechanism.
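The component-assignment step 34), choosing for each pixel the Gaussian mixture component that minimizes the data energy −log p, can be sketched as below. The array shapes and the explicit energy expression are illustrative assumptions; the claim states only that the most probable component is selected.

```python
import numpy as np

def assign_components(pixels, means, covs, weights):
    """For each RGB pixel, pick the Gaussian mixture component with the
    highest likelihood, i.e. the lowest energy -log(w_k * N(z; mu_k, S_k))."""
    n_comp = len(weights)
    energy = np.empty((pixels.shape[0], n_comp))
    for k in range(n_comp):
        diff = pixels - means[k]
        inv = np.linalg.inv(covs[k])
        maha = np.einsum('ij,jk,ik->i', diff, inv, diff)  # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(covs[k])
        energy[:, k] = -np.log(weights[k]) + 0.5 * logdet + 0.5 * maha
    return np.argmin(energy, axis=1)

# Two well-separated components: dark pixels go to 0, bright pixels to 1
means = np.array([[0.0, 0.0, 0.0], [100.0, 100.0, 100.0]])
covs = np.stack([np.eye(3), np.eye(3)])
weights = np.array([0.5, 0.5])
labels = assign_components(
    np.array([[1.0, 0.0, 0.0], [99.0, 100.0, 101.0]]), means, covs, weights)
```

Steps 35) and 36) would then re-estimate the mixture parameters from these assignments and run a min-cut on the resulting Gibbs energy.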
3. The surface point cloud reconstruction method based on binocular stereoscopic vision according to claim 1, wherein the specific method of the fourth step is as follows:
41) the left and right images are passed through the shared feature extraction module, from its first layer to its last layer, for feature detection, thereby obtaining multi-scale matching cost values; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolution layer with stride 1, and are used for calculating the reconstruction error; the features of the first layer are compressed using a 1 × 1 convolution layer with stride 1 and are used to calculate the correlation in the disparity optimization network (DRS-net); the features generated by the shared feature extraction module are applied to both the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
42) the input of the disparity estimation network (DES-net) comprises two parts: the first part is the dot product of the left and right features from the last layer of the shared feature extraction module; its output is the matching cost volume of the left and right images, which stores the cost of every candidate disparity at each image coordinate (x, y); the second part is the feature map of the left image, which provides the semantic information needed for disparity estimation; the disparity estimation network (DES-net) is used to directly regress the initial disparity;
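The first DES-net input of step 42), a cost volume formed by dot products of left and right feature maps over candidate disparities, can be sketched as follows. The (C, H, W) layout and the zero cost assigned to out-of-range positions are assumptions for illustration.

```python
import numpy as np

def dot_product_cost_volume(feat_l, feat_r, max_disp):
    """For every pixel (x, y) and candidate disparity d, the matching
    cost is the dot product of the left feature at column x with the
    right feature at column x - d; out-of-range positions keep cost 0."""
    C, H, W = feat_l.shape
    cost = np.zeros((max_disp, H, W))
    for d in range(max_disp):
        if d == 0:
            cost[d] = np.sum(feat_l * feat_r, axis=0)   # per-pixel channel dot product
        else:
            cost[d, :, d:] = np.sum(feat_l[:, :, d:] * feat_r[:, :, :-d], axis=0)
    return cost

cost = dot_product_cost_volume(np.ones((2, 3, 4)), np.ones((2, 3, 4)), max_disp=2)
```

In the network this volume, concatenated with the left feature map, is what DES-net regresses the initial disparity from.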
43) the disparity optimization network (DRS-net) uses the shared features and the initial disparity to calculate a reconstruction error re, which reflects the correctness of the estimated disparity; the reconstruction error is calculated as:
re(i, j) = |I_L(i, j) − I_R(i + d_ij, j)|;
wherein I_L is the left image; I_R is the right image; d_ij is the estimated disparity at position (i, j); i is the abscissa of the selected pixel; j is the ordinate of the selected pixel; the concatenation of the reconstruction error, the initial disparity, and the left feature is fed to a third encoder-decoder structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate the refined disparity.
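The reconstruction error above can be computed directly from the two images and the disparity map. Following the claim, i is treated as the horizontal coordinate; clipping source columns that fall outside the image is a simplifying assumption, not something the claim specifies.

```python
import numpy as np

def reconstruction_error(img_l, img_r, disp):
    """re(i, j) = |I_L(i, j) - I_R(i + d_ij, j)|, with i the horizontal
    coordinate; source columns falling outside the image are clipped."""
    H, W = img_l.shape
    rows, cols = np.mgrid[0:H, 0:W]
    src = np.clip(cols + np.rint(disp).astype(int), 0, W - 1)
    return np.abs(img_l - img_r[rows, src])

left = np.arange(12, dtype=float).reshape(3, 4)
err = reconstruction_error(left, left, np.zeros_like(left))  # identical images, zero disparity
```

A correctly estimated disparity drives re toward zero, which is why DRS-net can use it as a per-pixel confidence signal.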
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110821716.8A CN113421210B (en) | 2021-07-21 | 2021-07-21 | Surface point cloud reconstruction method based on binocular stereoscopic vision
Publications (2)
Publication Number | Publication Date |
---|---|
CN113421210A CN113421210A (en) | 2021-09-21 |
CN113421210B true CN113421210B (en) | 2024-04-12 |
Family
ID=77721554
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20080052363A (en) * | 2006-12-05 | 2008-06-11 | 한국전자통신연구원 | Apparatus and method of matching binocular/multi-view stereo using foreground/background separation and image segmentation |
CN104867135A (en) * | 2015-05-04 | 2015-08-26 | 中国科学院上海微系统与信息技术研究所 | High-precision stereo matching method based on guiding image guidance |
CN104978722A (en) * | 2015-07-06 | 2015-10-14 | 天津大学 | Multi-exposure image fusion ghosting removing method based on background modeling |
CN112288689A (en) * | 2020-10-09 | 2021-01-29 | 浙江未来技术研究院(嘉兴) | Three-dimensional reconstruction method and system for operation area in microscopic operation imaging process |
Non-Patent Citations (1)
Title |
---|
"Research on Key Technologies of Three-Dimensional Face Reconstruction Based on Binocular Stereoscopic Vision"; Qi Leyang; Excellent Master's Theses; full text *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| TA01 | Transfer of patent application right | Effective date of registration: 20231017; Address after: No. 2055, Yan'an Street, Changchun City, Jilin Province; Applicant after: Changchun University of Technology; Address before: Room 222, Building 1, No. 1, Kehui Road, Dongguan City, Guangdong Province, 523000; Applicant before: Dongguan Zhongke Sanwei fish Intelligent Technology Co.,Ltd.; Changchun University of Technology |
| GR01 | Patent grant | |