CN113421210A - Surface point cloud reconstruction method based on binocular stereo vision - Google Patents
- Publication number: CN113421210A
- Application number: CN202110821716.8A
- Authority
- CN
- China
- Prior art keywords
- pixel
- image
- pixels
- disparity
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T5/40 — Image enhancement or restoration by the use of histogram techniques
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06T5/73
- G06T7/11 — Image analysis; segmentation; region-based segmentation
- G06T2207/20028 — Special algorithmic details; filtering details; bilateral filtering
- G06T2207/20032 — Special algorithmic details; filtering details; median filtering
- G06T2207/20104 — Interactive definition of region of interest [ROI]
- Y02T10/40 — Engine management systems
Abstract
The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereo vision. The method comprises the following steps: step one, performing stereo rectification on the images shot by a binocular camera, so that corresponding points in the left and right images lie on the same horizontal epipolar line; step two, preprocessing the rectified images; step three, removing the complex background around the region of interest through a minimum-cut/maximum-flow image segmentation algorithm; step four, recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map; and step five, reconstructing the surface point cloud according to the disparity map obtained in step four. Through the processes of stereo rectification, image preprocessing, background removal of the region of interest, stereo matching and point cloud reconstruction, the method solves the problems of low reconstruction precision, low speed and poor transferability.
Description
Technical Field
The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereo vision.
Background
In recent years, with the rising level of automation in the manufacturing industry and the deepening technological transformation of enterprises, machine vision technology has been increasingly applied in industrial production. Binocular stereo vision, as a passive, non-contact measurement means, is favored by the market for its wide range of applicable conditions, fast measurement speed and reasonable price.
The surface point cloud reconstruction technology based on binocular stereo vision can be applied to the fields of part identification and positioning, unmanned aerial vehicle autonomous navigation, satellite remote sensing surveying and mapping, 3D model reconstruction and the like, is a research hotspot and difficulty of artificial intelligence direction at the present stage, and has quite wide application prospect.
Summarizing the results of existing research shows that, although existing surface point cloud reconstruction methods based on binocular stereo vision have gradually improved, they remain unsatisfactory with respect to the following key problems:
1) existing preprocessing methods cannot balance denoising with the retention of image feature details during filtering and enhancement, and easily cause image blurring, edge loss and point cloud defects;
2) existing surface point cloud reconstruction methods recover the point cloud over the whole image; this lack of directivity easily wastes resources, reduces calculation efficiency and causes mismatching;
3) existing neural-network-based stereo matching methods mostly calculate the matching cost at a single scale, and either lack a disparity refinement step or fall back on traditional disparity optimization methods, so the resulting disparity map is prone to discontinuities.
Disclosure of Invention
The invention provides a surface point cloud reconstruction method based on binocular stereo vision, which solves the problems of low reconstruction precision, low speed and poor transferability through the processes of stereo rectification, image preprocessing, background removal of the region of interest, stereo matching and point cloud reconstruction.
The technical scheme of the invention is described as follows by combining the attached drawings:
a surface point cloud reconstruction method based on binocular stereo vision comprises the following steps:
step one, performing stereo rectification on the images shot by a binocular camera, so that corresponding points in the left and right images lie on the same horizontal epipolar line;
secondly, preprocessing the corrected image, wherein the preprocessing comprises weighted median filtering with bilateral filtering as weight, adaptive histogram equalization and Laplace image sharpening;
step three, performing complex background removal on the region of interest through a minimum cut-maximum flow image segmentation algorithm;
recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map;
and step five, reconstructing the surface point cloud according to the disparity map obtained in the step four.
The specific method of the second step is as follows:
21) weighted median filtering with bilateral filtering as weight;
performing weighted median filtering with bilateral filtering as the weight on the rectified image; the bilateral filter weight is expressed as:

$$w_{i,j}=\frac{1}{k_i}\exp\left(-\frac{|i-i_i|^{2}+|j-j_j|^{2}}{\sigma_s^{2}}-\frac{\lVert I(i,j)-I(i_i,j_j)\rVert^{2}}{\sigma_c^{2}}\right)$$

wherein σ_s adjusts the spatial scale; σ_c adjusts the color similarity; k_i is a regularization factor; |i − i_i|² and |j − j_j|² measure the spatial similarity between the central pixel and the neighboring pixel; i is the abscissa of the central pixel; j is the ordinate of the central pixel; i_i is the abscissa of the neighboring pixel; j_j is the ordinate of the neighboring pixel;

when a window R_i of size (2r + 1) × (2r + 1) is selected, where r is the window radius and n is the number of pixels contained in the window, the pairs of pixel values and weights {I(i), w_{i,j}} in R_i are sorted by pixel value and the weights are accumulated in that order until the cumulative weight first reaches half of the total weight; the corresponding value i* is the new pixel value at the center point of the local window, as shown in the following equation:

$$i^{*}=\min\left\{\,i^{*}:\ \sum_{i=1}^{i^{*}} w_{i,j}\ \geq\ \frac{1}{2}\sum_{i=1}^{n} w_{i,j}\right\}$$

wherein i* is the filtered pixel value; I(i) is the i-th pixel value in the sorted window; w_{i,j} is the filtering weight; n is the total number of pixels in the window; i is the index of the current accumulated pixel;
22) contrast-limited adaptive histogram equalization;
performing contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M pixels × N pixels is divided into several subregions of the same size, and the histogram of each subregion is calculated separately; denoting the number of possible histogram gray levels as K and the gray level of each subregion as r, the histogram function corresponding to region (m, n) is:

H_{m,n}(r), 0 ≤ r ≤ K − 1;

wherein r is the gray level of each subregion; K is the number of histogram gray levels;
the clipping amplitude β is determined as:

$$\beta=\frac{M\times N}{K}\left(1+\frac{\alpha}{100}\right)$$

wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of histogram gray levels; α is a truncation coefficient representing the maximum percentage of pixels in each gray level;
performing histogram equalization on all the divided subregions, processing each pixel by using a bilinear interpolation method, and calculating a processed gray value;
23) Laplacian image sharpening;
performing Laplacian enhancement on the image after histogram equalization; the selected pixel point in the image and the 8 points in its neighborhood are multiplied by a mask and summed, and the obtained new pixel value replaces the pixel value at the center point of the original 3 × 3 neighborhood; thus, for a point (i, j), the image processed by the Laplacian operator is:

$$L(i,j)=\sum_{m=-1}^{1}\ \sum_{n=-1}^{1} k(m,n)\,P(i+m,\ j+n)$$

wherein k(m, n) is the 3 × 3 Laplacian mask; P(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m and n are the horizontal and vertical offsets within the 3 × 3 neighborhood; i is the abscissa of the selected point; j is the ordinate of the selected point.
The concrete method of the third step is as follows:
31) the region of interest is selected by user interaction; the pixels inside the frame are defined as target pixels T_u, and the other pixels are defined as background pixels T_B;
32) each background pixel n in T_B is initialized with the label α_n = 0; each target pixel n in T_u is initialized with the label α_n = 1;
33) through steps 31) and 32), the target pixels and background pixels are preliminarily classified; Gaussian mixture models are then established for the target pixels and the background pixels, and the target pixels are clustered into K classes through the K-means algorithm, ensuring that each Gaussian model in the Gaussian mixture models has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight is determined by the ratio of the number of pixels of each Gaussian component to the total number of pixels; the initialization process then ends;
34) a Gaussian component in the Gaussian mixture model is assigned to each pixel: the RGB value of the target pixel n is substituted into each Gaussian component of the Gaussian mixture model, and the component with the highest probability is selected as k_n:

$$k_n=\arg\min_{k_n}\ D_n(\alpha_n,\ k_n,\ \theta,\ z_n)$$

wherein D_n is the energy data term corresponding to pixel n; α_n is the opacity label value corresponding to pixel n; θ is the gray level histogram of the target or background region of the image; z_n is the gray value corresponding to pixel n;
35) the Gaussian mixture model is further learned and optimized from the given image data z:

$$U(\alpha,\ k,\ \theta,\ z)=\sum_{n} D_n(\alpha_n,\ k_n,\ \theta,\ z_n)$$

wherein U is the sum of the energy data terms corresponding to all pixels; α is the array of opacity label values; k is the Gaussian mixture model component assignment; z is the gray value array; θ is the gray level histogram of the target or background region of the image;
36) from the Gibbs energy data term D_n analyzed in step 34), the Gibbs energy is formed, and the segmentation is then estimated by the minimum-cut/maximum-flow algorithm:

$$\min_{\{\alpha_n:\ n\in T_u\}}\ \min_{k}\ E(\alpha,\ k,\ \theta,\ z)$$

wherein E(α, k, θ, z) is the Gibbs energy of the graph partitioning algorithm; α is the array of opacity label values; k is the Gaussian mixture model component assignment; z is the gray value array; θ is the gray level histogram of the target or background region of the image;
37) repeating the steps 34) -36), continuously optimizing the Gaussian mixture model, and ensuring that the iteration process can be converged to the minimum value, thereby obtaining a segmentation result;
38) performing smooth post-processing on the segmentation result by adopting a border matting (boundary smoothing) mechanism.
The concrete method of the fourth step is as follows:
41) performing feature extraction on the left and right camera images through the first and last layers of a shared feature extraction module to obtain multi-scale matching cost values; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolutional layer with a stride of 1, and are used for calculating the reconstruction error; the features of the first layer are compressed using a 1 × 1 convolutional layer with a stride of 1, which will be used to compute the correlation in the disparity optimization network, i.e. DRS-net; the features generated by the shared feature extraction module can be applied simultaneously in the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
42) the input of the disparity estimation network, DES-net, comprises two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, the output of which is the matching cost value of the left and right images, which stores the cost of all possible differences in image coordinates (x, y); the second part is defined as a feature map of the left image, which provides the necessary semantic information for disparity estimation; the disparity estimation network DES-net is used for directly regressing the initial disparity;
43) the disparity optimization network, i.e. DRS-net, uses the shared features and the initial disparity to calculate a reconstruction error re, which reflects the correctness of the estimated disparity; the reconstruction error is calculated as:

$$re(i,j)=\left|I_L(i,j)-I_R\big(i-\hat d_{i,j},\ j\big)\right|$$

wherein I_L is the left image; I_R is the right image; \hat d_{i,j} is the estimated disparity at location (i, j); i is the abscissa of the selected pixel; j is the ordinate of the selected pixel; the concatenation of the reconstruction error, the initial disparity and the left features is fed to a third codec structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual gives the refined disparity.
The invention has the beneficial effects that:
1) the method is robust to illumination changes, and the obtained point cloud model is complete: a three-step image preprocessing method is disclosed, which adopts weighted median filtering with bilateral filtering as the weight, contrast-limited adaptive histogram equalization, and Laplacian image sharpening, retaining edge and feature information while ensuring the denoising effect;
2) the method has high reconstruction speed and high precision: only the reconstructed object is taken as the region of interest and its complex background is removed, which saves computing resources and reduces the probability of mismatching caused by similar pixels in the background region;
3) the method has an accurate matching effect and a smooth disparity map: the improved convolutional neural networks (CNNs), composed of a shared feature extraction network, a disparity estimation network (DES-net) and a disparity optimization network (DRS-net), overcome the defects of conventional neural network methods that calculate the matching cost at only a single scale and lack a disparity optimization step.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of step two of the present invention;
FIG. 3 is a block diagram of a convolutional neural network of the present invention;
fig. 4 is a flow chart of multi-scale feature extraction.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a surface point cloud reconstruction method based on binocular stereo vision includes the following steps:
step one, performing stereo rectification on the images shot by a binocular camera, so that corresponding points in the left and right images lie on the same horizontal epipolar line;
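As an illustrative sketch (not part of the claimed method), the property established by stereo rectification can be checked numerically: in a rectified pair, a 3-D point projects to the same image row in both views, and its horizontal offset (the disparity) encodes depth as d = f·B/Z. The focal length, principal point and baseline below are assumed values:

```python
import numpy as np

# Hypothetical rectified stereo pair: identical intrinsics, cameras separated
# by a horizontal baseline B (all values assumed for illustration).
f, cx, cy = 800.0, 320.0, 240.0   # focal length and principal point (pixels)
B = 0.12                          # baseline in metres

def project(point, cam_x):
    """Pinhole projection for a rectified camera whose centre sits at (cam_x, 0, 0)."""
    X, Y, Z = point[0] - cam_x, point[1], point[2]
    return np.array([f * X / Z + cx, f * Y / Z + cy])

P = np.array([0.3, -0.05, 2.0])   # a 3-D point in metres
uL = project(P, 0.0)              # left image coordinates
uR = project(P, B)                # right image coordinates

# After rectification the match lies on the same row ...
assert abs(uL[1] - uR[1]) < 1e-9
# ... and the disparity encodes depth: d = f * B / Z
d = uL[0] - uR[0]
assert abs(d - f * B / P[2]) < 1e-9
```

This same relation is what step five inverts to recover the point cloud from the disparity map.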
secondly, preprocessing the corrected image, wherein the preprocessing comprises weighted median filtering with bilateral filtering as weight, adaptive histogram equalization and Laplace image sharpening; the method comprises the following specific steps:
with reference to figure 2 of the drawings,
21) weighted median filtering with bilateral filtering as weight;
performing weighted median filtering with bilateral filtering as the weight on the rectified image; the bilateral filter weight is expressed as:

$$w_{i,j}=\frac{1}{k_i}\exp\left(-\frac{|i-i_i|^{2}+|j-j_j|^{2}}{\sigma_s^{2}}-\frac{\lVert I(i,j)-I(i_i,j_j)\rVert^{2}}{\sigma_c^{2}}\right)$$

wherein σ_s adjusts the spatial scale; σ_c adjusts the color similarity; k_i is a regularization factor; |i − i_i|² and |j − j_j|² measure the spatial similarity between the central pixel and the neighboring pixel; i is the abscissa of the central pixel; j is the ordinate of the central pixel; i_i is the abscissa of the neighboring pixel; j_j is the ordinate of the neighboring pixel;

when a window R_i of size (2r + 1) × (2r + 1) is selected, where r is the window radius and n is the number of pixels contained in the window, the pairs of pixel values and weights {I(i), w_{i,j}} in R_i are sorted by pixel value and the weights are accumulated in that order until the cumulative weight first reaches half of the total weight; the corresponding value i* is the new pixel value at the center point of the local window, as shown in the following equation:

$$i^{*}=\min\left\{\,i^{*}:\ \sum_{i=1}^{i^{*}} w_{i,j}\ \geq\ \frac{1}{2}\sum_{i=1}^{n} w_{i,j}\right\}$$

wherein i* is the filtered pixel value; I(i) is the i-th pixel value in the sorted window; w_{i,j} is the filtering weight; n is the total number of pixels in the window; i is the index of the current accumulated pixel;
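A minimal NumPy sketch of step 21) for a single window may clarify the procedure (σ_s and σ_c are assumed parameter values; a full implementation would slide this window over the whole image). A plain median erodes an intensity edge, while the bilateral-weighted median preserves it:

```python
import numpy as np

def bilateral_weighted_median(patch, sigma_s=2.0, sigma_c=25.0):
    """Weighted median of one (2r+1)x(2r+1) window; the weights come from a
    bilateral kernel (spatial closeness x color similarity to the centre)."""
    r = patch.shape[0] // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    center = patch[r, r]
    w = np.exp(-(xx ** 2 + yy ** 2) / sigma_s ** 2
               - (patch - center) ** 2 / sigma_c ** 2)
    vals, wts = patch.ravel(), w.ravel()
    order = np.argsort(vals, kind="stable")
    csum = np.cumsum(wts[order])
    # first value whose cumulative weight reaches half of the total weight
    k = np.searchsorted(csum, csum[-1] / 2.0)
    return vals[order][k]

# An intensity edge: dark region (10) meets bright region (100), centre on the bright side
patch = np.array([[10., 10., 100.],
                  [10., 100., 100.],
                  [10., 10., 100.]])
print(np.median(patch))                   # plain median: 10.0 (the edge is eroded)
print(bilateral_weighted_median(patch))   # bilateral-weighted median: 100.0 (edge kept)
```

The bilateral weights suppress the contribution of pixels on the far side of the edge, so the weighted median stays on the bright side of the boundary.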
22) contrast-limited adaptive histogram equalization;
performing contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M pixels × N pixels is divided into several subregions of the same size, and the histogram of each subregion is calculated separately; denoting the number of possible histogram gray levels as K and the gray level of each subregion as r, the histogram function corresponding to region (m, n) is:

H_{m,n}(r), 0 ≤ r ≤ K − 1;

wherein r is the gray level of each subregion; K is the number of histogram gray levels;
the clipping amplitude β is determined as:

$$\beta=\frac{M\times N}{K}\left(1+\frac{\alpha}{100}\right)$$

wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of histogram gray levels; α is a truncation coefficient representing the maximum percentage of pixels in each gray level;
performing histogram equalization on all the divided subregions, processing each pixel by using a bilinear interpolation method, and calculating a processed gray value;
setting the clipping limiting value beta can clip the pixels beyond the limited part, thereby achieving the purpose of limiting the contrast.
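The clipping operation of step 22) can be sketched as follows. The clip limit β = (M·N/K)(1 + α/100) follows the variables defined above, and the single uniform redistribution pass for the clipped excess is a simplifying assumption (practical CLAHE implementations may iterate, and interpolate between tiles as the text describes):

```python
import numpy as np

def clahe_clip(hist, n_pixels, n_levels, alpha):
    """Clip a sub-region histogram at beta = (n_pixels / n_levels) * (1 + alpha/100)
    and redistribute the clipped excess uniformly over all gray levels."""
    beta = (n_pixels / n_levels) * (1.0 + alpha / 100.0)
    excess = np.maximum(hist - beta, 0.0).sum()   # mass above the clip limit
    clipped = np.minimum(hist, beta)              # histogram capped at beta
    return clipped + excess / n_levels            # excess spread evenly

hist = np.array([500.0, 20.0, 20.0, 20.0])        # one dominant gray level
out = clahe_clip(hist, n_pixels=560, n_levels=4, alpha=100)
print(out)   # beta = (560/4)*2 = 280, excess 220 spread as 55 per level
```

Capping the dominant bin limits the slope of the cumulative mapping, which is exactly how the contrast amplification is bounded.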
23) Laplacian image sharpening;
performing Laplacian enhancement on the image after histogram equalization; the selected pixel point in the image and the 8 points in its neighborhood are multiplied by a mask and summed, and the obtained new pixel value replaces the pixel value at the center point of the original 3 × 3 neighborhood; thus, for a point (i, j), the image processed by the Laplacian operator is:

$$L(i,j)=\sum_{m=-1}^{1}\ \sum_{n=-1}^{1} k(m,n)\,P(i+m,\ j+n)$$

wherein k(m, n) is the 3 × 3 Laplacian mask; P(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m and n are the horizontal and vertical offsets within the 3 × 3 neighborhood; i is the abscissa of the selected point; j is the ordinate of the selected point;
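Step 23) amounts to a 3 × 3 convolution. A minimal sketch with one common Laplacian mask (the specific coefficients are an assumption; the patent does not fix them):

```python
import numpy as np

LAPLACIAN = np.array([[0, -1, 0],
                      [-1, 4, -1],
                      [0, -1, 0]], dtype=float)  # one common 3x3 Laplacian mask

def laplace_response(img, i, j, mask=LAPLACIAN):
    """L(i, j): multiply the 3x3 neighbourhood of (i, j) by the mask and sum."""
    return float((img[i - 1:i + 2, j - 1:j + 2] * mask).sum())

img = np.full((5, 5), 50.0)
img[2, 2] = 80.0                      # a small bright detail on a flat background
print(laplace_response(img, 2, 2))    # 4*80 - 4*50 = 120.0: strong response at the detail
print(laplace_response(img, 1, 1))    # 0.0: no response in the flat region
```

With this sign convention the sharpened value at a point is P(i, j) + L(i, j), which boosts details and edges while leaving flat regions unchanged.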
step three, performing complex background removal on the region of interest through a minimum cut-maximum flow image segmentation algorithm;
31) the region of interest is selected by user interaction; the pixels inside the frame are defined as target pixels T_u, and the other pixels are defined as background pixels T_B;
The region of interest is defined at the discretion of the user.
32) each background pixel n in T_B is initialized with the label α_n = 0; each target pixel n in T_u is initialized with the label α_n = 1;
33) through steps 31) and 32), the target pixels and background pixels are preliminarily classified; Gaussian mixture models are then established for the target pixels and the background pixels, and the target pixels are clustered into K classes through the K-means algorithm, ensuring that each Gaussian model in the Gaussian mixture models has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight is determined by the ratio of the number of pixels of each Gaussian component to the total number of pixels; the initialization process then ends;
34) a Gaussian component in the Gaussian mixture model is assigned to each pixel: the RGB value of the target pixel n is substituted into each Gaussian component of the Gaussian mixture model, and the component with the highest probability is selected as k_n:

$$k_n=\arg\min_{k_n}\ D_n(\alpha_n,\ k_n,\ \theta,\ z_n)$$

wherein D_n is the energy data term corresponding to pixel n; α_n is the opacity label value corresponding to pixel n; θ is the gray level histogram of the target or background region of the image; z_n is the gray value corresponding to pixel n;
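Step 34) can be sketched as picking, for each pixel, the mixture component with the smallest energy (negative log-likelihood, constant terms dropped). The two components below are hypothetical foreground/background color models, not values from the patent:

```python
import numpy as np

def assign_component(pixel_rgb, means, covs, weights):
    """Pick the GMM component with minimum energy
    D = -log(pi_k) + 0.5*log|Sigma_k| + 0.5*(z-mu_k)^T Sigma_k^-1 (z-mu_k),
    i.e. the most probable component for this pixel."""
    costs = []
    for mu, cov, pi in zip(means, covs, weights):
        diff = pixel_rgb - mu
        cost = (-np.log(pi)
                + 0.5 * np.log(np.linalg.det(cov))
                + 0.5 * diff @ np.linalg.inv(cov) @ diff)
        costs.append(cost)
    return int(np.argmin(costs))

# Two hypothetical components: a reddish foreground model and a greenish background model
means = [np.array([200., 40., 40.]), np.array([40., 180., 40.])]
covs = [np.eye(3) * 400.0, np.eye(3) * 400.0]
weights = [0.5, 0.5]

print(assign_component(np.array([190., 50., 45.]), means, covs, weights))  # 0 (reddish)
print(assign_component(np.array([35., 170., 50.]), means, covs, weights))  # 1 (greenish)
```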
35) the Gaussian mixture model is further learned and optimized from the given image data z:

$$U(\alpha,\ k,\ \theta,\ z)=\sum_{n} D_n(\alpha_n,\ k_n,\ \theta,\ z_n)$$

wherein U is the sum of the energy data terms corresponding to all pixels; α is the array of opacity label values; k is the Gaussian mixture model component assignment; z is the gray value array; θ is the gray level histogram of the target or background region of the image;
36) from the Gibbs energy data term D_n analyzed in step 34), the Gibbs energy is formed, and the segmentation is then estimated by the minimum-cut/maximum-flow algorithm:

$$\min_{\{\alpha_n:\ n\in T_u\}}\ \min_{k}\ E(\alpha,\ k,\ \theta,\ z)$$

wherein E(α, k, θ, z) is the Gibbs energy of the graph partitioning algorithm; α is the array of opacity label values; k is the Gaussian mixture model component assignment; z is the gray value array; θ is the gray level histogram of the target or background region of the image;
37) repeating the steps 34) -36), continuously optimizing the Gaussian mixture model, and ensuring that the iteration process can be converged to the minimum value, thereby obtaining a segmentation result;
38) performing smooth post-processing on the segmentation result by adopting a border matting (boundary smoothing) mechanism.
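The minimum-cut/maximum-flow estimation at the heart of steps 36)–37) can be illustrated on a toy graph. The sketch below uses Edmonds–Karp, which is an assumption: the patent does not name a particular max-flow algorithm, and production segmentation code typically uses faster solvers such as Boykov–Kolmogorov. The four-pixel "1-D image" and its capacities are likewise hypothetical:

```python
from collections import deque

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max-flow; returns the set of nodes reachable from s in the
    final residual graph, i.e. the source (foreground) side of a minimum cut."""
    n = len(cap)
    res = [row[:] for row in cap]          # residual capacities
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:       # BFS for a shortest augmenting path
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break                          # no augmenting path left: flow is maximal
        b, v = float("inf"), t             # bottleneck capacity along the path
        while v != s:
            b = min(b, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                      # augment along the path
            res[parent[v]][v] -= b
            res[v][parent[v]] += b
            v = parent[v]
    seen, q = {s}, deque([s])              # nodes still reachable from the source
    while q:
        u = q.popleft()
        for v in range(n):
            if v not in seen and res[u][v] > 0:
                seen.add(v)
                q.append(v)
    return seen

# Four-pixel "1-D image": node 0 = source (foreground model), nodes 1..4 = pixels,
# node 5 = sink (background model). Unary capacities favour foreground for the
# first two pixels and background for the last two; neighbouring pixels share a
# small smoothness capacity.
cap = [[0] * 6 for _ in range(6)]
fg_cost = [8, 7, 1, 1]                     # source -> pixel capacities
bg_cost = [1, 1, 7, 8]                     # pixel -> sink capacities
for i in range(4):
    cap[0][i + 1] = fg_cost[i]
    cap[i + 1][5] = bg_cost[i]
for i in range(3):
    cap[i + 1][i + 2] = cap[i + 2][i + 1] = 2

fg_side = max_flow_min_cut(cap, 0, 5)
print(sorted(fg_side - {0}))               # [1, 2]: first two pixels labelled foreground
```

The minimum cut severs the cheapest combination of unary and pairwise edges, so the labels follow the stronger model at each pixel while the smoothness capacities discourage isolated label flips.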
Step four, with reference to fig. 3, recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map; the network comprises a shared feature extraction module, a disparity estimation network (DES-net) and a disparity optimization network (DRS-net). The shared feature extraction network uses a connected network of shallow codec structures to extract common multi-scale features from the left and right images. Some of these features are used to compute the matching cost values (i.e., correlations) of the disparity estimation network (DES-net) and the disparity optimization network (DRS-net). The features of the first layer are further compressed by a 1 × 1 convolution to produce c_conv1a and c_conv1b. These shared features are also used to calculate the reconstruction error of the disparity optimization network (DRS-net);
41) performing feature extraction on the left and right camera images through the first and last layers of the shared feature extraction module to obtain multi-scale matching cost values; referring to fig. 4, the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolutional layer with a stride of 1; features with a relatively large receptive field and different abstraction levels are obtained from the last deconvolution layer and the first convolutional layer for calculating the reconstruction error. "conv2a" denotes the second convolutional layer of the shared feature extraction module. The features of the first layer are compressed using a 1 × 1 convolutional layer with a stride of 1, which will be used to compute the correlation in the disparity optimization network (DRS-net). The features generated by the shared feature extraction module can be applied in the disparity estimation network (DES-net) and the disparity optimization network (DRS-net) at the same time;
42) the input of the disparity estimation network (DES-net) comprises two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, the output of which is the matching cost value of the left and right images, which stores the cost of all possible differences in image coordinates (x, y); the second part is defined as a feature map of the left image, which provides the necessary semantic information for disparity estimation; a disparity estimation network (DES-net) for directly regressing the initial disparity;
43) the disparity optimization network (DRS-net) uses the shared features and the initial disparity to calculate a reconstruction error re, which reflects the correctness of the estimated disparity; the reconstruction error is calculated as:

$$re(i,j)=\left|I_L(i,j)-I_R\big(i-\hat d_{i,j},\ j\big)\right|$$

wherein I_L is the left image; I_R is the right image; \hat d_{i,j} is the estimated disparity at location (i, j); i is the abscissa of the selected pixel; j is the ordinate of the selected pixel; the concatenation of the reconstruction error, the initial disparity and the left features is fed to a third codec structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual gives the refined disparity.
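A NumPy sketch of the reconstruction error of step 43), using the convention above (i the abscissa/column, j the ordinate/row); the synthetic pair below shifts the left image by a constant disparity for illustration:

```python
import numpy as np

def reconstruction_error(left, right, disp):
    """re(i, j) = |I_L(i, j) - I_R(i - d(i, j), j)|, with i the column (abscissa)
    and j the row (ordinate), matching the convention in the description."""
    h, w = left.shape
    re = np.zeros((h, w))
    for j in range(h):
        for i in range(w):
            x = int(round(i - disp[j, i]))     # warp coordinate in the right image
            if 0 <= x < w:
                re[j, i] = abs(left[j, i] - right[j, x])
    return re

rng = np.random.default_rng(0)
left = rng.uniform(0.0, 255.0, (4, 10))
d = 2
right = np.roll(left, -d, axis=1)              # synthetic right view: constant disparity d

good = reconstruction_error(left, right, np.full((4, 10), float(d)))
bad = reconstruction_error(left, right, np.zeros((4, 10)))
print(good.max())         # 0.0: the correct disparity reconstructs the left image exactly
print(bad.mean() > 1.0)   # True: a wrong disparity leaves a large residual
```

Because the residual vanishes only where the disparity is correct, it is a usable supervisory signal for the refinement stage.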
And step five, reconstructing the surface point cloud according to the disparity map obtained in the step four.
Although the preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings, the scope of the present invention is not limited to the specific details of the above embodiments, and any person skilled in the art can substitute or change the technical solution of the present invention and its inventive concept within the technical scope of the present invention, and these simple modifications belong to the scope of the present invention.
It should be noted that the various technical features described in the above embodiments can be combined in any suitable manner without contradiction, and the invention is not described in any way for the possible combinations in order to avoid unnecessary repetition.
In addition, any combination of the various embodiments of the present invention is also possible, and the same should be considered as the disclosure of the present invention as long as it does not depart from the spirit of the present invention.
Claims (4)
1. A surface point cloud reconstruction method based on binocular stereo vision is characterized by comprising the following steps:
the method comprises the following steps: step one, performing stereo rectification on the images shot by a binocular camera, so that corresponding (same-name) points in the left and right images lie on the same epipolar line;
secondly, preprocessing the corrected image, wherein the preprocessing comprises weighted median filtering with bilateral filtering as weight, adaptive histogram equalization and Laplace image sharpening;
step three, removing the complex background from the region of interest through a min-cut/max-flow image segmentation algorithm;
step four, recovering depth information through a convolutional neural network stereo matching algorithm to obtain a disparity map;
and step five, reconstructing the surface point cloud according to the disparity map obtained in the step four.
2. The binocular stereo vision-based surface point cloud reconstruction method according to claim 1, wherein the specific method of the second step is as follows:
21) weighted median filtering with bilateral filtering as weight;
performing weighted median filtering with bilateral filter weights on the corrected image; the bilateral filter weight is expressed as:

w(i, j, ii, jj) = (1 / k_i) · exp( −(|i − ii|² + |j − jj|²) / (2σ_s²) − ‖I(i, j) − I(ii, jj)‖² / (2σ_c²) )

wherein σ_s adjusts the spatial similarity; σ_c adjusts the color similarity; k_i is a regularization factor; |i − ii|² and |j − jj|² are the spatial similarity terms between the central pixel and a neighboring pixel; i is the abscissa of the central pixel; j is the ordinate of the central pixel; ii is the abscissa of the neighboring pixel; jj is the ordinate of the neighboring pixel;

a window R_i of size (2R + 1) × (2R + 1) is selected, where R is the window radius and n is the number of pixels contained in the window; within the window R_i the pairs {I(i), w_{i,j}} of pixel values and weights are formed, the pixel values are sorted, and the weights are accumulated in that order until the cumulative weight exceeds half of the total weight; the corresponding value i* is then the new pixel value of the local window center point, as shown in the following formula:

i* = I(l), l = min { l : Σ_{i=1}^{l} w_i ≥ (1/2) Σ_{i=1}^{n} w_i }

wherein i* is the filtered value; l is the index at which the accumulated weight first reaches half of the total weight; w_{i,j} is the filtering weight; n is the total number of pixels in the window; i is the index of the current accumulated pixel;
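The weighted median of a single window can be sketched as follows. The Gaussian-shaped spatial and range weights and the parameter values sigma_s and sigma_c are illustrative assumptions; the patent only specifies that bilateral filter weights are used.

```python
import numpy as np

def weighted_median_window(win, sigma_s=2.0, sigma_c=25.0):
    """Weighted median of one (2R+1) x (2R+1) window.

    Weights are bilateral: a spatial Gaussian times a range (intensity)
    Gaussian measured against the window centre. Intensities are sorted
    and the value at which the running weight first reaches half of the
    total weight is returned as the new centre value.
    """
    r = win.shape[0] // 2
    jj, ii = np.mgrid[-r:r + 1, -r:r + 1]
    centre = win[r, r]
    w = np.exp(-(ii**2 + jj**2) / (2 * sigma_s**2)
               - (win - centre) ** 2 / (2 * sigma_c**2))
    vals = win.ravel().astype(float)
    wts = w.ravel()
    order = np.argsort(vals)              # sort pixel values ascending
    cum = np.cumsum(wts[order])           # accumulate weights in that order
    idx = np.searchsorted(cum, cum[-1] / 2.0)
    return vals[order][idx]
```

Applied over a sliding window, this suppresses outliers (as a median does) while the bilateral weights preserve edges.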
22) contrast-limited adaptive histogram equalization;
carrying out contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into a plurality of sub-regions of the same size, and the histogram of each sub-region is calculated separately; the number of gray levels that may appear in a histogram is denoted K, and the gray level of each sub-region is denoted r, so the histogram function corresponding to sub-region (m, n) is:
H_{m,n}(r), 0 ≤ r ≤ K − 1;
wherein r is the gray level of each sub-region; k is the number of the grey levels of the histogram;
the clipping magnitude β is determined as:

β = (M × N / K) × (1 + α / 100)

wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of gray levels of the histogram; α is the truncation coefficient, representing the maximum percentage of pixels allowed at each gray level;
performing histogram equalization on all the divided subregions, processing each pixel by using a bilinear interpolation method, and calculating a processed gray value;
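The clipping step of sub-step 22) can be sketched for one sub-region as follows. The clip limit β = (M·N/K)(1 + α/100) is one common form consistent with the variables listed above, and the value of alpha below is illustrative; real CLAHE implementations also iterate the redistribution so no bin exceeds β.

```python
import numpy as np

def clipped_histogram(region, k=256, alpha=40.0):
    """Clip a sub-region histogram at beta = (M*N/K) * (1 + alpha/100)
    and redistribute the clipped excess uniformly over all K bins.

    region: 2-D array of non-negative integer gray values.
    Returns the clipped histogram (float counts); the total count is
    preserved, which bounds the contrast gain during equalization.
    """
    m, n = region.shape
    hist = np.bincount(region.ravel(), minlength=k).astype(float)
    beta = (m * n / k) * (1 + alpha / 100.0)
    excess = np.maximum(hist - beta, 0).sum()
    hist = np.minimum(hist, beta)
    hist += excess / k  # uniform redistribution (single pass)
    return hist
```

The equalization itself then uses the cumulative distribution of this clipped histogram, with bilinear interpolation between neighbouring sub-regions as stated above.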
23) sharpening the Laplace image;
performing Laplacian enhancement on the image after histogram equalization; each selected pixel and the 8 points in its neighborhood are multiplied by a mask and summed, and the obtained new value replaces the pixel value of the central point of the original 3 × 3 neighborhood; for the point (i, j), the image processed by the Laplacian operator is:

L(i, j) = Σ_{m=−1}^{1} Σ_{n=−1}^{1} k(m, n) · p(i + m, j + n)

wherein k(m, n) is the 3 × 3 Laplacian mask; p(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m is the horizontal offset within the 3 × 3 neighborhood; n is the vertical offset within the 3 × 3 neighborhood; i is the abscissa of the selected point; j is the ordinate of the selected point.
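The 3 × 3 mask filtering of sub-step 23) can be sketched as below. The 8-neighbour mask is one common Laplacian choice; the patent does not fix the coefficients, and edge-replication at the border is a simplifying assumption.

```python
import numpy as np

def laplacian_filter(img, mask=None):
    """Apply a 3x3 Laplacian mask by correlation:
    L(i, j) = sum over m, n in {-1, 0, 1} of k(m, n) * p(i+m, j+n).

    Borders are handled by edge replication. Returns the filtered image,
    whose values replace the centre pixel of each 3x3 neighborhood.
    """
    if mask is None:
        mask = np.array([[1,  1, 1],
                         [1, -8, 1],
                         [1,  1, 1]], dtype=float)
    p = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    # Accumulate the shifted-and-scaled copies instead of looping per pixel.
    for m in range(3):
        for n in range(3):
            out += mask[m, n] * p[m:m + h, n:n + w]
    return out
```

Because the mask coefficients sum to zero, flat regions map to zero and only intensity transitions (edges) produce a response, which is what makes the subsequent sharpening step work.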
3. The binocular stereo vision-based surface point cloud reconstruction method according to claim 1, wherein the specific method of the third step is as follows:
31) the region of interest is selected by user interaction; the pixels inside the frame are defined as target pixels T_U, and the other pixels are defined as background pixels T_B;
32) for a background pixel n in T_B, the label is initialized as α_n = 0; for a target pixel n in T_U, the label is initialized as α_n = 1;
33) through steps 31) and 32), the target pixels and background pixels are preliminarily classified; Gaussian mixture models are then established for the target pixels and the background pixels, and the pixels are clustered into K classes by the K-means algorithm so that each Gaussian model in the mixture has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each Gaussian component is determined by the ratio of its number of pixels to the total number of pixels; the initialization process then ends;
34) a Gaussian component of the Gaussian mixture model is assigned to each pixel: the RGB value of the target pixel n is substituted into each Gaussian component, and the component with the maximum probability (i.e. the minimum energy) is recorded as k_n:

k_n = arg min_{k_n} D_n(α_n, k_n, θ, z_n)

wherein D_n is the energy data term corresponding to pixel n; α_n is the opacity label value corresponding to pixel n; θ is the gray-level histogram of the target or background region of the image; z_n is the gray value corresponding to pixel n;
35) learning optimization of the Gaussian mixture model is then performed from the given image data z:

θ = arg min_θ U(α, k, θ, z), with U(α, k, θ, z) = Σ_n D_n(α_n, k_n, θ, z_n)

wherein U is the sum of the energy data terms over all pixels; α is the array of opacity label values; k is the Gaussian mixture model component parameter; z is the gray value array; θ is the gray-level histogram of the target or background region of the image;
36) from the Gibbs energy data term D_n analyzed in step 34), the Gibbs energy weight 1/k_n is calculated, and the segmentation is then estimated by the min-cut/max-flow algorithm:

E(α, k, θ, z) = U(α, k, θ, z) + V(α, z)

wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation algorithm; V(α, z) is the smoothness term between neighboring pixels; α is the array of opacity label values; k is the Gaussian mixture model component parameter; z is the gray value array; θ is the gray-level histogram of the target or background region of the image;
37) steps 34) to 36) are repeated to continuously optimize the Gaussian mixture model and to ensure that the iterative process converges to a minimum, thereby obtaining the segmentation result;
38) smoothing post-processing is performed on the segmentation result by a border matting mechanism.
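The component assignment of step 34) above can be sketched as follows: each pixel is given the mixture component with the highest weighted density (equivalently, minimum energy D_n). The GMM parameters in the test are illustrative, not taken from the patent.

```python
import numpy as np

def assign_components(pixels, means, covs, weights):
    """For each RGB pixel, pick the Gaussian mixture component with the
    highest weighted density (the k_n of step 34).

    pixels: (N, 3) RGB values; means: (K, 3); covs: (K, 3, 3);
    weights: (K,). Returns an (N,) array of component indices.
    """
    n, k = pixels.shape[0], means.shape[0]
    log_p = np.empty((n, k))
    for c in range(k):
        diff = pixels - means[c]
        inv = np.linalg.inv(covs[c])
        # Mahalanobis distance of every pixel to component c.
        maha = np.einsum("ni,ij,nj->n", diff, inv, diff)
        _, logdet = np.linalg.slogdet(covs[c])
        log_p[:, c] = np.log(weights[c]) - 0.5 * (maha + logdet)
    return np.argmax(log_p, axis=1)
```

Alternating this assignment with re-estimation of the component parameters (step 35) and the graph cut (step 36) is what drives the iterative optimization of step 37.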
4. The binocular stereo vision-based surface point cloud reconstruction method according to claim 1, wherein the specific method of the fourth step is as follows:
41) performing feature detection on the left camera image and the right camera image through the first and last layers of a shared feature extraction module to obtain multi-scale matching cost values; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolutional layer with a step size of 1, for calculating the reconstruction error; the features of the first layer are compressed using a 1 × 1 convolutional layer with a step size of 1 and are later used to compute the correlation in the disparity optimization network (DRS-net); the features generated by the shared feature extraction module are applied simultaneously in the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
42) the input of the disparity estimation network (DES-net) comprises two parts: the first part is the dot product of the left and right features from the last layer of the shared feature extraction module; its output is the matching cost volume of the left and right images, which stores the costs of all possible disparities at image coordinates (x, y); the second part is the feature map of the left image, which provides the necessary semantic information for disparity estimation; the disparity estimation network (DES-net) directly regresses the initial disparity;
43) the disparity optimization network (DRS-net) uses the shared features and the initial disparity to calculate a reconstruction error re, which reflects the correctness of the estimated disparity and is calculated as:

re(i, j) = | I_L(i, j) − I_R(i − d̂(i, j), j) |

wherein I_L is the left image; I_R is the right image; d̂(i, j) is the estimated disparity at location (i, j); i is the abscissa of the selected pixel; j is the ordinate of the selected pixel; the concatenation of the reconstruction error, the initial disparity and the left features is fed to a third encoder-decoder structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate the refined disparity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110821716.8A CN113421210B (en) | 2021-07-21 | 2021-07-21 | Surface point cloud reconstruction method based on binocular stereoscopic vision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113421210A true CN113421210A (en) | 2021-09-21 |
CN113421210B CN113421210B (en) | 2024-04-12 |
Family
ID=77721554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110821716.8A Active CN113421210B (en) | 2021-07-21 | 2021-07-21 | Surface point Yun Chong construction method based on binocular stereoscopic vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113421210B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115695393A (en) * | 2022-12-28 | 2023-02-03 | 山东矩阵软件工程股份有限公司 | Format conversion method, system and storage medium for radar point cloud data |
CN116630761A (en) * | 2023-06-16 | 2023-08-22 | 中国人民解放军61540部队 | Digital surface model fusion method and system for multi-view satellite images |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20080052363A (en) * | 2006-12-05 | 2008-06-11 | 한국전자통신연구원 | Apparatus and method of matching binocular/multi-view stereo using foreground/background separation and image segmentation |
CN104867135A (en) * | 2015-05-04 | 2015-08-26 | 中国科学院上海微系统与信息技术研究所 | High-precision stereo matching method based on guiding image guidance |
CN104978722A (en) * | 2015-07-06 | 2015-10-14 | 天津大学 | Multi-exposure image fusion ghosting removing method based on background modeling |
CN112288689A (en) * | 2020-10-09 | 2021-01-29 | 浙江未来技术研究院(嘉兴) | Three-dimensional reconstruction method and system for operation area in microscopic operation imaging process |
Non-Patent Citations (1)
Title |
---|
Qi Leyang: "Research on Key Technologies of Three-dimensional Face Reconstruction Based on Binocular Stereo Vision", Excellent Master's Theses * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20231017
Address after: No. 2055, Yan'an Street, Changchun City, Jilin Province
Applicant after: Changchun University of Technology
Address before: 523000, Room 222, Building 1, No. 1, Kehui Road, Dongguan City, Guangdong Province
Applicant before: Dongguan Zhongke Sanwei fish Intelligent Technology Co.,Ltd.
Applicant before: Changchun University of Technology
TA01 | Transfer of patent application right | ||
GR01 | Patent grant | ||
GR01 | Patent grant |