CN113421210A - Surface point cloud reconstruction method based on binocular stereo vision - Google Patents

Surface point cloud reconstruction method based on binocular stereo vision

Info

Publication number
CN113421210A
Authority
CN
China
Prior art keywords
pixel
image
pixels
disparity
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110821716.8A
Other languages
Chinese (zh)
Other versions
CN113421210B (en)
Inventor
李岩
李国文
吴孟男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Technology
Original Assignee
Dongguan Zhongke Sanwei Fish Intelligent Technology Co ltd
Changchun University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongguan Zhongke Sanwei Fish Intelligent Technology Co ltd, Changchun University of Technology filed Critical Dongguan Zhongke Sanwei Fish Intelligent Technology Co ltd
Priority to CN202110821716.8A priority Critical patent/CN113421210B/en
Publication of CN113421210A publication Critical patent/CN113421210A/en
Application granted granted Critical
Publication of CN113421210B publication Critical patent/CN113421210B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/40Image enhancement or restoration by the use of histogram techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06T5/73
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details
    • G06T2207/20028Bilateral filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details
    • G06T2207/20032Median filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereo vision. The method comprises the following steps: step one, the images captured by a binocular camera are subjected to stereo rectification so that corresponding points in the left and right images lie on the same epipolar line (the same image row); step two, the rectified images are preprocessed; step three, the complex background around the region of interest is removed by a min-cut/max-flow image segmentation algorithm; step four, depth information is recovered by a convolutional neural network stereo matching algorithm to obtain a disparity map; and step five, the surface point cloud is reconstructed from the disparity map obtained in step four. Through stereo rectification, image preprocessing, background removal of the region of interest, stereo matching and point cloud reconstruction, the method addresses the problems of low reconstruction precision, low speed and poor transferability.

Description

Surface point cloud reconstruction method based on binocular stereo vision
Technical Field
The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereo vision.
Background
In recent years, with the rising level of automation in manufacturing and the ongoing technological upgrading of enterprises, machine vision technology is increasingly applied in industrial production. Binocular stereo vision, as a passive, non-contact measuring means, is favored by the market for its wide range of applicable conditions, fast measuring speed and reasonable price.
The surface point cloud reconstruction technology based on binocular stereo vision can be applied to the fields of part identification and positioning, unmanned aerial vehicle autonomous navigation, satellite remote sensing surveying and mapping, 3D model reconstruction and the like, is a research hotspot and difficulty of artificial intelligence direction at the present stage, and has quite wide application prospect.
Summarizing the results of existing research, although surface point cloud reconstruction methods based on binocular stereo vision have gradually improved, they remain unsatisfactory with respect to the following key problems:
1) existing preprocessing methods cannot balance the denoising effect with the retention of image feature details during filtering and enhancement, which easily causes image blurring, edge loss and point cloud defects;
2) existing surface point cloud reconstruction methods recover the point cloud over the global image; they lack directivity, easily waste computing resources, reduce computational efficiency and cause mismatching;
3) existing neural-network-based stereo matching methods mostly calculate the matching cost at a single scale, and either omit a disparity refinement step or rely on a traditional disparity optimization method, so the resulting disparity map is prone to discontinuities.
Disclosure of Invention
The invention provides a surface point cloud reconstruction method based on binocular stereo vision, which addresses the problems of low reconstruction precision, low speed and poor transferability through stereo rectification, image preprocessing, background removal of the region of interest, stereo matching and point cloud reconstruction.
The technical scheme of the invention is described below with reference to the accompanying drawings:
a surface point cloud reconstruction method based on binocular stereo vision comprises the following steps:
the method comprises the following steps: firstly, the images shot by the binocular camera are subjected to stereo rectification, so that corresponding points in the left and right images lie on the same epipolar line (the same image row);
secondly, the rectified image is preprocessed, wherein the preprocessing comprises weighted median filtering with bilateral filter weights, contrast-limited adaptive histogram equalization and Laplacian image sharpening;
step three, the complex background around the region of interest is removed by a min-cut/max-flow image segmentation algorithm;
step four, depth information is recovered by a convolutional neural network stereo matching algorithm to obtain a disparity map;
and step five, reconstructing the surface point cloud according to the disparity map obtained in step four.
The specific method of the second step is as follows:
21) weighted median filtering with bilateral filter weights;
performing weighted median filtering with bilateral filter weights on the corrected image; the bilateral filter weight between the central pixel (i, j) and a neighboring pixel (i_i, j_j) is expressed as:
w_{i,j} = \frac{1}{k_i} \exp\left(-\frac{\|(i,j)-(i_i,j_j)\|^2}{\sigma_s^2}\right) \exp\left(-\frac{\|I(i,j)-I(i_i,j_j)\|^2}{\sigma_r^2}\right)
wherein the first exponential factor adjusts for spatial proximity and the second is the color similarity; k_i is a regularization factor; \|(i,j)-(i_i,j_j)\|^2 is the spatial distance between the central pixel and the neighboring pixel and \|I(i,j)-I(i_i,j_j)\|^2 is their color difference; \sigma_s and \sigma_r are the standard deviations of the spatial and color kernels; i is the abscissa of the central pixel; j is the ordinate of the central pixel; i_i is the abscissa of the neighboring pixel; j_j is the ordinate of the neighboring pixel;
when a window R_i of size (2R+1) × (2R+1) is selected, where R is the window radius, the window contains n pixels; the pairs {I(i), w_{i,j}} of pixel values and weights within window R_i are sorted by pixel value, and the weights are accumulated in order until the cumulative weight exceeds half of the total weight; the value i^* reached at that point is the new pixel value of the local window center, as shown in the following formula:
i^* = \min\left\{ l : \sum_{i=1}^{l} w_{ij} \ge \frac{1}{2} \sum_{i=1}^{n} w_{ij} \right\}
wherein i^* is the filtered pixel value; l is the position in the sorted sequence at which the accumulation stops (the pixel value at this position replaces the window center point); w_{ij} is the filtering weight; n is the total number of pixels in the window; i is the index of the current accumulated pixel;
22) contrast-limited adaptive histogram equalization;
carrying out contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size and the histogram of each sub-region is calculated separately; the number of possible histogram gray levels is recorded as K and the gray level of each sub-region as r, so that the histogram function corresponding to region (m, n) is:
H_{m,n}(r), 0 ≤ r ≤ K-1;
wherein r is the gray level of each sub-region; K is the number of histogram gray levels;
the clipping limit β is determined as:
\beta = \frac{M N}{K}\left(1 + \frac{\alpha}{100}\right)
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of histogram gray levels; α is a truncation coefficient representing the maximum percentage of pixels allowed in each gray level;
performing histogram equalization on all the divided subregions, processing each pixel by using a bilinear interpolation method, and calculating a processed gray value;
23) Laplacian image sharpening;
performing Laplacian enhancement on the image after histogram equalization; the selected pixel point and the 8 points in its 3 × 3 neighborhood are multiplied by a mask and summed, and the obtained new value replaces the pixel value of the center point of the original 3 × 3 neighborhood, so that for a point (i, j) the image processed by the Laplacian operator is:
L(i,j) = \sum_{m=-1}^{1} \sum_{n=-1}^{1} k(m,n)\, p(i+m, j+n)
wherein k(m, n) is the 3 × 3 Laplacian mask; p(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m is the horizontal offset within the 3 × 3 mask; n is the vertical offset within the 3 × 3 mask; i is the abscissa of the selected point; j is the ordinate of the selected point.
The concrete method of the third step is as follows:
31) the region of interest is selected through user interaction; the pixels inside the selection frame are defined as target pixels T_U, and the other pixels are defined as background pixels T_B;
32) each background pixel n in T_B is initialized with the label α_n = 0; each target pixel n in T_U is initialized with the label α_n = 1;
33) through steps 31) and 32), target pixels and background pixels are preliminarily classified; Gaussian mixture models are then established for the target pixels and the background pixels, the pixels are clustered into K classes by the K-means algorithm so that each Gaussian component of the mixture models has a certain number of pixel samples, the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each component is determined by the ratio of the number of pixels belonging to that Gaussian component to the total number of pixels; the initialization process then ends;
34) a Gaussian component of the mixture model is assigned to each pixel: the RGB value of the target pixel n is substituted into each Gaussian component of the mixture model, and the component with the highest probability is recorded as k_n:
k_n = \arg\min_{k_n} D_n(\alpha_n, k_n, \theta, z_n)
wherein D_n is the energy data term corresponding to pixel n; α_n is the opacity label value corresponding to pixel n; θ is the gray-level histogram of the target or background region of the image; z_n is the gray value corresponding to pixel n;
35) the Gaussian mixture model is then further learned and optimized from the given image data z:
\theta = \arg\min_{\theta} U(\alpha, k, \theta, z), \quad U(\alpha, k, \theta, z) = \sum_{n} D_n(\alpha_n, k_n, \theta, z_n)
wherein U is the sum of the energy data terms corresponding to all pixels; α is the array of opacity label values; k is the Gaussian mixture model component parameter; z is the array of gray values; θ is the gray-level histogram of the target or background region of the image;
36) from the Gibbs energy term D_n analyzed in step 34), the Gibbs energy weight 1/k_n is calculated, and the segmentation is then estimated by the min-cut/max-flow algorithm:
\min_{\{\alpha_n : n \in T_U\}} \min_{k} E(\alpha, k, \theta, z)
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation; α is the array of opacity label values; k is the Gaussian mixture model parameter; z is the array of gray values; θ is the gray-level histogram of the target or background region of the image;
37) repeating the steps 34) -36), continuously optimizing the Gaussian mixture model, and ensuring that the iteration process can be converged to the minimum value, thereby obtaining a segmentation result;
38) performing smooth post-processing on the segmentation result by means of a border matting (boundary softening) mechanism.
The concrete method of the fourth step is as follows:
41) performing feature detection on the left and right camera images through the first and last layers of a shared feature extraction module to obtain multi-scale matching cost values; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolutional layer with a stride of 1, and are used to calculate the reconstruction error; the features of the first layer are compressed by a 1 × 1 convolutional layer with a stride of 1 and are used to compute the correlation in the disparity optimization network, i.e., DRS-net; the features generated by the shared feature extraction module are used simultaneously by the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
42) the input of the disparity estimation network, DES-net, consists of two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, whose output is the matching cost volume of the left and right images, storing the cost of all candidate disparities at image coordinates (x, y); the second part is the feature map of the left image, which provides the semantic information necessary for disparity estimation; the disparity estimation network DES-net directly regresses the initial disparity;
43) the disparity optimization network, DRS-net, uses the shared features and the initial disparity to calculate a reconstruction error r_e, which reflects the correctness of the estimated disparity; the reconstruction error is calculated as:
r_e(i,j) = \left| I_L(i,j) - I_R\left(i - \hat{d}(i,j),\, j\right) \right|
wherein I_L is the left image; I_R is the right image; \hat{d}(i,j) is the estimated disparity at location (i, j); i is the abscissa of the selected position pixel; j is the ordinate of the selected position pixel; the concatenation of the reconstruction error, the initial disparity and the left features is fed to a third encoder-decoder structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate the refined disparity.
The invention has the beneficial effects that:
1) The method is robust to illumination changes and yields a complete point cloud model: a three-step image preprocessing scheme is adopted, consisting of weighted median filtering with bilateral filter weights, contrast-limited adaptive histogram equalization and Laplacian image sharpening, which preserves edge and feature detail while ensuring the denoising effect;
2) The method reconstructs quickly and with high precision: only the object to be reconstructed is taken as the region of interest and its complex background is removed, which saves computing resources and reduces the probability of mismatching caused by similar pixels in the background region;
3) The method matches accurately and produces a smooth disparity map: the improved convolutional neural network (CNN), composed of a shared feature extraction network, a disparity estimation network (DES-net) and a disparity optimization network (DRS-net), overcomes the shortcoming of conventional neural network methods that compute the matching cost at a single scale only and lack a disparity refinement step.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of step two of the present invention;
FIG. 3 is a block diagram of a convolutional neural network of the present invention;
fig. 4 is a flow chart of multi-scale feature extraction.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a surface point cloud reconstruction method based on binocular stereo vision includes the following steps:
the method comprises the following steps: firstly, the images shot by the binocular camera are subjected to stereo rectification, so that corresponding points in the left and right images lie on the same epipolar line (the same image row);
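By way of illustration, the rectification of this step can be sketched with OpenCV as follows; the calibration parameters K1, D1, K2, D2, R and T are assumed to be available from a prior calibration, and the variable names are placeholders rather than part of the original disclosure.

import cv2

def rectify_pair(img_l, img_r, K1, D1, K2, D2, R, T):
    h, w = img_l.shape[:2]
    # compute rectification transforms so that corresponding points share the same row
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map1x, map1y, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map2x, map2y, cv2.INTER_LINEAR)
    return rect_l, rect_r, Q  # Q can be reused in step five to reproject disparities to 3D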
secondly, the corrected image is preprocessed, wherein the preprocessing comprises weighted median filtering with bilateral filter weights, contrast-limited adaptive histogram equalization and Laplacian image sharpening; the specific steps are as follows:
with reference to figure 2 of the drawings,
21) weighted median filtering with bilateral filter weights;
performing weighted median filtering with bilateral filter weights on the corrected image; the bilateral filter weight between the central pixel (i, j) and a neighboring pixel (i_i, j_j) is expressed as:
w_{i,j} = \frac{1}{k_i} \exp\left(-\frac{\|(i,j)-(i_i,j_j)\|^2}{\sigma_s^2}\right) \exp\left(-\frac{\|I(i,j)-I(i_i,j_j)\|^2}{\sigma_r^2}\right)
wherein the first exponential factor adjusts for spatial proximity and the second is the color similarity; k_i is a regularization factor; \|(i,j)-(i_i,j_j)\|^2 is the spatial distance between the central pixel and the neighboring pixel and \|I(i,j)-I(i_i,j_j)\|^2 is their color difference; \sigma_s and \sigma_r are the standard deviations of the spatial and color kernels; i is the abscissa of the central pixel; j is the ordinate of the central pixel; i_i is the abscissa of the neighboring pixel; j_j is the ordinate of the neighboring pixel;
when a window R_i of size (2R+1) × (2R+1) is selected, where R is the window radius, the window contains n pixels; the pairs {I(i), w_{i,j}} of pixel values and weights within window R_i are sorted by pixel value, and the weights are accumulated in order until the cumulative weight exceeds half of the total weight; the value i^* reached at that point is the new pixel value of the local window center, as shown in the following formula:
i^* = \min\left\{ l : \sum_{i=1}^{l} w_{ij} \ge \frac{1}{2} \sum_{i=1}^{n} w_{ij} \right\}
wherein i^* is the filtered pixel value; l is the position in the sorted sequence at which the accumulation stops (the pixel value at this position replaces the window center point); w_{ij} is the filtering weight; n is the total number of pixels in the window; i is the index of the current accumulated pixel;
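For reference, a minimal NumPy sketch of this bilateral-weighted median filter is given below; the window radius and the two standard deviations sigma_s and sigma_r are illustrative tuning parameters, not values fixed by the disclosure, and a single-channel (grayscale) image is assumed.

import numpy as np

def bilateral_weighted_median(gray, radius=2, sigma_s=2.0, sigma_r=25.0):
    gray = gray.astype(np.float64)
    out = gray.copy()
    h, w = gray.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / sigma_s ** 2)               # spatial proximity term
    for i in range(radius, h - radius):
        for j in range(radius, w - radius):
            win = gray[i - radius:i + radius + 1, j - radius:j + radius + 1]
            color = np.exp(-((win - gray[i, j]) ** 2) / sigma_r ** 2)   # color similarity term
            weights = (spatial * color).ravel()
            values = win.ravel()
            order = np.argsort(values)                                  # sort pixel values
            csum = np.cumsum(weights[order])
            k = np.searchsorted(csum, 0.5 * csum[-1])                   # first index past half the total weight
            out[i, j] = values[order][k]                                # weighted median replaces the center
    return out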
22) contrast-limited adaptive histogram equalization;
carrying out contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size and the histogram of each sub-region is calculated separately; the number of possible histogram gray levels is recorded as K and the gray level of each sub-region as r, so that the histogram function corresponding to region (m, n) is:
H_{m,n}(r), 0 ≤ r ≤ K-1;
wherein r is the gray level of each sub-region; K is the number of histogram gray levels;
the clipping limit β is determined as:
\beta = \frac{M N}{K}\left(1 + \frac{\alpha}{100}\right)
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of histogram gray levels; α is a truncation coefficient representing the maximum percentage of pixels allowed in each gray level;
performing histogram equalization on all the divided subregions, processing each pixel by using a bilinear interpolation method, and calculating a processed gray value;
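A compact sketch of this contrast-limited equalization is given below, assuming OpenCV's built-in CLAHE; its clipLimit parameter plays the role of the clipping value derived from β, and tileGridSize fixes the sub-region layout (both values here are illustrative).

import cv2

def clahe_equalize(gray, clip_limit=2.0, tiles=(8, 8)):
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tiles)
    return clahe.apply(gray)  # bilinear interpolation between tiles is handled internally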
setting the clipping limiting value beta can clip the pixels beyond the limited part, thereby achieving the purpose of limiting the contrast.
23) Laplacian image sharpening;
performing Laplacian enhancement on the image after histogram equalization; the selected pixel point and the 8 points in its 3 × 3 neighborhood are multiplied by a mask and summed, and the obtained new value replaces the pixel value of the center point of the original 3 × 3 neighborhood, so that for a point (i, j) the image processed by the Laplacian operator is:
L(i,j) = \sum_{m=-1}^{1} \sum_{n=-1}^{1} k(m,n)\, p(i+m, j+n)
wherein k(m, n) is the 3 × 3 Laplacian mask; p(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m is the horizontal offset within the 3 × 3 mask; n is the vertical offset within the 3 × 3 mask; i is the abscissa of the selected point; j is the ordinate of the selected point;
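An illustrative sketch of this sharpening step follows; the 8-neighborhood mask below is one common choice of k(m, n), and because its center coefficient is negative the filtered response is subtracted from the original image.

import cv2
import numpy as np

def laplacian_sharpen(gray):
    mask = np.array([[1, 1, 1],
                     [1, -8, 1],
                     [1, 1, 1]], dtype=np.float32)          # 3 x 3 Laplacian mask k(m, n)
    lap = cv2.filter2D(gray.astype(np.float32), -1, mask)   # Laplacian response L(i, j)
    sharp = gray.astype(np.float32) - lap                   # enhance edges against the original
    return np.clip(sharp, 0, 255).astype(np.uint8)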
step three, the complex background around the region of interest is removed by a min-cut/max-flow image segmentation algorithm;
31) the region of interest is selected through user interaction; the pixels inside the selection frame are defined as target pixels T_U, and the other pixels are defined as background pixels T_B;
The region of interest is defined at the discretion of the user.
32) each background pixel n in T_B is initialized with the label α_n = 0; each target pixel n in T_U is initialized with the label α_n = 1;
33) through steps 31) and 32), target pixels and background pixels are preliminarily classified; Gaussian mixture models are then established for the target pixels and the background pixels, the pixels are clustered into K classes by the K-means algorithm so that each Gaussian component of the mixture models has a certain number of pixel samples, the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each component is determined by the ratio of the number of pixels belonging to that Gaussian component to the total number of pixels; the initialization process then ends;
34) a Gaussian component of the mixture model is assigned to each pixel: the RGB value of the target pixel n is substituted into each Gaussian component of the mixture model, and the component with the highest probability is recorded as k_n:
k_n = \arg\min_{k_n} D_n(\alpha_n, k_n, \theta, z_n)
wherein D_n is the energy data term corresponding to pixel n; α_n is the opacity label value corresponding to pixel n; θ is the gray-level histogram of the target or background region of the image; z_n is the gray value corresponding to pixel n;
35) the Gaussian mixture model is then further learned and optimized from the given image data z:
\theta = \arg\min_{\theta} U(\alpha, k, \theta, z), \quad U(\alpha, k, \theta, z) = \sum_{n} D_n(\alpha_n, k_n, \theta, z_n)
wherein U is the sum of the energy data terms corresponding to all pixels; α is the array of opacity label values; k is the Gaussian mixture model component parameter; z is the array of gray values; θ is the gray-level histogram of the target or background region of the image;
36) from the Gibbs energy term D_n analyzed in step 34), the Gibbs energy weight 1/k_n is calculated, and the segmentation is then estimated by the min-cut/max-flow algorithm:
\min_{\{\alpha_n : n \in T_U\}} \min_{k} E(\alpha, k, \theta, z)
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation; α is the array of opacity label values; k is the Gaussian mixture model parameter; z is the array of gray values; θ is the gray-level histogram of the target or background region of the image;
37) repeating the steps 34) -36), continuously optimizing the Gaussian mixture model, and ensuring that the iteration process can be converged to the minimum value, thereby obtaining a segmentation result;
38) performing smooth post-processing on the segmentation result by means of a border matting (boundary softening) mechanism.
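For illustration, the initialize-learn-cut iteration of this step can be sketched with OpenCV's GrabCut implementation, which performs a Gaussian-mixture-model plus min-cut/max-flow loop of the kind outlined above; the rectangle rect stands for the user-drawn region of interest and is a placeholder value.

import cv2
import numpy as np

def segment_roi(img_bgr, rect, iters=5):
    mask = np.zeros(img_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)    # background GMM parameters
    fgd_model = np.zeros((1, 65), np.float64)    # foreground GMM parameters
    cv2.grabCut(img_bgr, mask, rect, bgd_model, fgd_model, iters, cv2.GC_INIT_WITH_RECT)
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
    return img_bgr * fg[:, :, None]              # keep only the region of interest

A typical call would be segment_roi(left_rectified, rect=(50, 40, 300, 260)), where the rectangle coordinates are chosen interactively by the user.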
Step four, with reference to fig. 3, depth information is recovered through a convolutional neural network stereo matching algorithm to obtain a disparity map; the network comprises a shared feature extraction module, a disparity estimation network (DES-net) and a disparity optimization network (DRS-net). The shared feature extraction network uses a cascade of shallow encoder-decoder structures to extract common multi-scale features from the left and right images. Some of these features are used to compute the matching cost values (i.e., correlations) for the disparity estimation network (DES-net) and the disparity optimization network (DRS-net). The features of the first layer are further compressed by a 1 × 1 convolution to produce c_conv1a and c_conv1b. These shared features are also used to calculate the reconstruction error of the disparity optimization network (DRS-net);
41) performing feature detection on the left and right camera images through the first and last layers of the shared feature extraction module to obtain multi-scale matching cost values; referring to fig. 4, the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolutional layer with a stride of 1, and features with a relatively large receptive field and different abstraction levels are obtained from the last deconvolution layer and the first convolutional layer for calculating the reconstruction error. "Conv2a" denotes the second convolutional layer of the shared feature extraction module. The features of the first layer are compressed using a 1 × 1 convolutional layer with a stride of 1 and are used to compute the correlation in the disparity optimization network (DRS-net). The features generated by the shared feature extraction module are used simultaneously by the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
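A hedged PyTorch sketch of the shared feature extraction idea is given below; the layer sizes are illustrative and not the exact architecture of the disclosure, but they show how first-layer features are compressed by a 1 × 1 convolution (the role of c_conv1a / c_conv1b) while deeper features feed the matching cost.

import torch.nn as nn

class SharedFeatures(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=1, padding=1), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.compress1 = nn.Conv2d(32, 16, kernel_size=1, stride=1)   # 1 x 1 compression of layer 1

    def forward(self, x):
        f1 = self.conv1(x)       # full-resolution features
        f2 = self.conv2(f1)      # coarser features used for the matching cost
        c1 = self.compress1(f1)  # compressed features (c_conv1a / c_conv1b role)
        return f1, f2, c1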
42) the input of the disparity estimation network (DES-net) consists of two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, whose output is the matching cost volume of the left and right images, storing the cost of all candidate disparities at image coordinates (x, y); the second part is the feature map of the left image, which provides the semantic information necessary for disparity estimation; the disparity estimation network (DES-net) directly regresses the initial disparity;
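The dot-product matching cost described in this step can be sketched as follows; max_disp is an assumed disparity search range, and tensors follow the (batch, channel, height, width) convention.

import torch

def correlation_cost_volume(feat_l, feat_r, max_disp=64):
    b, c, h, w = feat_l.shape
    cost = feat_l.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (feat_l * feat_r).mean(dim=1)
        else:
            # shift the right features by d pixels before taking the dot product
            cost[:, d, :, d:] = (feat_l[:, :, :, d:] * feat_r[:, :, :, :-d]).mean(dim=1)
    return cost  # fed to DES-net together with the left feature map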
43) the disparity optimization network (DRS-net) uses the shared features and the initial disparity to calculate a reconstruction error r_e, which reflects the correctness of the estimated disparity; the reconstruction error is calculated as:
r_e(i,j) = \left| I_L(i,j) - I_R\left(i - \hat{d}(i,j),\, j\right) \right|
wherein I_L is the left image; I_R is the right image; \hat{d}(i,j) is the estimated disparity at location (i, j); i is the abscissa of the selected position pixel; j is the ordinate of the selected position pixel; the concatenation of the reconstruction error, the initial disparity and the left features is fed to a third encoder-decoder structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate the refined disparity.
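A minimal sketch of this reconstruction-error computation is given below, assuming rectified inputs and a disparity tensor of shape (B, 1, H, W); the warping via grid_sample is an illustrative stand-in for the operation described above.

import torch
import torch.nn.functional as F

def reconstruction_error(img_l, img_r, disp):
    b, _, h, w = img_l.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    xs = xs.to(disp) - disp.squeeze(1)                  # shift x by the estimated disparity
    ys = ys.to(disp).expand(b, -1, -1)
    grid = torch.stack([2 * xs / (w - 1) - 1,           # normalize to [-1, 1] for grid_sample
                        2 * ys / (h - 1) - 1], dim=-1)
    warped_r = F.grid_sample(img_r, grid, align_corners=True)
    return (img_l - warped_r).abs()                     # r_e, concatenated with disparity and features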
And step five, reconstructing the surface point cloud according to the disparity map obtained in the step four.
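Step five can be illustrated with OpenCV's reprojectImageTo3D, reusing the Q matrix obtained during stereo rectification; the validity mask below simply discards non-positive disparities and is an assumption of this sketch.

import cv2
import numpy as np

def disparity_to_point_cloud(disparity, Q, colors=None):
    points = cv2.reprojectImageTo3D(disparity.astype(np.float32), Q)
    valid = disparity > 0                    # keep pixels with a positive disparity
    cloud = points[valid]
    if colors is not None:
        return cloud, colors[valid]          # per-point color for visualization
    return cloud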
Although the preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings, the scope of the present invention is not limited to the specific details of the above embodiments, and any person skilled in the art can substitute or change the technical solution of the present invention and its inventive concept within the technical scope of the present invention, and these simple modifications belong to the scope of the present invention.
It should be noted that the various technical features described in the above embodiments can be combined in any suitable manner without contradiction, and the invention is not described in any way for the possible combinations in order to avoid unnecessary repetition.
In addition, any combination of the various embodiments of the present invention is also possible, and the same should be considered as the disclosure of the present invention as long as it does not depart from the spirit of the present invention.

Claims (4)

1. A surface point cloud reconstruction method based on binocular stereo vision is characterized by comprising the following steps:
the method comprises the following steps that firstly, the images shot by the binocular camera are subjected to stereo rectification, so that corresponding points in the left and right images lie on the same epipolar line (the same image row);
secondly, the rectified image is preprocessed, wherein the preprocessing comprises weighted median filtering with bilateral filter weights, contrast-limited adaptive histogram equalization and Laplacian image sharpening;
step three, the complex background around the region of interest is removed by a min-cut/max-flow image segmentation algorithm;
step four, depth information is recovered by a convolutional neural network stereo matching algorithm to obtain a disparity map;
and step five, reconstructing the surface point cloud according to the disparity map obtained in step four.
2. The binocular stereo vision-based surface point cloud reconstruction method according to claim 1, wherein the specific method of the second step is as follows:
21) weighted median filtering with bilateral filter weights;
performing weighted median filtering with bilateral filter weights on the corrected image; the bilateral filter weight between the central pixel (i, j) and a neighboring pixel (i_i, j_j) is expressed as:
w_{i,j} = \frac{1}{k_i} \exp\left(-\frac{\|(i,j)-(i_i,j_j)\|^2}{\sigma_s^2}\right) \exp\left(-\frac{\|I(i,j)-I(i_i,j_j)\|^2}{\sigma_r^2}\right)
wherein the first exponential factor adjusts for spatial proximity and the second is the color similarity; k_i is a regularization factor; \|(i,j)-(i_i,j_j)\|^2 is the spatial distance between the central pixel and the neighboring pixel and \|I(i,j)-I(i_i,j_j)\|^2 is their color difference; \sigma_s and \sigma_r are the standard deviations of the spatial and color kernels; i is the abscissa of the central pixel; j is the ordinate of the central pixel; i_i is the abscissa of the neighboring pixel; j_j is the ordinate of the neighboring pixel;
when a window R_i of size (2R+1) × (2R+1) is selected, where R is the window radius, the window contains n pixels; the pairs {I(i), w_{i,j}} of pixel values and weights within window R_i are sorted by pixel value, and the weights are accumulated in order until the cumulative weight exceeds half of the total weight; the value i^* reached at that point is the new pixel value of the local window center, as shown in the following formula:
i^* = \min\left\{ l : \sum_{i=1}^{l} w_{ij} \ge \frac{1}{2} \sum_{i=1}^{n} w_{ij} \right\}
wherein i^* is the filtered pixel value; l is the position in the sorted sequence at which the accumulation stops (the pixel value at this position replaces the window center point); w_{ij} is the filtering weight; n is the total number of pixels in the window; i is the index of the current accumulated pixel;
22) contrast-limited adaptive histogram equalization;
carrying out contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size and the histogram of each sub-region is calculated separately; the number of possible histogram gray levels is recorded as K and the gray level of each sub-region as r, so that the histogram function corresponding to region (m, n) is:
H_{m,n}(r), 0 ≤ r ≤ K-1;
wherein r is the gray level of each sub-region; K is the number of histogram gray levels;
the clipping limit β is determined as:
\beta = \frac{M N}{K}\left(1 + \frac{\alpha}{100}\right)
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of histogram gray levels; α is a truncation coefficient representing the maximum percentage of pixels allowed in each gray level;
performing histogram equalization on all the divided subregions, processing each pixel by using a bilinear interpolation method, and calculating a processed gray value;
23) Laplacian image sharpening;
performing Laplacian enhancement on the image after histogram equalization; the selected pixel point and the 8 points in its 3 × 3 neighborhood are multiplied by a mask and summed, and the obtained new value replaces the pixel value of the center point of the original 3 × 3 neighborhood, so that for a point (i, j) the image processed by the Laplacian operator is:
L(i,j) = \sum_{m=-1}^{1} \sum_{n=-1}^{1} k(m,n)\, p(i+m, j+n)
wherein k(m, n) is the 3 × 3 Laplacian mask; p(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m is the horizontal offset within the 3 × 3 mask; n is the vertical offset within the 3 × 3 mask; i is the abscissa of the selected point; j is the ordinate of the selected point.
3. The binocular stereo vision-based surface point cloud reconstruction method according to claim 1, wherein the specific method of the third step is as follows:
31) the region of interest is selected through user interaction; the pixels inside the selection frame are defined as target pixels T_U, and the other pixels are defined as background pixels T_B;
32) each background pixel n in T_B is initialized with the label α_n = 0; each target pixel n in T_U is initialized with the label α_n = 1;
33) through steps 31) and 32), target pixels and background pixels are preliminarily classified; Gaussian mixture models are then established for the target pixels and the background pixels, the pixels are clustered into K classes by the K-means algorithm so that each Gaussian component of the mixture models has a certain number of pixel samples, the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each component is determined by the ratio of the number of pixels belonging to that Gaussian component to the total number of pixels; the initialization process then ends;
34) a Gaussian component of the mixture model is assigned to each pixel: the RGB value of the target pixel n is substituted into each Gaussian component of the mixture model, and the component with the highest probability is recorded as k_n:
k_n = \arg\min_{k_n} D_n(\alpha_n, k_n, \theta, z_n)
wherein D_n is the energy data term corresponding to pixel n; α_n is the opacity label value corresponding to pixel n; θ is the gray-level histogram of the target or background region of the image; z_n is the gray value corresponding to pixel n;
35) the Gaussian mixture model is then further learned and optimized from the given image data z:
\theta = \arg\min_{\theta} U(\alpha, k, \theta, z), \quad U(\alpha, k, \theta, z) = \sum_{n} D_n(\alpha_n, k_n, \theta, z_n)
wherein U is the sum of the energy data terms corresponding to all pixels; α is the array of opacity label values; k is the Gaussian mixture model component parameter; z is the array of gray values; θ is the gray-level histogram of the target or background region of the image;
36) from the Gibbs energy term D_n analyzed in step 34), the Gibbs energy weight 1/k_n is calculated, and the segmentation is then estimated by the min-cut/max-flow algorithm:
\min_{\{\alpha_n : n \in T_U\}} \min_{k} E(\alpha, k, \theta, z)
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation; α is the array of opacity label values; k is the Gaussian mixture model parameter; z is the array of gray values; θ is the gray-level histogram of the target or background region of the image;
37) repeating the steps 34) -36), continuously optimizing the Gaussian mixture model, and ensuring that the iteration process can be converged to the minimum value, thereby obtaining a segmentation result;
38) performing smooth post-processing on the segmentation result by means of a border matting (boundary softening) mechanism.
4. The binocular stereo vision-based surface point cloud reconstruction method according to claim 1, wherein the specific method of the fourth step is as follows:
41) performing feature detection on the left and right camera images through the first and last layers of a shared feature extraction module to obtain multi-scale matching cost values; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolutional layer with a stride of 1, and are used to calculate the reconstruction error; the features of the first layer are compressed by a 1 × 1 convolutional layer with a stride of 1 and are used to compute the correlation in the disparity optimization network, i.e., DRS-net; the features generated by the shared feature extraction module are used simultaneously by the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
42) the input of the disparity estimation network, DES-net, consists of two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, whose output is the matching cost volume of the left and right images, storing the cost of all candidate disparities at image coordinates (x, y); the second part is the feature map of the left image, which provides the semantic information necessary for disparity estimation; the disparity estimation network DES-net directly regresses the initial disparity;
43) the disparity optimization network, DRS-net, uses the shared features and the initial disparity to calculate a reconstruction error r_e, which reflects the correctness of the estimated disparity; the reconstruction error is calculated as:
r_e(i,j) = \left| I_L(i,j) - I_R\left(i - \hat{d}(i,j),\, j\right) \right|
wherein I_L is the left image; I_R is the right image; \hat{d}(i,j) is the estimated disparity at location (i, j); i is the abscissa of the selected position pixel; j is the ordinate of the selected position pixel; the concatenation of the reconstruction error, the initial disparity and the left features is fed to a third encoder-decoder structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate the refined disparity.
CN202110821716.8A 2021-07-21 2021-07-21 Surface point cloud reconstruction method based on binocular stereoscopic vision Active CN113421210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110821716.8A CN113421210B (en) 2021-07-21 2021-07-21 Surface point cloud reconstruction method based on binocular stereoscopic vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110821716.8A CN113421210B (en) 2021-07-21 2021-07-21 Surface point cloud reconstruction method based on binocular stereoscopic vision

Publications (2)

Publication Number Publication Date
CN113421210A true CN113421210A (en) 2021-09-21
CN113421210B CN113421210B (en) 2024-04-12

Family

ID=77721554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110821716.8A Active CN113421210B (en) 2021-07-21 2021-07-21 Surface point cloud reconstruction method based on binocular stereoscopic vision

Country Status (1)

Country Link
CN (1) CN113421210B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115695393A (en) * 2022-12-28 2023-02-03 山东矩阵软件工程股份有限公司 Format conversion method, system and storage medium for radar point cloud data
CN116630761A (en) * 2023-06-16 2023-08-22 中国人民解放军61540部队 Digital surface model fusion method and system for multi-view satellite images

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080052363A (en) * 2006-12-05 2008-06-11 한국전자통신연구원 Apparatus and method of matching binocular/multi-view stereo using foreground/background separation and image segmentation
CN104867135A (en) * 2015-05-04 2015-08-26 中国科学院上海微系统与信息技术研究所 High-precision stereo matching method based on guiding image guidance
CN104978722A (en) * 2015-07-06 2015-10-14 天津大学 Multi-exposure image fusion ghosting removing method based on background modeling
CN112288689A (en) * 2020-10-09 2021-01-29 浙江未来技术研究院(嘉兴) Three-dimensional reconstruction method and system for operation area in microscopic operation imaging process

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080052363A (en) * 2006-12-05 2008-06-11 한국전자통신연구원 Apparatus and method of matching binocular/multi-view stereo using foreground/background separation and image segmentation
CN104867135A (en) * 2015-05-04 2015-08-26 中国科学院上海微系统与信息技术研究所 High-precision stereo matching method based on guiding image guidance
CN104978722A (en) * 2015-07-06 2015-10-14 天津大学 Multi-exposure image fusion ghosting removing method based on background modeling
CN112288689A (en) * 2020-10-09 2021-01-29 浙江未来技术研究院(嘉兴) Three-dimensional reconstruction method and system for operation area in microscopic operation imaging process

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
祁乐阳 (Qi Leyang): "Research on Key Technologies of Three-Dimensional Face Reconstruction Based on Binocular Stereo Vision", Master's thesis *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115695393A (en) * 2022-12-28 2023-02-03 山东矩阵软件工程股份有限公司 Format conversion method, system and storage medium for radar point cloud data
CN116630761A (en) * 2023-06-16 2023-08-22 中国人民解放军61540部队 Digital surface model fusion method and system for multi-view satellite images

Also Published As

Publication number Publication date
CN113421210B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
CN108921799B (en) Remote sensing image thin cloud removing method based on multi-scale collaborative learning convolutional neural network
Engin et al. Cycle-dehaze: Enhanced cyclegan for single image dehazing
Fu et al. Removing rain from single images via a deep detail network
CN108765325B (en) Small unmanned aerial vehicle blurred image restoration method
CN108230264B (en) Single image defogging method based on ResNet neural network
CN111915530B (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
CN112232349A (en) Model training method, image segmentation method and device
CN109584282B (en) Non-rigid image registration method based on SIFT (scale invariant feature transform) features and optical flow model
CN111899295B (en) Monocular scene depth prediction method based on deep learning
CN110136075B (en) Remote sensing image defogging method for generating countermeasure network based on edge sharpening cycle
CN110570440A (en) Image automatic segmentation method and device based on deep learning edge detection
CN107506792B (en) Semi-supervised salient object detection method
CN113421210B (en) Surface point cloud reconstruction method based on binocular stereoscopic vision
CN111681198A (en) Morphological attribute filtering multimode fusion imaging method, system and medium
CN114283162A (en) Real scene image segmentation method based on contrast self-supervision learning
CN116310095A (en) Multi-view three-dimensional reconstruction method based on deep learning
Yu et al. Split-attention multiframe alignment network for image restoration
CN116310098A (en) Multi-view three-dimensional reconstruction method based on attention mechanism and variable convolution depth network
Babu et al. An efficient image dahazing using Googlenet based convolution neural networks
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
Zhang et al. A new image filtering method: Nonlocal image guided averaging
CN110264417B (en) Local motion fuzzy area automatic detection and extraction method based on hierarchical model
CN110490877B (en) Target segmentation method for binocular stereo image based on Graph Cuts
CN113837243A (en) RGB-D camera dynamic visual odometer method based on edge information
Zhang et al. Single image haze removal for aqueous vapour regions based on optimal correction of dark channel

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231017

Address after: No. 2055, Yan'an street, Changchun City, Jilin Province

Applicant after: Changchun University of Technology

Address before: 523000 room 222, building 1, No. 1, Kehui Road, Dongguan City, Guangdong Province

Applicant before: Dongguan Zhongke Sanwei fish Intelligent Technology Co.,Ltd.

Applicant before: Changchun University of Technology

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant