CN113421210B - Surface point cloud reconstruction method based on binocular stereoscopic vision - Google Patents

Surface point cloud reconstruction method based on binocular stereoscopic vision

Info

Publication number
CN113421210B
CN113421210B · CN202110821716.8A · CN202110821716A
Authority
CN
China
Prior art keywords
image
pixel
pixels
value
disparity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110821716.8A
Other languages
Chinese (zh)
Other versions
CN113421210A (en)
Inventor
李岩
李国文
吴孟男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Technology
Original Assignee
Changchun University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University of Technology filed Critical Changchun University of Technology
Priority to CN202110821716.8A priority Critical patent/CN113421210B/en
Publication of CN113421210A publication Critical patent/CN113421210A/en
Application granted granted Critical
Publication of CN113421210B publication Critical patent/CN113421210B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/40Image enhancement or restoration by the use of histogram techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06T5/73
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details
    • G06T2207/20028Bilateral filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details
    • G06T2207/20032Median filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereoscopic vision. The method comprises the following steps: step one, performing stereo rectification on the images captured by a binocular camera so that corresponding points in the left and right images lie on the same epipolar line; step two, preprocessing the rectified images; step three, removing the complex background around the region of interest with a min-cut/max-flow image segmentation algorithm; step four, recovering depth information with a convolutional neural network stereo matching algorithm to obtain a disparity map; and step five, reconstructing the surface point cloud from the disparity map obtained in step four. Through stereo rectification, image preprocessing, background removal for the region of interest, stereo matching, and point cloud reconstruction, the method addresses problems such as low reconstruction accuracy, low speed, and poor transferability.

Description

Surface point cloud reconstruction method based on binocular stereoscopic vision
Technical Field
The invention belongs to the field of digital image processing, and particularly relates to a surface point cloud reconstruction method based on binocular stereoscopic vision.
Background
In recent years, with the steady improvement of manufacturing automation and the continuous technological upgrading of enterprises, machine vision has been applied to industrial production more and more widely. As a passive, non-contact measurement means, binocular stereo vision is favored by the market for its fast measurement speed and reasonable cost.
Surface point cloud reconstruction based on binocular stereoscopic vision can be applied to part identification and positioning, autonomous navigation of unmanned aerial vehicles, satellite remote sensing and mapping, 3D model reconstruction, and other fields. It is currently a research hotspot and a difficult problem in artificial intelligence, with broad application prospects.
A review of prior research shows that existing surface point cloud reconstruction methods based on binocular stereoscopic vision have gradually matured, but the following key problems remain to be solved:
1) Existing preprocessing methods cannot balance denoising with the preservation of image feature details during filtering and enhancement; they tend to blur the image and erase edges, which leads to defects in the point cloud;
2) Existing surface point cloud reconstruction methods recover the point cloud over the whole image without targeting a specific region, which wastes computing resources, reduces efficiency, and causes mismatches;
3) Most existing neural-network-based stereo matching methods compute matching costs at a single scale, lack a disparity refinement step, or rely on traditional disparity optimization methods, which tends to produce discontinuities in the disparity map.
Disclosure of Invention
The invention provides a surface point cloud reconstruction method based on binocular stereo vision that addresses problems such as low reconstruction accuracy, low speed, and poor transferability through stereo rectification, image preprocessing, background removal for the region of interest, stereo matching, and point cloud reconstruction.
The technical scheme of the invention, described with reference to the accompanying drawings, is as follows:
a surface point cloud reconstruction method based on binocular stereoscopic vision comprises the following steps:
step one, performing stereo rectification on the images captured by a binocular camera so that corresponding points in the left and right images lie on the same epipolar line;
step two, preprocessing the rectified images, wherein the preprocessing comprises weighted median filtering with bilateral filter weights, contrast-limited adaptive histogram equalization, and Laplacian image sharpening;
step three, removing the complex background around the region of interest with a min-cut/max-flow image segmentation algorithm;
step four, recovering depth information with a convolutional neural network stereo matching algorithm to obtain a disparity map;
and step five, reconstructing the surface point cloud from the disparity map obtained in step four.
The specific method of the second step is as follows:
21) weighted median filtering with bilateral filter weights;
performing weighted median filtering with bilateral filter weights on the rectified image; the bilateral filter weight is expressed as:
w_i,j = (1/k_i) · exp(-[(i-i_i)² + (j-j_j)²]/σ_s²) · exp(-|I(i,j)-I(i_i,j_j)|²/σ_r²);
wherein σ_s adjusts the spatial scale; σ_r adjusts the color similarity; k_i is a regularization factor; (i-i_i)² and (j-j_j)² measure the spatial similarity between the center pixel and a neighboring pixel; i is the abscissa of the center pixel; j is the ordinate of the center pixel; i_i is the abscissa of the neighboring pixel; j_j is the ordinate of the neighboring pixel; I(i,j) and I(i_i,j_j) are their pixel values;
a window R_i of size (2r+1) × (2r+1) is selected, where r is the window radius and n is the number of pixels contained in the window; within the window R_i, the pairs {I(i), w_i,j} of pixel values and weights are formed, the pixel values are sorted, and the corresponding weights are accumulated in order until the cumulative weight exceeds half of the total weight; the pixel value i* reached at that point becomes the new value of the center point of the local window, as shown in the following equation:
i* = min{ l | Σ_{i=1..l} w_i,j ≥ (1/2) · Σ_{i=1..n} w_i,j };
wherein i* is the filtered pixel value; l is the pixel value at the window center point; w_i,j is the filtering weight; n is the total number of pixels in the window; i is the number of pixels accumulated so far;
22) contrast-limited adaptive histogram equalization;
performing contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size, and the histogram of each sub-region is calculated separately; denoting the number of possible gray levels of the histogram as K and the gray level of each sub-region as r, the histogram function corresponding to region (m, n) is:
H_m,n(r), 0 ≤ r ≤ K-1;
where r is the gray level of each sub-region; k is the number of gray levels of the histogram;
the clipping limit β is determined as:
β = (M·N/K) · (1 + α/100);
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of gray levels of the histogram; α is the clipping coefficient, representing the maximum percentage of pixels allowed in each gray level;
performing histogram equalization on all the divided subareas, processing each pixel by using a bilinear interpolation method, and calculating the gray value after processing;
23) Laplacian image sharpening;
carrying out Laplacian enhancement on the histogram-equalized image: the selected pixel and the 8 points in its neighborhood are multiplied by a mask and summed, and the resulting new value replaces the pixel value of the center point of the original 3 × 3 neighborhood; for a point (i, j), the image processed by the Laplacian operator is:
L(i, j) = Σ_{m=-1..1} Σ_{n=-1..1} P(i+m, j+n) · k(m, n);
wherein k(m, n) is a 3 × 3 Laplacian mask; P(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m is the abscissa within the 3 × 3 mask; n is the ordinate within the 3 × 3 mask; i is the abscissa of the selected point; j is the ordinate of the selected point.
The specific method of the third step is as follows:
31) the region of interest is selected by the user with a bounding box; the pixels inside the box are defined as the target pixels T_U, and the other pixels are defined as the background pixels T_B;
32) for each background pixel n in T_B, its label is initialized as α_n = 0; for each pixel n among the target pixels T_U, its label is initialized as α_n = 1;
33) after the target and background pixels have been preliminarily labelled in steps 31) and 32), a Gaussian mixture model is established for the target pixels and another for the background pixels; the target pixels are clustered into K classes with the K-means algorithm, ensuring that every Gaussian component in the mixture model has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each Gaussian component is determined by the ratio of its pixels to the total number of pixels; this completes the initialization;
34) a Gaussian component of the mixture model is assigned to each pixel: the RGB value of a target pixel n is substituted into each Gaussian component of the mixture model, and the component with the highest probability is determined as k_n:
k_n = arg min_k D_n(α_n, k, θ, Z_n);
wherein D_n is the data energy term corresponding to pixel n; α_n is the opacity value corresponding to pixel n; θ is the gray histogram of the target or background region of the image; Z_n is the gray value corresponding to pixel n;
35) the Gaussian mixture model is further learned and optimized from the given image data z:
θ := arg min_θ U(α, k, θ, z), with U(α, k, θ, z) = Σ_n D_n(α_n, k_n, θ, Z_n);
wherein U is the sum of the data energy terms of all pixels; α is the opacity value array; k is the Gaussian mixture model parameter; z is the gray value array; θ is the gray histogram of the target or background region of the image;
36) from the data energy term D_n analysed in step 34), the Gibbs energy weight 1/k_n is obtained, and the segmentation is then estimated with the min-cut/max-flow algorithm:
min_{α_n : n ∈ T_U} E(α, k, θ, z);
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation algorithm; α is the opacity value array; k is the Gaussian mixture model parameter; z is the gray value array; θ is the gray histogram of the target or background region of the image;
37) steps 34)-36) are repeated to keep optimizing the Gaussian mixture model, ensuring that the iterative process converges to a minimum and yields the segmentation result;
38) smoothing the segmentation result with a border matting mechanism.
The specific method of the fourth step is as follows:
41) feature detection is performed on the left and right images through the first and last layers of the shared feature extraction module to obtain multi-scale matching cost values; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolution layer with stride 1, and are used to calculate the reconstruction error; the features of the first layer are compressed by a 1 × 1 convolution layer with stride 1 and are used to calculate the correlation in the disparity optimization network (DRS-net); the features generated by the shared feature extraction module are used by both the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
42) the input to the disparity estimation network (DES-net) comprises two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, whose output is the matching cost value of the left and right images and stores the cost of all candidate disparities at the image coordinates (x, y); the second part is the feature map of the left image, which provides the semantic information needed for disparity estimation; the disparity estimation network (DES-net) is used to directly regress the initial disparity;
43) the disparity optimization network (DRS-net) uses the shared features and the initial disparity to calculate a reconstruction error re, which reflects the correctness of the estimated disparity; the reconstruction error is calculated as:
re(i, j) = |I_L(i, j) - I_R(i + d_ij, j)|;
wherein I_L is the left image; I_R is the right image; d_ij is the estimated disparity at position (i, j); i is the abscissa of the pixel; j is the ordinate of the pixel; the concatenation of the reconstruction error, the initial disparity, and the left features is fed to a third encoder-decoder structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate the refined disparity.
The beneficial effects of the invention are as follows:
1) The method is robust to illumination changes and yields a complete point cloud model: a three-step image preprocessing scheme is disclosed that uses weighted median filtering with bilateral filter weights, contrast-limited adaptive histogram equalization, and Laplacian image sharpening, preserving edge and feature information while ensuring the denoising effect;
2) The invention reconstructs quickly and accurately: only the region of interest is treated as the reconstruction target and the complex background is removed, which saves computing resources and reduces the probability of mismatches caused by similar pixels in the background region;
3) The invention matches accurately and produces a smooth disparity map. The improved convolutional neural network (CNN) consists of a shared feature extraction network, a disparity estimation network (DES-net), and a disparity optimization network (DRS-net), overcoming the drawbacks of conventional neural network methods that compute the matching cost at only a single scale and lack a disparity refinement stage.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of a second step of the present invention;
FIG. 3 is a block diagram of a convolutional neural network of the present invention;
FIG. 4 is a flow chart of multi-scale feature extraction.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a surface point cloud reconstruction method based on binocular stereoscopic vision includes the following steps:
step one, performing stereo rectification on the images captured by a binocular camera so that corresponding points in the left and right images lie on the same epipolar line;
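As an illustrative sketch (not part of the claimed method), step one can be realised with OpenCV's stereo rectification routines, assuming the intrinsic matrices, distortion vectors, and the rotation and translation between the two cameras are already known from calibration; the function name and arguments below are illustrative:

import cv2
import numpy as np

def rectify_pair(img_left, img_right, K1, d1, K2, d2, R, T):
    h, w = img_left.shape[:2]
    # Compute rectification transforms so that corresponding epipolar lines become horizontal
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, (w, h), R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)
    rect_left = cv2.remap(img_left, map1x, map1y, cv2.INTER_LINEAR)
    rect_right = cv2.remap(img_right, map2x, map2y, cv2.INTER_LINEAR)
    return rect_left, rect_right, Q  # Q can later reproject disparity to 3D in step five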
step two, preprocessing the rectified images, wherein the preprocessing comprises weighted median filtering with bilateral filter weights, contrast-limited adaptive histogram equalization, and Laplacian image sharpening; specifically:
Referring to fig. 2,
21) weighted median filtering with bilateral filter weights;
performing weighted median filtering with bilateral filter weights on the rectified image; the bilateral filter weight is expressed as:
w_i,j = (1/k_i) · exp(-[(i-i_i)² + (j-j_j)²]/σ_s²) · exp(-|I(i,j)-I(i_i,j_j)|²/σ_r²);
wherein σ_s adjusts the spatial scale; σ_r adjusts the color similarity; k_i is a regularization factor; (i-i_i)² and (j-j_j)² measure the spatial similarity between the center pixel and a neighboring pixel; i is the abscissa of the center pixel; j is the ordinate of the center pixel; i_i is the abscissa of the neighboring pixel; j_j is the ordinate of the neighboring pixel; I(i,j) and I(i_i,j_j) are their pixel values;
a window R_i of size (2r+1) × (2r+1) is selected, where r is the window radius and n is the number of pixels contained in the window; within the window R_i, the pairs {I(i), w_i,j} of pixel values and weights are formed, the pixel values are sorted, and the corresponding weights are accumulated in order until the cumulative weight exceeds half of the total weight; the pixel value i* reached at that point becomes the new value of the center point of the local window, as shown in the following equation:
i* = min{ l | Σ_{i=1..l} w_i,j ≥ (1/2) · Σ_{i=1..n} w_i,j };
wherein i* is the filtered pixel value; l is the pixel value at the window center point; w_i,j is the filtering weight; n is the total number of pixels in the window; i is the number of pixels accumulated so far;
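A minimal sketch of step 21) is given below; it follows the procedure above directly (sort the window pixels, accumulate the bilateral weights, stop at half the total weight). The window radius and the spatial and color scales are illustrative values, not taken from the patent:

import numpy as np

def weighted_median_filter(img, r=2, sigma_s=3.0, sigma_r=25.0):
    # img: single-channel float image; window is (2r+1) x (2r+1)
    h, w = img.shape
    out = img.copy()
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (sigma_s ** 2))  # spatial similarity term
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = img[y - r:y + r + 1, x - r:x + r + 1]
            color = np.exp(-((patch - img[y, x]) ** 2) / (sigma_r ** 2))  # color similarity term
            weights = (spatial * color).ravel()
            values = patch.ravel()
            order = np.argsort(values)                 # sort the pixel values in the window
            cum = np.cumsum(weights[order])            # accumulate the sorted weights
            idx = np.searchsorted(cum, 0.5 * cum[-1])  # first index past half the total weight
            out[y, x] = values[order][idx]             # weighted median replaces the center pixel
    return out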
22) contrast-limited adaptive histogram equalization;
performing contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size, and the histogram of each sub-region is calculated separately; denoting the number of possible gray levels of the histogram as K and the gray level of each sub-region as r, the histogram function corresponding to region (m, n) is:
H_m,n(r), 0 ≤ r ≤ K-1;
where r is the gray level of each sub-region; k is the number of gray levels of the histogram;
the clipping limit β is determined as:
β = (M·N/K) · (1 + α/100);
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of gray levels of the histogram; α is the clipping coefficient, representing the maximum percentage of pixels allowed in each gray level;
performing histogram equalization on all the divided subareas, processing each pixel by using a bilinear interpolation method, and calculating the gray value after processing;
The clipping limit β clips the pixels that exceed the limit, thereby limiting the contrast.
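In practice, step 22) corresponds to contrast-limited adaptive histogram equalization (CLAHE); a sketch using OpenCV is shown below, with an illustrative clip limit and tile grid size that are not specified by the patent:

import cv2

def clahe_enhance(gray):
    # Performs per-tile histogram equalization with clipping and bilinear interpolation
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)  # gray: 8-bit single-channel image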
23) Laplacian image sharpening;
carrying out Laplacian enhancement on the histogram-equalized image: the selected pixel and the 8 points in its neighborhood are multiplied by a mask and summed, and the resulting new value replaces the pixel value of the center point of the original 3 × 3 neighborhood; for a point (i, j), the image processed by the Laplacian operator is:
L(i, j) = Σ_{m=-1..1} Σ_{n=-1..1} P(i+m, j+n) · k(m, n);
wherein k(m, n) is a 3 × 3 Laplacian mask; P(i, j) is the gray value of the original image; L(i, j) is the image processed by the Laplacian operator; m is the abscissa within the 3 × 3 mask; n is the ordinate within the 3 × 3 mask; i is the abscissa of the selected point; j is the ordinate of the selected point;
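A sketch of step 23) is given below; the 3 × 3 mask shown is one common Laplacian sharpening mask and is assumed here, since the patent does not state which mask it uses:

import cv2
import numpy as np

def laplacian_sharpen(gray):
    # Common 8-neighbour Laplacian sharpening mask (original image plus negative Laplacian)
    mask = np.array([[-1, -1, -1],
                     [-1,  9, -1],
                     [-1, -1, -1]], dtype=np.float32)
    # Multiply-and-sum each 3x3 neighbourhood with the mask and replace the center pixel
    return cv2.filter2D(gray, -1, mask)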
step three, removing the complex background around the region of interest with a min-cut/max-flow image segmentation algorithm;
31) the region of interest is selected by the user with a bounding box; the pixels inside the box are defined as the target pixels T_U, and the other pixels are defined as the background pixels T_B;
The region of interest is specified by the user.
32) for each background pixel n in T_B, its label is initialized as α_n = 0; for each pixel n among the target pixels T_U, its label is initialized as α_n = 1;
33) after the target and background pixels have been preliminarily labelled in steps 31) and 32), a Gaussian mixture model is established for the target pixels and another for the background pixels; the target pixels are clustered into K classes with the K-means algorithm, ensuring that every Gaussian component in the mixture model has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each Gaussian component is determined by the ratio of its pixels to the total number of pixels; this completes the initialization;
34) a Gaussian component of the mixture model is assigned to each pixel: the RGB value of a target pixel n is substituted into each Gaussian component of the mixture model, and the component with the highest probability is determined as k_n:
k_n = arg min_k D_n(α_n, k, θ, Z_n);
wherein D_n is the data energy term corresponding to pixel n; α_n is the opacity value corresponding to pixel n; θ is the gray histogram of the target or background region of the image; Z_n is the gray value corresponding to pixel n;
35) the Gaussian mixture model is further learned and optimized from the given image data z:
θ := arg min_θ U(α, k, θ, z), with U(α, k, θ, z) = Σ_n D_n(α_n, k_n, θ, Z_n);
wherein U is the sum of the data energy terms of all pixels; α is the opacity value array; k is the Gaussian mixture model parameter; z is the gray value array; θ is the gray histogram of the target or background region of the image;
36) from the data energy term D_n analysed in step 34), the Gibbs energy weight 1/k_n is obtained, and the segmentation is then estimated with the min-cut/max-flow algorithm:
min_{α_n : n ∈ T_U} E(α, k, θ, z);
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation algorithm; α is the opacity value array; k is the Gaussian mixture model parameter; z is the gray value array; θ is the gray histogram of the target or background region of the image;
37) steps 34)-36) are repeated to keep optimizing the Gaussian mixture model, ensuring that the iterative process converges to a minimum and yields the segmentation result;
38) smoothing the segmentation result with a border matting mechanism.
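Steps 31)-38) follow the GrabCut formulation (Gaussian mixture models combined with iterative min-cut/max-flow and border matting), so as an illustrative sketch the procedure can be approximated with OpenCV's grabCut, where rect is the user-drawn box around the region of interest; the function name and iteration count are illustrative:

import cv2
import numpy as np

def remove_background(img, rect, iters=5):
    mask = np.zeros(img.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)   # background GMM parameters
    fgd_model = np.zeros((1, 65), np.float64)   # foreground GMM parameters
    cv2.grabCut(img, mask, rect, bgd_model, fgd_model, iters, cv2.GC_INIT_WITH_RECT)
    # Keep pixels labelled definite or probable foreground; zero out the rest
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
    return img * fg[:, :, np.newaxis]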
Step four, referring to fig. 3, depth information is recovered through a convolutional neural network stereo matching algorithm to obtain a disparity map. The network includes a shared feature extraction module, a disparity estimation network (DES-net), and a disparity optimization network (DRS-net). The shared feature extraction network uses connected shallow encoder-decoder structures to extract common multi-scale features from the left and right images. Part of these features are used to calculate the matching cost values (i.e., correlations) for the disparity estimation network (DES-net) and the disparity optimization network (DRS-net). The features of the first layer are further compressed by a 1 × 1 convolution to produce c_conv1a and c_conv1b. These shared features are also used to calculate the reconstruction errors of the disparity optimization network (DRS-net);
41) feature detection is performed on the left and right images through the first and last layers of the shared feature extraction module to obtain multi-scale matching cost values; referring to fig. 4, the features of the first two layers are up-sampled to the original resolution and fused by 1 × 1 convolution layers with stride 1, and features with a relatively large receptive field and different levels of abstraction are obtained from the last deconvolution layer and the first convolution layer for calculating the reconstruction error, where "Conv2a" denotes the second convolution layer of the shared feature extraction module. The features of the first layer are compressed by a 1 × 1 convolution layer with stride 1 and are used to calculate the correlations in the disparity optimization network (DRS-net). The features generated by the shared feature extraction module are used by both the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
42) the input to the disparity estimation network (DES-net) comprises two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, whose output is the matching cost value of the left and right images and stores the cost of all candidate disparities at the image coordinates (x, y); the second part is the feature map of the left image, which provides the semantic information needed for disparity estimation; the disparity estimation network (DES-net) is used to directly regress the initial disparity;
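As an illustrative sketch of the first input to DES-net, a matching-cost volume can be formed by correlating (dot product) the left feature map with the right feature map shifted over candidate disparities; the maximum disparity below is an assumed value:

import numpy as np

def correlation_cost_volume(feat_left, feat_right, max_disp=64):
    # feat_left, feat_right: feature maps of shape (H, W, C)
    h, w, c = feat_left.shape
    cost = np.zeros((h, w, max_disp), dtype=np.float32)
    for d in range(max_disp):
        # Shift the right feature map by d pixels and correlate with the left feature map
        shifted = np.zeros_like(feat_right)
        shifted[:, d:, :] = feat_right[:, :w - d, :]
        cost[:, :, d] = np.sum(feat_left * shifted, axis=2)
    return cost  # stores the cost of all candidate disparities at each (x, y)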
43) the disparity optimization network (DRS-net) uses the shared features and the initial disparity to calculate a reconstruction error re, which reflects the correctness of the estimated disparity; the reconstruction error is calculated as:
re(i, j) = |I_L(i, j) - I_R(i + d_ij, j)|;
wherein I_L is the left image; I_R is the right image; d_ij is the estimated disparity at position (i, j); i is the abscissa of the pixel and j is the ordinate of the pixel; the concatenation of the reconstruction error, the initial disparity, and the left features is fed to a third encoder-decoder structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate the refined disparity.
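A sketch of the reconstruction error computation is shown below; the (i + d_ij) indexing follows the formula above, with i treated as the horizontal coordinate:

import numpy as np

def reconstruction_error(left, right, disparity):
    # left, right: single-channel images; disparity: estimated disparity map (same size)
    h, w = left.shape
    err = np.zeros_like(left, dtype=np.float32)
    for j in range(h):          # j: row (ordinate)
        for i in range(w):      # i: column (abscissa)
            i_r = int(round(i + disparity[j, i]))
            if 0 <= i_r < w:
                err[j, i] = abs(float(left[j, i]) - float(right[j, i_r]))
    return err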
And step five, reconstructing the surface point cloud from the disparity map obtained in step four.
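As an illustrative sketch of step five, the disparity map can be reprojected to a surface point cloud with the 4 × 4 reprojection matrix Q obtained during stereo rectification; this is a common way to realise the step rather than a verbatim description of the patent:

import cv2
import numpy as np

def disparity_to_point_cloud(disparity, Q):
    # Reproject every pixel of the disparity map to 3D camera coordinates
    points = cv2.reprojectImageTo3D(disparity.astype(np.float32), Q)
    valid = disparity > 0                      # keep only pixels with a valid disparity
    return points[valid].reshape(-1, 3)        # N x 3 array of surface points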
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the scope of the present invention is not limited to the specific details of the above embodiments; within the scope of the technical concept of the present invention, any person skilled in the art may apply equivalent substitutions or alterations to the technical solution and the inventive concept thereof, and such simple modifications all fall within the scope of the present invention.
In addition, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described further.
Moreover, any combination of the various embodiments of the invention can be made without departing from the spirit of the invention, which should also be considered as disclosed herein.

Claims (3)

1. A surface point cloud reconstruction method based on binocular stereoscopic vision, characterized by comprising the following steps:
step one, performing stereo rectification on the images captured by a binocular camera so that corresponding points in the left and right images lie on the same epipolar line;
step two, preprocessing the rectified images, wherein the preprocessing comprises weighted median filtering with bilateral filter weights, contrast-limited adaptive histogram equalization, and Laplacian image sharpening;
step three, removing the complex background around the region of interest with a min-cut/max-flow image segmentation algorithm;
step four, recovering depth information with a convolutional neural network stereo matching algorithm to obtain a disparity map;
step five, reconstructing the surface point cloud from the disparity map obtained in step four;
the specific method of the second step is as follows:
21) weighted median filtering with bilateral filter weights;
performing weighted median filtering with bilateral filter weights on the rectified image; the bilateral filter weight is expressed as:
w_i,j = (1/k_i) · exp(-[(i-i_i)² + (j-j_j)²]/σ_s²) · exp(-|I(i,j)-I(i_i,j_j)|²/σ_r²);
wherein σ_s adjusts the spatial scale; σ_r adjusts the color similarity; k_i is a regularization factor; (i-i_i)² and (j-j_j)² measure the spatial similarity between the center pixel and a neighboring pixel; i is the abscissa of the center pixel; j is the ordinate of the center pixel; i_i is the abscissa of the neighboring pixel; j_j is the ordinate of the neighboring pixel; I(i,j) and I(i_i,j_j) are their pixel values;
a window R_i of size (2r+1) × (2r+1) is selected, where r is the window radius and s is the number of pixels contained in the window; within the window R_i, the pairs {I(i), w_i,j} of pixel values and weights are formed, the pixel values are sorted, and the corresponding weights are accumulated in order until the cumulative weight exceeds half of the total weight; the pixel value i* reached at that point becomes the new value of the center point of the local window, as shown in the following equation:
i* = min{ l | Σ_{i=1..l} w_i,j ≥ (1/2) · Σ_{i=1..s} w_i,j };
wherein i* is the filtered pixel value; l is the pixel value at the window center point; w_i,j is the filtering weight; s is the total number of pixels in the window; i is the number of pixels accumulated so far;
22) contrast-limited adaptive histogram equalization;
performing contrast-limited adaptive histogram equalization on the filtered image; the filtered and denoised image of M × N pixels is divided into several sub-regions of equal size, and the histogram of each sub-region is calculated separately; denoting the number of possible gray levels of the histogram as K and the gray level of each sub-region as r, the histogram function corresponding to region (m, n) is:
H_m,n(r), 0 ≤ r ≤ K-1;
where r is the gray level of each sub-region; k is the number of gray levels of the histogram;
the clipping limit β is determined as:
β = (M·N/K) · (1 + α/100);
wherein M is the number of pixels in the horizontal direction of the image; N is the number of pixels in the vertical direction of the image; K is the number of gray levels of the histogram; α is the clipping coefficient, representing the maximum percentage of pixels allowed in each gray level;
performing histogram equalization on all the divided subareas, processing each pixel by using a bilinear interpolation method, and calculating the gray value after processing;
23) Laplacian image sharpening;
carrying out Laplacian enhancement on the histogram-equalized image: the selected pixel and the 8 points in its neighborhood are multiplied by a mask and summed, and the resulting new value replaces the pixel value of the center point of the original 3 × 3 neighborhood; for a point (u, v), the image processed by the Laplacian operator is:
L(u, v) = Σ_{mz=-1..1} Σ_{nz=-1..1} P(u+mz, v+nz) · k(mz, nz);
wherein k(mz, nz) is a 3 × 3 Laplacian mask; P(u, v) is the gray value of the original image; L(u, v) is the image processed by the Laplacian operator; mz is the abscissa within the 3 × 3 mask; nz is the ordinate within the 3 × 3 mask; u is the abscissa of the selected point; v is the ordinate of the selected point.
2. The surface point cloud reconstruction method based on binocular stereoscopic vision according to claim 1, wherein the specific method of the third step is as follows:
31) the region of interest is selected by the user with a bounding box; the pixels inside the box are defined as the target pixels T_U, and the other pixels are defined as the background pixels T_B;
32) for each background pixel n in T_B, its label is initialized as α_n = 0; for each pixel nt among the target pixels T_U, its label is initialized as α_nt = 1;
33) after the target and background pixels have been preliminarily labelled in steps 31) and 32), a Gaussian mixture model is established for the target pixels and another for the background pixels; the target pixels are clustered into K classes with the K-means algorithm, ensuring that every Gaussian component in the mixture model has a certain number of pixel samples; the mean and covariance parameters are estimated from the RGB values of the pixels, and the weight of each Gaussian component is determined by the ratio of its pixels to the total number of pixels; this completes the initialization;
34) a Gaussian component of the mixture model is assigned to each pixel: the RGB value of a target pixel nt is substituted into each Gaussian component of the mixture model, and the component with the highest probability is determined as k_nt:
k_nt = arg min_k D_nt(α_nt, k, θ, Z_nt);
wherein D_nt is the data energy term corresponding to pixel nt; α_nt is the opacity value corresponding to pixel nt; θ is the gray histogram of the target or background region of the image; Z_nt is the gray value corresponding to pixel nt;
35) the Gaussian mixture model is further learned and optimized from the given image data z:
θ := arg min_θ U(α, k, θ, z), with U(α, k, θ, z) = Σ_n D_n(α_n, k_n, θ, Z_n);
wherein U is the sum of the data energy terms of all pixels; α is the opacity value array; k is the Gaussian mixture model parameter; z is the gray value array; θ is the gray histogram of the target or background region of the image;
36) from the data energy term D_n analysed in step 34), the Gibbs energy weight 1/k_n is obtained, and the segmentation is then estimated with the min-cut/max-flow algorithm:
min_{α_n : n ∈ T_U} E(α, k, θ, z);
wherein E(α, k, θ, z) is the Gibbs energy of the graph segmentation algorithm; α is the opacity value array; k is the Gaussian mixture model parameter; z is the gray value array; θ is the gray histogram of the target or background region of the image;
37) steps 34)-36) are repeated to keep optimizing the Gaussian mixture model, ensuring that the iterative process converges to a minimum and yields the segmentation result;
38) smoothing the segmentation result with a border matting mechanism.
3. The surface point cloud reconstruction method based on binocular stereoscopic vision according to claim 1, wherein the specific method of the fourth step is as follows:
41) feature detection is performed on the left and right images through the first and last layers of the shared feature extraction module to obtain multi-scale matching cost values; the features of the first two layers are up-sampled to the original resolution and fused by a 1 × 1 convolution layer with stride 1, and are used to calculate the reconstruction error; the features of the first layer are compressed by a 1 × 1 convolution layer with stride 1 and are used to calculate the correlation in the disparity optimization network (DRS-net); the features generated by the shared feature extraction module are used by both the disparity estimation network (DES-net) and the disparity optimization network (DRS-net);
42) the input to the disparity estimation network (DES-net) comprises two parts; the first part is the dot product of the left and right features from the last layer of the shared feature extraction module, whose output is the matching cost value of the left and right images and stores the cost of all candidate disparities at the image coordinates (x, y); the second part is the feature map of the left image, which provides the semantic information needed for disparity estimation; the disparity estimation network (DES-net) is used to directly regress the initial disparity;
43) the disparity optimization network (DRS-net) uses the shared features and the initial disparity to calculate a reconstruction error re, which reflects the correctness of the estimated disparity; the reconstruction error is calculated as:
re(i, j) = |I_L(i, j) - I_R(i + d_ij, j)|;
wherein I_L is the left image; I_R is the right image; d_ij is the estimated disparity at position (i, j); i is the abscissa of the pixel; j is the ordinate of the pixel; the concatenation of the reconstruction error, the initial disparity, and the left features is fed to a third encoder-decoder structure to calculate a residual with respect to the initial disparity; the sum of the initial disparity and the residual is used to generate the refined disparity.
CN202110821716.8A 2021-07-21 2021-07-21 Surface point cloud reconstruction method based on binocular stereoscopic vision Active CN113421210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110821716.8A CN113421210B (en) 2021-07-21 2021-07-21 Surface point cloud reconstruction method based on binocular stereoscopic vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110821716.8A CN113421210B (en) 2021-07-21 2021-07-21 Surface point cloud reconstruction method based on binocular stereoscopic vision

Publications (2)

Publication Number Publication Date
CN113421210A CN113421210A (en) 2021-09-21
CN113421210B true CN113421210B (en) 2024-04-12

Family

ID=77721554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110821716.8A Active CN113421210B (en) 2021-07-21 2021-07-21 Surface point cloud reconstruction method based on binocular stereoscopic vision

Country Status (1)

Country Link
CN (1) CN113421210B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115695393B (en) * 2022-12-28 2023-03-21 山东矩阵软件工程股份有限公司 Format conversion method, system and storage medium for radar point cloud data
CN116630761A (en) * 2023-06-16 2023-08-22 中国人民解放军61540部队 Digital surface model fusion method and system for multi-view satellite images


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080052363A (en) * 2006-12-05 2008-06-11 한국전자통신연구원 Apparatus and method of matching binocular/multi-view stereo using foreground/background separation and image segmentation
CN104867135A (en) * 2015-05-04 2015-08-26 中国科学院上海微系统与信息技术研究所 High-precision stereo matching method based on guiding image guidance
CN104978722A (en) * 2015-07-06 2015-10-14 天津大学 Multi-exposure image fusion ghosting removing method based on background modeling
CN112288689A (en) * 2020-10-09 2021-01-29 浙江未来技术研究院(嘉兴) Three-dimensional reconstruction method and system for operation area in microscopic operation imaging process

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Research on Key Technologies of 3D Face Reconstruction Based on Binocular Stereo Vision"; Qi Leyang; Outstanding Master's Theses; full text *

Also Published As

Publication number Publication date
CN113421210A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN110738697B (en) Monocular depth estimation method based on deep learning
CN108921799B (en) Remote sensing image thin cloud removing method based on multi-scale collaborative learning convolutional neural network
CN110866924B (en) Line structured light center line extraction method and storage medium
CN108765325B (en) Small unmanned aerial vehicle blurred image restoration method
CN112819772B (en) High-precision rapid pattern detection and recognition method
CN111160407B (en) Deep learning target detection method and system
CN109685045B (en) Moving target video tracking method and system
CN113421210B (en) Surface point cloud reconstruction method based on binocular stereoscopic vision
CN114529459B (en) Method, system and medium for enhancing image edge
CN110136075B (en) Remote sensing image defogging method for generating countermeasure network based on edge sharpening cycle
CN111310508B (en) Two-dimensional code identification method
Pei et al. Effects of image degradations to cnn-based image classification
CN113052755A (en) High-resolution image intelligent matting method based on deep learning
CN114283162A (en) Real scene image segmentation method based on contrast self-supervision learning
CN111681198A (en) Morphological attribute filtering multimode fusion imaging method, system and medium
CN116310095A (en) Multi-view three-dimensional reconstruction method based on deep learning
CN113160278A (en) Scene flow estimation and training method and device of scene flow estimation model
CN109241981B (en) Feature detection method based on sparse coding
CN110175972B (en) Infrared image enhancement method based on transmission map fusion
CN111612802A (en) Re-optimization training method based on existing image semantic segmentation model and application
CN111627033B (en) Method, equipment and computer readable storage medium for dividing difficult sample instance
CN110264417B (en) Local motion fuzzy area automatic detection and extraction method based on hierarchical model
CN110490877B (en) Target segmentation method for binocular stereo image based on Graph Cuts
Zhang et al. Single image haze removal for aqueous vapour regions based on optimal correction of dark channel
CN109359654B (en) Image segmentation method and system based on frequency tuning global saliency and deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231017

Address after: No. 2055, Yan'an street, Changchun City, Jilin Province

Applicant after: Changchun University of Technology

Address before: 523000 room 222, building 1, No. 1, Kehui Road, Dongguan City, Guangdong Province

Applicant before: Dongguan Zhongke Sanwei fish Intelligent Technology Co.,Ltd.

Applicant before: Changchun University of Technology

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant