CN110060290B - Binocular parallax calculation method based on 3D convolutional neural network - Google Patents
- Publication number: CN110060290B (application CN201910195328.6A)
- Authority: CN (China)
- Prior art keywords: parallax, log, distribution, value, features
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045 — Combinations of networks (G06N3/04: neural network architecture, e.g. interconnection topology)
- G06N3/084 — Backpropagation, e.g. using gradient descent (G06N3/08: learning methods)
- G06T7/55 — Depth or shape recovery from multiple images (G06T7/00: image analysis)
- G06T2207/10004 — Still image; Photographic image (image acquisition modality)
- G06T2207/20228 — Disparity calculation for image-based rendering (special algorithmic details)
Abstract
The invention relates to a binocular disparity calculation method based on a 3D convolutional neural network, comprising the following steps: S1, extracting features from the input left and right views with a defined multi-scale feature extraction method; S2, stacking the features at corresponding disparity positions of the left and right images to obtain a 4D cost volume; S3, performing cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsampling to the resolution of the original image to obtain a log-likelihood estimate of each possible disparity value for each pixel, and applying a log-normalization operation to obtain a new log-likelihood estimate; S4, computing the assumed true distribution; S5, training by back-propagation; S6, after obtaining the disparity log-likelihood distribution of each pixel, converting it into probabilities to obtain the disparity probability distribution; S7, finding the disparity value with the maximum probability; S8, obtaining a normalized probability distribution from the left and right disparity values and the disparity probability distribution; and S9, obtaining the final disparity estimate for each pixel by a weighted average operation. The invention can effectively improve the accuracy of disparity calculation.
Description
Technical Field
The invention relates to the field of binocular vision system processing, in particular to a binocular parallax computing method based on a 3D convolutional neural network.
Background
As a low-cost way of obtaining depth, the binocular vision system has important applications in many areas of robotics, including mapping, obstacle avoidance and localization. It is particularly important in fields such as autonomous driving and augmented reality, for example in 3D object detection and 3D environment perception, and is characterized by low cost, high robustness and strong resistance to interference.
Conventional disparity estimation methods typically consist of four parts: feature extraction, cost computation, cost aggregation and disparity optimization. With the development of convolutional neural networks and related hardware, estimating disparity with CNNs has become a practical approach, but many problems remain, such as the extraction of multi-scale features and high-precision disparity refinement.
Disclosure of Invention
The present invention provides a solution to at least one of the above-mentioned drawbacks of the prior art.
In order to solve the above technical problems, the invention adopts the following technical scheme. A binocular disparity calculation method based on a 3D convolutional neural network comprises the following steps:
S1, constructing a multi-scale feature extraction network structure and defining a multi-scale feature extraction method according to this structure;
S2, extracting features from the input left and right images with the feature extraction network defined in step S1, and denoting the resulting features F_1 and F_2;
S3, stacking the extracted left and right image features F_1 and F_2 at the corresponding disparity positions of the left and right images to obtain a 4D cost volume;
S4, based on the constructed 4D cost volume, performing cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsampling to the resolution of the original image to obtain a log-likelihood estimate of each possible disparity value for each pixel, and applying a log-normalization operation to obtain a new log-likelihood estimate, defined as L;
S5, computing the assumed true distribution from the ground-truth disparity values of the training data;
S6, computing the cross entropy between the log-likelihood estimate and the true distribution as the loss, and training by back-propagation with this loss;
S7, local inference: after obtaining the disparity log-likelihood distribution of each pixel, converting it into probabilities to obtain the disparity probability distribution P_i;
S8, finding the disparity value with the maximum probability in the obtained disparity probability distribution P_i, and denoting it d_max;
S9, obtaining a normalized probability distribution from the left and right disparity values and the disparity probability distribution.
Further, the multi-scale feature extraction network in step S1 works as follows: each pass of the input image through a CNN sub-network yields features at one scale; each CNN sub-network has stride 2, and there are 4 CNN sub-networks in total, so sub-features are extracted at 4 scales, namely 1/2, 1/4, 1/8 and 1/16. The stacked features are fed into a further CNN sub-network to obtain a weight for the features at each scale, and finally the features of the 4 scales are combined by a weighted operation using these weights to obtain the final multi-scale features.
Further, in step S5, the true distribution is set to be a Gaussian distribution; let the ground-truth disparity value be gt and the true distribution be D_i; then:
where N is the number of enumerated disparity values, 0 < i < N, and v is a preset value.
Further, in step S7, the disparity probability distribution P_i is:
where N is the number of enumerated disparity values.
Further, in step S8, the left and right disparity values d_l and d_r are simultaneously set as:
d_l = d_max - v, d_r = d_max + v.
Compared with the prior art, the beneficial effects are as follows: the binocular disparity calculation method based on the 3D convolutional neural network can extract multi-scale features efficiently, train more effectively and infer more accurately, and these three advantages give the disparity calculation scheme higher precision.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a multi-scale feature extraction network structure according to the present invention.
FIG. 3 is a diagram illustrating a calculation result according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the true parallax value in the embodiment of the present invention.
Detailed Description
The drawings are for illustration purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
As shown in fig. 1, a binocular disparity calculation method based on a 3D convolutional neural network includes the following steps:
step 1, constructing a network structure for multi-scale feature extraction, as shown in fig. 2, defining a multi-scale feature extraction method according to the structure: the method comprises the steps of obtaining features of one scale by passing an input image through a CNN sub-network each time, setting the step length of each CNN sub-network to be 2, totally having 4 CNN sub-networks, totally extracting sub-features of 1/2, 1/4, 1/8 and 1/16 under 4 scales, inputting the stacked features into another CNN sub-network to obtain the weight of the features of each scale, and finally performing weighting operation on the features of the previous 4 scales by using the weight to obtain the final multi-scale features.
Step 2: extract features from the input left and right images with the feature extraction network defined in step 1, and denote the resulting features F_1 and F_2.
Step 3: stack the extracted left and right image features F_1 and F_2 at the corresponding disparity positions of the left and right images to obtain the 4D cost volume.
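A cost volume of this kind, concatenating left features with right features shifted by each candidate disparity, can be sketched as below. The (max_disp, H, W, 2C) layout and the zero-padding at the left border are assumptions of this illustration, not details fixed by the text.

```python
import numpy as np

def build_cost_volume(f_left, f_right, max_disp):
    """Stack left features with right features shifted by each candidate
    disparity d, giving a 4D volume of shape (max_disp, H, W, 2C)."""
    H, W, C = f_left.shape
    volume = np.zeros((max_disp, H, W, 2 * C), dtype=f_left.dtype)
    for d in range(max_disp):
        shifted = np.zeros_like(f_right)
        shifted[:, d:, :] = f_right[:, : W - d, :]  # shift right view by d pixels
        volume[d] = np.concatenate([f_left, shifted], axis=-1)
    return volume
```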
Step 4: based on the constructed 4D cost volume, perform cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsample to the resolution of the original image to obtain a log-likelihood estimate of each possible disparity value for each pixel, and apply a log-normalization operation to obtain a new log-likelihood estimate, defined as L.
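The "log normalization" in this step can plausibly be read as a log-softmax over the disparity axis, so that the exponentials of L sum to 1 per pixel; this reading is an assumption of the sketch below (the patent's formula images are not reproduced in the source).

```python
import numpy as np

def log_normalize(scores):
    """Numerically stable log-softmax over the disparity axis (axis 0):
    subtract the per-pixel max, then subtract the log of the sum of
    exponentials, so exp(L) sums to 1 over candidate disparities."""
    shifted = scores - scores.max(axis=0, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
```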
Step 5: compute the assumed true distribution from the ground-truth disparity values of the training data. Here the true distribution is set to be a Gaussian distribution; let the ground-truth disparity value be gt and the true distribution be D_i; then:
where N is the number of enumerated disparity values, 0 < i < N, and v is a preset value.
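The Gaussian target appears only as a formula image in the source. One plausible reconstruction consistent with the stated symbols (ground truth gt, enumeration bound N, preset value v) is a discretized Gaussian centred on gt, renormalized over the N candidate disparities; this is a hypothetical reading, not the patent's exact formula.

```python
import numpy as np

def true_distribution(gt, n_disp, v):
    """Hypothetical reconstruction of the target: D_i proportional to
    exp(-(i - gt)^2 / (2 v^2)) for i = 0..N-1, renormalized to sum to 1."""
    i = np.arange(n_disp)
    d = np.exp(-((i - gt) ** 2) / (2.0 * v ** 2))
    return d / d.sum()
```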
Step 6: compute the cross entropy between the log-likelihood estimate and the true distribution as the loss, and train by back-propagation with this loss.
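The loss here is the standard cross-entropy between the target distribution D and the log-normalized estimate L; since L is already a log-probability, no further logarithm is needed. A minimal sketch:

```python
import numpy as np

def cross_entropy_loss(true_dist, log_likelihood):
    """Per-pixel cross-entropy H(D, L) = -sum_i D_i * L_i over the
    disparity axis (axis 0); its mean over pixels is the training loss."""
    return -(true_dist * log_likelihood).sum(axis=0)
```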
Step 7, local inference: after obtaining the disparity log-likelihood distribution of each pixel, convert it into probabilities to obtain the disparity probability distribution P_i, given by:
where N is the number of enumerated disparity values.
Step 8: in the obtained disparity probability distribution P_i, find the disparity value with the maximum probability, denote it d_max, and simultaneously set the left and right disparity values d_l and d_r as:
d_l = d_max - v, d_r = d_max + v.
Step 9: obtain the normalized probability distribution from the left and right disparity values and the disparity probability distribution; the calculation formula is as follows:
Step 10: obtain the final estimate of each pixel's disparity by a weighted average operation; the calculation formula is as follows:
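Steps 7 through 10 can be sketched end-to-end for a single pixel as follows. The clipping of the window [d_max - v, d_max + v] to the valid disparity range, and the exact form of the elided formulas, are assumptions of this illustration.

```python
import numpy as np

def infer_disparity(log_likelihood, v):
    """Local inference for one pixel's disparity log-likelihood vector."""
    p = np.exp(log_likelihood)                  # step 7: probabilities P_i
    p = p / p.sum()
    d_max = int(np.argmax(p))                   # step 8: most likely disparity
    lo = max(d_max - v, 0)                      # step 8: d_l (clipped to range)
    hi = min(d_max + v, len(p) - 1)             # step 8: d_r (clipped to range)
    window = p[lo:hi + 1] / p[lo:hi + 1].sum()  # step 9: renormalized window
    disps = np.arange(lo, hi + 1)
    return float((disps * window).sum())        # step 10: weighted average
```

Restricting the weighted average to the window around the mode, rather than averaging over all N disparities, keeps multi-modal tails from dragging the estimate away from the most likely value while still giving sub-pixel precision.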
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments; other variations and modifications will be apparent to persons skilled in the art in light of the above description, and it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims.
Claims (5)
1. A binocular disparity calculation method based on a 3D convolutional neural network is characterized by comprising the following steps:
S1, constructing a multi-scale feature extraction network structure and defining a multi-scale feature extraction method according to this structure; the multi-scale feature extraction network works as follows: each pass of the input image through a CNN sub-network yields features at one scale; each CNN sub-network has stride 2, and there are 4 CNN sub-networks in total, so sub-features are extracted at 4 scales, namely 1/2, 1/4, 1/8 and 1/16; the stacked features are fed into a further CNN sub-network to obtain a weight for the features at each scale, and finally the features of the 4 scales are combined by a weighted operation using these weights to obtain the final multi-scale features;
S2, extracting features from the input left and right images with the feature extraction network defined in step S1, and denoting the resulting features F_1 and F_2;
S3, stacking the extracted left and right image features F_1 and F_2 at the corresponding disparity positions of the left and right images to obtain a 4D cost volume;
S4, based on the constructed 4D cost volume, performing cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsampling to the resolution of the original image to obtain a log-likelihood estimate of each possible disparity value for each pixel, and applying a log-normalization operation to obtain a new log-likelihood estimate, defined as L;
S5, computing the assumed true distribution from the ground-truth disparity values of the training data; the true distribution is set to be a Gaussian distribution; let the ground-truth disparity value be gt and the true distribution be D_i; then:
where N is the number of enumerated disparity values, 0 < i < N, and v is a preset value;
S6, computing the cross entropy between the log-likelihood estimate and the true distribution as the loss, and training by back-propagation with this loss;
S7, local inference: after obtaining the disparity log-likelihood distribution of each pixel, converting it into probabilities to obtain the disparity probability distribution P_i;
S8, finding the disparity value with the maximum probability in the obtained disparity probability distribution P_i, and denoting it d_max;
S9, obtaining a normalized probability distribution from the left and right disparity values and the disparity probability distribution.
3. The binocular disparity calculation method based on a 3D convolutional neural network of claim 2, wherein in step S8 the left and right disparity values d_l and d_r are simultaneously set as:
d_l = d_max - v, d_r = d_max + v.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910195328.6A (CN110060290B) | 2019-03-14 | 2019-03-14 | Binocular parallax calculation method based on 3D convolutional neural network |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN110060290A | 2019-07-26 |
| CN110060290B | 2021-06-04 |
Family
- ID=67316963

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910195328.6A | CN110060290B (Active) | 2019-03-14 | 2019-03-14 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN110060290B |
Families Citing this family (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110517306B | 2019-08-30 | 2023-07-28 | 的卢技术有限公司 | Binocular depth vision estimation method and system based on deep learning |
| CN111260711B | 2020-01-10 | 2021-08-10 | 大连理工大学 | Parallax estimation method for weakly supervised trusted cost propagation |
Citations (5)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103763565A | 2014-01-24 | 2014-04-30 | 桂林电子科技大学 | Anaglyph coding method based on three-dimensional self-organizing mapping |
| CN106157307A | 2016-06-27 | 2016-11-23 | 浙江工商大学 | Monocular image depth estimation method based on multi-scale CNN and continuous CRF |
| CN106960415A | 2017-03-17 | 2017-07-18 | 深圳市唯特视科技有限公司 | Image restoration method based on a pixel-recursive super-resolution model |
| CN108140141A | 2015-08-15 | 2018-06-08 | 易享信息技术有限公司 | Three-dimensional (3D) convolution with 3D batch normalization |
| CN109308719A | 2018-08-31 | 2019-02-05 | 电子科技大学 | Binocular disparity estimation method based on three-dimensional convolution |

Family Cites Families (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10474160B2 | 2017-07-03 | 2019-11-12 | Baidu Usa Llc | High resolution 3D point clouds generation from downsampled low resolution LIDAR 3D point clouds and camera images |
Non-Patent Citations (4)

- Alex Kendall et al., "End-to-end learning of geometry and context for deep stereo regression", Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 66-75.
- Jia-Ren Chang et al., "Pyramid Stereo Matching Network", CVPR 2018, pp. 5410-5418.
- Zhu Junpeng et al., "Disparity map generation technology based on convolutional neural networks" (基于卷积神经网络的视差图生成技术), Journal of Computer Applications (计算机应用), vol. 38, no. 1, 2018, pp. 255-259.
- Xi Lu et al., "Stereo matching method based on multi-scale convolutional neural networks" (基于多尺度卷积神经网络的立体匹配方法), Computer Engineering and Design (计算机工程与设计), vol. 39, no. 9, 2018, pp. 2918-2922.
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |