CN110060290B - Binocular parallax calculation method based on 3D convolutional neural network


Info

Publication number
CN110060290B
Authority
CN
China
Prior art keywords
parallax
log
distribution
value
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910195328.6A
Other languages
Chinese (zh)
Other versions
CN110060290A (en)
Inventor
陈创荣
成慧
范正平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201910195328.6A priority Critical patent/CN110060290B/en
Publication of CN110060290A publication Critical patent/CN110060290A/en
Application granted granted Critical
Publication of CN110060290B publication Critical patent/CN110060290B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20228 Disparity calculation for image-based rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a binocular disparity calculation method based on a 3D convolutional neural network, comprising the following steps: S1, extracting features from the input left and right views with a defined multi-scale feature extraction method; S2, stacking the left- and right-image features at corresponding disparity positions to obtain a 4D cost volume; S3, performing cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsampling it to the resolution of the original image to obtain a log-likelihood estimate of the possible disparity values of each pixel, and applying a log-normalization operation to obtain a new log-likelihood estimate; S4, computing the assumed true distribution; S5, training by back-propagation; S6, after obtaining the disparity log-likelihood distribution of each pixel, converting it into probabilities to obtain the disparity probability distribution; S7, finding the disparity value with maximum probability; S8, obtaining a normalized probability distribution from the left and right disparity values and the disparity probability distribution; and S9, obtaining the final estimate of each pixel's disparity by a weighted average operation. The invention can effectively improve disparity calculation accuracy.

Description

Binocular parallax calculation method based on 3D convolutional neural network
Technical Field
The invention relates to the field of binocular vision system processing, and in particular to a binocular disparity calculation method based on a 3D convolutional neural network.
Background
As a low-cost way of obtaining depth, the binocular vision system has important applications in many areas of robotics, including mapping, obstacle avoidance, and localization. It is particularly important in fields such as autonomous driving and augmented reality, for tasks such as 3D object detection and 3D environment perception, and is characterized by low cost, high robustness, and strong resistance to interference.
Conventional disparity estimation methods typically consist of four parts: feature extraction, cost calculation, cost aggregation, and disparity optimization. With the development of convolutional neural networks and the related hardware, estimating disparity with CNNs has become a practical approach, but many problems remain, such as the extraction of multi-scale features and high-precision disparity refinement.
Disclosure of Invention
The present invention provides a method intended to overcome at least one of the above-mentioned drawbacks of the prior art.
To solve the above technical problems, the invention adopts the following technical solution: a binocular disparity calculation method based on a 3D convolutional neural network, comprising the following steps:
S1, constructing a multi-scale feature extraction network structure, and defining a multi-scale feature extraction method according to the structure;
S2, extracting features from the input left and right images with the feature extraction network of step S1, and denoting the resulting features $F_1$ and $F_2$;
S3, stacking the extracted left- and right-image features $F_1$ and $F_2$ at corresponding disparity positions to obtain a 4D cost volume;
S4, based on the constructed 4D cost volume, performing cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsampling to the resolution of the original image to obtain a log-likelihood estimate of the possible disparity values of each pixel, and applying a log-normalization operation to obtain a new log-likelihood estimate, denoted $L$;
S5, computing the assumed true distribution from the true disparity values of the training data;
S6, computing the cross entropy between the log-likelihood estimate and the true distribution to obtain the loss, and performing back-propagation training with this loss;
S7, local inference: after obtaining the disparity log-likelihood distribution of each pixel, converting it into probabilities to obtain the disparity probability distribution $P_i$;
S8, based on the obtained disparity probability distribution $P_i$, finding the disparity value with maximum probability, denoted $d_{max}$;
S9, obtaining a normalized probability distribution $\hat{P}_i$ from the left and right disparity values and the disparity probability distribution;
S10, obtaining the final estimate $\hat{d}$ of each pixel's disparity by a weighted average operation.
Further, the multi-scale feature extraction network in step S1 works as follows: each pass of the input image through a CNN sub-network yields features at one scale; each CNN sub-network has stride 2, and there are 4 such sub-networks in total, so sub-features are extracted at 4 scales: 1/2, 1/4, 1/8, and 1/16. The stacked features are then fed into another CNN sub-network to obtain a weight for the features of each scale, and finally the features of the 4 scales are combined by a weighted operation using these weights to obtain the final multi-scale features.
Further, in step S5, the true distribution is set as a Gaussian distribution. Letting the true disparity value be $gt$ and the true distribution be $D_i$, then

$$D_i = \frac{1}{\sqrt{2\pi}\,v}\exp\left(-\frac{(i - gt)^2}{2v^2}\right),$$

where $N$ is the number of enumerated disparity values, $0 < i < N$, and $v$ is a preset value.
Further, in step S7, the disparity probability distribution $P_i$ is:

$$P_i = \frac{e^{L_i}}{\sum_{j=0}^{N-1} e^{L_j}},$$

where $N$ is the number of enumerated disparity values and $L_i$ is the $i$-th log-likelihood estimate.
Further, in step S8, the left and right disparity values $d_l$ and $d_r$ are simultaneously set as:

$$d_l = d_{max} - v, \qquad d_r = d_{max} + v.$$
Further, in step S9, the normalized probability distribution $\hat{P}_i$ is:

$$\hat{P}_i = \frac{P_i}{\sum_{j=d_l}^{d_r} P_j}, \qquad d_l \le i \le d_r.$$
Further, in step S10, the final estimate $\hat{d}$ is computed as:

$$\hat{d} = \sum_{i=d_l}^{d_r} i \cdot \hat{P}_i.$$
Compared with the prior art, the beneficial effects are: the binocular disparity calculation method based on the 3D convolutional neural network can extract multi-scale features efficiently, train more effectively, and infer more accurately; these three advantages give the disparity calculation scheme higher precision.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a multi-scale feature extraction network structure according to the present invention.
FIG. 3 is a diagram illustrating a calculation result according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of the true disparity values in an embodiment of the present invention.
Detailed Description
The drawings are for illustration purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
As shown in fig. 1, a binocular disparity calculation method based on a 3D convolutional neural network includes the following steps:
step 1, constructing a network structure for multi-scale feature extraction, as shown in fig. 2, defining a multi-scale feature extraction method according to the structure: the method comprises the steps of obtaining features of one scale by passing an input image through a CNN sub-network each time, setting the step length of each CNN sub-network to be 2, totally having 4 CNN sub-networks, totally extracting sub-features of 1/2, 1/4, 1/8 and 1/16 under 4 scales, inputting the stacked features into another CNN sub-network to obtain the weight of the features of each scale, and finally performing weighting operation on the features of the previous 4 scales by using the weight to obtain the final multi-scale features.
Step 2, extracting features from the input left and right images with the feature extraction network of step 1, and denoting the resulting features $F_1$ and $F_2$.
Step 3, stacking the extracted left- and right-image features $F_1$ and $F_2$ at corresponding disparity positions to obtain the 4D cost volume.
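A sketch of this stacking is given below, under the common reading that the left features are concatenated with the right features shifted by each candidate disparity; the function name, the zero filling of out-of-range positions, and the concatenation itself (rather than some other stacking) are assumptions.

```python
import torch

def build_cost_volume(f_left, f_right, max_disp):
    """Stack left features with right features shifted by each candidate
    disparity d, giving a 4D cost volume of shape (B, 2C, max_disp, H, W).
    Positions with no valid right-image correspondence stay zero."""
    b, c, h, w = f_left.shape
    volume = f_left.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, :c, d] = f_left
            volume[:, c:, d] = f_right
        else:
            volume[:, :c, d, :, d:] = f_left[:, :, :, d:]
            volume[:, c:, d, :, d:] = f_right[:, :, :, :-d]
    return volume
```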
Step 4, based on the constructed 4D cost volume, performing cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsampling to the resolution of the original image to obtain a log-likelihood estimate of the possible disparity values of each pixel, and applying a log-normalization operation to obtain a new log-likelihood estimate, denoted $L$.
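The aggregation of step 4 might look as follows; the depth and width of the 3D CNN are assumptions (the patent does not specify the sub-network), and the log normalization is read here as a log-softmax over the disparity dimension, which makes the exponentials of $L$ sum to 1 per pixel.

```python
import torch.nn as nn
import torch.nn.functional as F

class CostAggregation3D(nn.Module):
    """3D CNN that aggregates the 4D cost volume into one score per
    candidate disparity, upsamples to the original resolution, and applies
    a log-softmax over disparities to produce the log-likelihoods L."""
    def __init__(self, in_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 1, 3, padding=1))   # -> (B, 1, D, H', W')

    def forward(self, volume, full_size):
        # full_size = (max_disp, H, W) of the original image.
        scores = self.conv(volume)
        scores = F.interpolate(scores, size=full_size, mode='trilinear',
                               align_corners=False)
        return F.log_softmax(scores.squeeze(1), dim=1)   # L: (B, D, H, W)
```

With the 32-channel extractor sketched above, in_ch here would be 64.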
Step 5, computing the assumed true distribution from the true disparity values of the training data. Here the true distribution is set as a Gaussian distribution; letting the true disparity value be $gt$ and the true distribution be $D_i$, then

$$D_i = \frac{1}{\sqrt{2\pi}\,v}\exp\left(-\frac{(i - gt)^2}{2v^2}\right),$$

where $N$ is the number of enumerated disparity values, $0 < i < N$, and $v$ is a preset value.
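A sketch of computing this target per pixel follows; normalizing the discretized Gaussian over the $N$ candidates, so that it sums to 1 and can serve as a cross-entropy target, is an added assumption beyond the density formula above.

```python
import torch

def gaussian_target(gt, max_disp, v):
    """Discretized Gaussian centered on the true disparity gt.
    gt: (B, H, W) true disparities; returns D of shape (B, max_disp, H, W),
    normalized over the disparity dimension."""
    i = torch.arange(max_disp, dtype=gt.dtype, device=gt.device)
    i = i.view(1, -1, 1, 1)                      # candidate disparities
    d = torch.exp(-(i - gt.unsqueeze(1)) ** 2 / (2 * v ** 2))
    return d / d.sum(dim=1, keepdim=True)
```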
Step 6, computing the cross entropy between the log-likelihood estimate and the true distribution to obtain the loss, and performing back-propagation training with this loss.
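Since $L$ already holds log-likelihoods, the cross entropy reduces to $-\sum_i D_i L_i$ per pixel; a minimal sketch, averaging over all pixels of the batch:

```python
def disparity_cross_entropy(log_likelihood, target):
    """Cross entropy -sum_i D_i * L_i between the Gaussian target D and
    the predicted log-likelihoods L, averaged over pixels; this scalar is
    the loss that is backpropagated."""
    return -(target * log_likelihood).sum(dim=1).mean()
```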
Step 7, local inference: after obtaining the disparity log-likelihood distribution of each pixel, converting it into probabilities to obtain the disparity probability distribution $P_i$:

$$P_i = \frac{e^{L_i}}{\sum_{j=0}^{N-1} e^{L_j}},$$

where $N$ is the number of enumerated disparity values.
Step 8, based on the obtained disparity probability distribution $P_i$, finding the disparity value with maximum probability, denoting it $d_{max}$, and setting the left and right disparity values $d_l$ and $d_r$ as:

$$d_l = d_{max} - v, \qquad d_r = d_{max} + v.$$
Step 9, obtaining the normalized probability distribution $\hat{P}_i$ from the left and right disparity values and the disparity probability distribution:

$$\hat{P}_i = \frac{P_i}{\sum_{j=d_l}^{d_r} P_j}, \qquad d_l \le i \le d_r.$$
Step 10, obtaining the final estimate $\hat{d}$ of each pixel's disparity by a weighted average operation:

$$\hat{d} = \sum_{i=d_l}^{d_r} i \cdot \hat{P}_i.$$
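Steps 7 through 10 can be combined into one inference routine, sketched below; it assumes $v$ is the same preset half-window as in the target distribution, and the window is implicitly truncated at the ends of the candidate range.

```python
import torch

def local_inference(log_likelihood, v):
    """Steps 7-10: exponentiate the log-likelihoods to get P_i, take the
    mode d_max, renormalize P over the window [d_max - v, d_max + v], and
    return the probability-weighted average disparity per pixel."""
    prob = log_likelihood.exp()                        # P_i: (B, D, H, W)
    max_disp = prob.shape[1]
    d_max = prob.argmax(dim=1, keepdim=True)           # (B, 1, H, W)
    i = torch.arange(max_disp, device=prob.device).view(1, -1, 1, 1)
    window = ((i >= d_max - v) & (i <= d_max + v)).to(prob.dtype)
    local = prob * window
    local = local / local.sum(dim=1, keepdim=True)     # normalized P-hat
    return (local * i.to(prob.dtype)).sum(dim=1)       # estimate d-hat
```

With, say, $v = 2$ the weighted average runs over the five candidates around the mode, so probability mass at distant, unrelated disparities cannot bias the estimate; this is one reading of why the local inference is more accurate.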
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (5)

1. A binocular disparity calculation method based on a 3D convolutional neural network, characterized by comprising the following steps:
S1, constructing a multi-scale feature extraction network structure, and defining a multi-scale feature extraction method according to the structure; the multi-scale feature extraction network works as follows: each pass of the input image through a CNN sub-network yields features at one scale; each CNN sub-network has stride 2, and there are 4 such sub-networks in total, so sub-features are extracted at 4 scales: 1/2, 1/4, 1/8, and 1/16; the stacked features are fed into another CNN sub-network to obtain a weight for the features of each scale, and finally the features of the 4 scales are combined by a weighted operation using these weights to obtain the final multi-scale features;
S2, extracting features from the input left and right images with the feature extraction network of step S1, and denoting the resulting features $F_1$ and $F_2$;
S3, stacking the extracted left- and right-image features $F_1$ and $F_2$ at corresponding disparity positions to obtain a 4D cost volume;
S4, based on the constructed 4D cost volume, performing cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsampling to the resolution of the original image to obtain a log-likelihood estimate of the possible disparity values of each pixel, and applying a log-normalization operation to obtain a new log-likelihood estimate, denoted $L$;
S5, computing the assumed true distribution from the true disparity values of the training data; the true distribution is set as a Gaussian distribution; letting the true disparity value be $gt$ and the true distribution be $D_i$, then

$$D_i = \frac{1}{\sqrt{2\pi}\,v}\exp\left(-\frac{(i - gt)^2}{2v^2}\right),$$

wherein $N$ is the number of enumerated disparity values, $0 < i < N$, and $v$ is a preset value;
S6, computing the cross entropy between the log-likelihood estimate and the true distribution to obtain the loss, and performing back-propagation training with this loss;
S7, local inference: after obtaining the disparity log-likelihood distribution of each pixel, converting it into probabilities to obtain the disparity probability distribution $P_i$;
S8, based on the obtained disparity probability distribution $P_i$, finding the disparity value with maximum probability, denoted $d_{max}$;
S9, obtaining a normalized probability distribution $\hat{P}_i$ from the left and right disparity values and the disparity probability distribution;
S10, obtaining the final estimate $\hat{d}$ of each pixel's disparity by a weighted average operation.
2. The binocular disparity calculation method based on the 3D convolutional neural network of claim 1, wherein in step S7 the disparity probability distribution $P_i$ is:

$$P_i = \frac{e^{L_i}}{\sum_{j=0}^{N-1} e^{L_j}},$$

where $N$ is the number of enumerated disparity values and $L_i$ is the $i$-th log-likelihood estimate.
3. The binocular disparity calculation method based on the 3D convolutional neural network of claim 2, wherein in step S8 the left and right disparity values $d_l$ and $d_r$ are simultaneously set as:

$$d_l = d_{max} - v, \qquad d_r = d_{max} + v.$$
4. The binocular disparity calculation method based on the 3D convolutional neural network of claim 3, wherein in step S9 the normalized probability distribution $\hat{P}_i$ is:

$$\hat{P}_i = \frac{P_i}{\sum_{j=d_l}^{d_r} P_j}, \qquad d_l \le i \le d_r.$$
5. The binocular disparity calculation method based on the 3D convolutional neural network of claim 4, wherein in step S10 the final estimate $\hat{d}$ is computed as:

$$\hat{d} = \sum_{i=d_l}^{d_r} i \cdot \hat{P}_i.$$
CN201910195328.6A 2019-03-14 2019-03-14 Binocular parallax calculation method based on 3D convolutional neural network Active CN110060290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910195328.6A CN110060290B (en) 2019-03-14 2019-03-14 Binocular parallax calculation method based on 3D convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910195328.6A CN110060290B (en) 2019-03-14 2019-03-14 Binocular parallax calculation method based on 3D convolutional neural network

Publications (2)

Publication Number Publication Date
CN110060290A CN110060290A (en) 2019-07-26
CN110060290B true CN110060290B (en) 2021-06-04

Family

ID=67316963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910195328.6A Active CN110060290B (en) 2019-03-14 2019-03-14 Binocular parallax calculation method based on 3D convolutional neural network

Country Status (1)

Country Link
CN (1) CN110060290B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517306B (en) * 2019-08-30 2023-07-28 的卢技术有限公司 Binocular depth vision estimation method and system based on deep learning
CN111260711B (en) * 2020-01-10 2021-08-10 大连理工大学 Parallax estimation method for weakly supervised trusted cost propagation


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474160B2 (en) * 2017-07-03 2019-11-12 Baidu Usa Llc High resolution 3D point clouds generation from downsampled low resolution LIDAR 3D point clouds and camera images

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103763565A (en) * 2014-01-24 2014-04-30 桂林电子科技大学 Anaglyph coding method based on three-dimensional self-organizing mapping
CN108140141A (en) * 2015-08-15 2018-06-08 易享信息技术有限公司 Three-dimensional (3D) convolution with 3D batch normalization
CN106157307A (en) * 2016-06-27 2016-11-23 浙江工商大学 Monocular image depth estimation method based on multi-scale CNN and continuous CRF
CN106960415A (en) * 2017-03-17 2017-07-18 深圳市唯特视科技有限公司 Image recovery method based on a pixel-recursive super-resolution model
CN109308719A (en) * 2018-08-31 2019-02-05 电子科技大学 Binocular disparity estimation method based on three-dimensional convolution

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
End-to-end learning of geometry and context for deep stereo regression; Alex Kendall et al.; Proceedings of the IEEE International Conference on Computer Vision; 2017; pp. 66-75 *
Pyramid Stereo Matching Network; Jia-Ren Chang et al.; CVPR 2018; 2018; pp. 5410-5418 *
Disparity map generation technology based on convolutional neural networks (基于卷积神经网络的视差图生成技术); Zhu Junpeng et al.; Journal of Computer Applications (计算机应用); 2018-01-10; Vol. 38, No. 1; pp. 255-259 *
Stereo matching method based on multi-scale convolutional neural networks (基于多尺度卷积神经网络的立体匹配方法); Xi Lu et al.; Computer Engineering and Design (计算机工程与设计); 2018-09-30; Vol. 39, No. 9; pp. 2918-2922 *

Also Published As

Publication number Publication date
CN110060290A (en) 2019-07-26

Similar Documents

Publication Publication Date Title
CN108648161B (en) Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network
CN108596961B (en) Point cloud registration method based on three-dimensional convolutional neural network
WO2018000752A1 Monocular image depth estimation method based on multi-scale CNN and continuous CRF
CN107767413A Image depth estimation method based on convolutional neural networks
CN113012210B (en) Method and device for generating depth map, electronic equipment and storage medium
CN105956597A (en) Binocular stereo matching method based on convolution neural network
US11367195B2 (en) Image segmentation method, image segmentation apparatus, image segmentation device
WO2022257487A1 (en) Method and apparatus for training depth estimation model, and electronic device and storage medium
WO2022077863A1 (en) Visual positioning method, and method for training related model, related apparatus, and device
CN109509156B (en) Image defogging processing method based on generation countermeasure model
CN110335299B (en) Monocular depth estimation system implementation method based on countermeasure network
CN111160229B (en) SSD network-based video target detection method and device
CN110060290B (en) Binocular parallax calculation method based on 3D convolutional neural network
CN116109753B (en) Three-dimensional cloud rendering engine device and data processing method
CN111553296B (en) Two-value neural network stereo vision matching method based on FPGA
CN108388901B (en) Collaborative significant target detection method based on space-semantic channel
CN115601406A (en) Local stereo matching method based on fusion cost calculation and weighted guide filtering
Vázquez-Delgado et al. Real-time multi-window stereo matching algorithm with fuzzy logic
CN114782714A (en) Image matching method and device based on context information fusion
CN107564045B (en) Stereo matching method based on gradient domain guided filtering
CN111291687B (en) 3D human body action standard identification method
CN117152580A (en) Binocular stereoscopic vision matching network construction method and binocular stereoscopic vision matching method
CN109816710B (en) Parallax calculation method for binocular vision system with high precision and no smear
CN115965961B (en) Local-global multi-mode fusion method, system, equipment and storage medium
CN115908992B (en) Binocular stereo matching method, device, equipment and storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant