CN110060290B - Binocular parallax calculation method based on 3D convolutional neural network - Google Patents
- Publication number: CN110060290B (application CN201910195328.6A)
- Authority: CN (China)
- Prior art keywords: parallax, log, distribution, value, features
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045 — Combinations of networks (G06N3/04: neural network architecture, e.g. interconnection topology)
- G06N3/084 — Backpropagation, e.g. using gradient descent (G06N3/08: learning methods)
- G06T7/55 — Depth or shape recovery from multiple images (G06T7/00: image analysis)
- G06T2207/10004 — Still image; Photographic image (image acquisition modality)
- G06T2207/20228 — Disparity calculation for image-based rendering (special algorithmic details)
Abstract
The invention relates to a binocular disparity calculation method based on a 3D convolutional neural network, comprising the following steps: S1, extracting features from the input left and right views with a defined multi-scale feature extraction method; S2, stacking the features at corresponding disparity positions of the left and right images to obtain a 4D cost volume; S3, performing cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsampling to the resolution of the original image to obtain a log-likelihood estimate of each possible disparity value for each pixel, and applying a log-normalization operation to obtain a new log-likelihood estimate; S4, computing the assumed true distribution; S5, training by back-propagation; S6, after obtaining the disparity log-likelihood distribution of each pixel, converting it into probabilities to obtain the disparity probability distribution; S7, finding the disparity value with the maximum probability; S8, obtaining a normalized probability distribution from the left and right disparity values and the disparity probability distribution; and S9, obtaining the final disparity estimate for each pixel by a weighted average operation. The invention can effectively improve the accuracy of disparity calculation.
Description
Technical Field
The invention relates to the field of binocular vision system processing, in particular to a binocular parallax computing method based on a 3D convolutional neural network.
Background
As a low-cost way of obtaining depth, the binocular vision system has important applications in many areas of robotics, including mapping, obstacle avoidance and localization. It is particularly important in fields such as autonomous driving and augmented reality, for example in 3D object detection and 3D environment perception, and is characterized by low cost, high robustness and strong resistance to interference.
Conventional disparity estimation methods typically consist of four parts: feature extraction, cost computation, cost aggregation and disparity optimization. With the development of convolutional neural networks and related hardware, estimating disparity with CNNs has become a practical approach, but many problems remain, such as the extraction of multi-scale features and high-precision disparity refinement.
Disclosure of Invention
The present invention provides a solution to at least one of the above-mentioned drawbacks of the prior art.
In order to solve the above technical problems, the invention adopts the following technical scheme. A binocular disparity calculation method based on a 3D convolutional neural network comprises the following steps:
S1, constructing a multi-scale feature extraction network structure and defining a multi-scale feature extraction method according to this structure;
S2, extracting features from the input left and right images with the feature extraction network defined in step S1, and denoting the resulting features F_1 and F_2;
S3, stacking the extracted left and right image features F_1 and F_2 at the corresponding disparity positions of the left and right images to obtain a 4D cost volume;
S4, based on the constructed 4D cost volume, performing cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsampling to the resolution of the original image to obtain a log-likelihood estimate of each possible disparity value for each pixel, and applying a log-normalization operation to obtain a new log-likelihood estimate, defined as L;
S5, computing the assumed true distribution from the ground-truth disparity values of the training data;
S6, computing the cross entropy between the log-likelihood estimate and the true distribution as the loss, and training by back-propagation with this loss;
S7, local inference: after obtaining the disparity log-likelihood distribution of each pixel, converting it into probabilities to obtain the disparity probability distribution P_i;
S8, finding the disparity value with the maximum probability in the obtained disparity probability distribution P_i, and denoting it d_max;
S9, obtaining a normalized probability distribution from the left and right disparity values and the disparity probability distribution.
Further, the multi-scale feature extraction network in step S1 works as follows: each pass of the input image through a CNN sub-network yields features at one scale; each CNN sub-network has stride 2, and there are 4 CNN sub-networks in total, so sub-features are extracted at 4 scales, namely 1/2, 1/4, 1/8 and 1/16. The stacked features are fed into a further CNN sub-network to obtain a weight for the features at each scale, and finally the features of the 4 scales are combined by a weighted operation using these weights to obtain the final multi-scale features.
Further, in step S5, the true distribution is set to be a Gaussian distribution; let the ground-truth disparity value be gt and the true distribution be D_i; then:
where N is the number of enumerated disparity values, 0 < i < N, and v is a preset value.
Further, in step S7, the disparity probability distribution P_i is:
where N is the number of enumerated disparity values.
Further, in step S8, the left and right disparity values d_l and d_r are simultaneously set as:
d_l = d_max - v, d_r = d_max + v.
Compared with the prior art, the beneficial effects are as follows: the binocular disparity calculation method based on the 3D convolutional neural network can extract multi-scale features efficiently, train more effectively and infer more accurately, and these three advantages give the disparity calculation scheme higher precision.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a multi-scale feature extraction network structure according to the present invention.
FIG. 3 is a diagram illustrating a calculation result according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the true parallax value in the embodiment of the present invention.
Detailed Description
The drawings are for illustration purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
As shown in fig. 1, a binocular disparity calculation method based on a 3D convolutional neural network includes the following steps:
step 1, constructing a network structure for multi-scale feature extraction, as shown in fig. 2, defining a multi-scale feature extraction method according to the structure: the method comprises the steps of obtaining features of one scale by passing an input image through a CNN sub-network each time, setting the step length of each CNN sub-network to be 2, totally having 4 CNN sub-networks, totally extracting sub-features of 1/2, 1/4, 1/8 and 1/16 under 4 scales, inputting the stacked features into another CNN sub-network to obtain the weight of the features of each scale, and finally performing weighting operation on the features of the previous 4 scales by using the weight to obtain the final multi-scale features.
Step 2: extract features from the input left and right images with the feature extraction network defined in step 1, and denote the resulting features F_1 and F_2.
Step 3: stack the extracted left and right image features F_1 and F_2 at the corresponding disparity positions of the left and right images to obtain the 4D cost volume.
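A cost volume of this kind, concatenating left features with right features shifted by each candidate disparity, can be sketched as below. The (max_disp, H, W, 2C) layout and the zero-padding at the left border are assumptions of this illustration, not details fixed by the text.

```python
import numpy as np

def build_cost_volume(f_left, f_right, max_disp):
    """Stack left features with right features shifted by each candidate
    disparity d, giving a 4D volume of shape (max_disp, H, W, 2C)."""
    H, W, C = f_left.shape
    volume = np.zeros((max_disp, H, W, 2 * C), dtype=f_left.dtype)
    for d in range(max_disp):
        shifted = np.zeros_like(f_right)
        shifted[:, d:, :] = f_right[:, : W - d, :]  # shift right view by d pixels
        volume[d] = np.concatenate([f_left, shifted], axis=-1)
    return volume
```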
Step 4: based on the constructed 4D cost volume, perform cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsample to the resolution of the original image to obtain a log-likelihood estimate of each possible disparity value for each pixel, and apply a log-normalization operation to obtain a new log-likelihood estimate, defined as L.
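The "log normalization" in this step can plausibly be read as a log-softmax over the disparity axis, so that the exponentials of L sum to 1 per pixel; this reading is an assumption of the sketch below (the patent's formula images are not reproduced in the source).

```python
import numpy as np

def log_normalize(scores):
    """Numerically stable log-softmax over the disparity axis (axis 0):
    subtract the per-pixel max, then subtract the log of the sum of
    exponentials, so exp(L) sums to 1 over candidate disparities."""
    shifted = scores - scores.max(axis=0, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
```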
Step 5: compute the assumed true distribution from the ground-truth disparity values of the training data. Here the true distribution is set to be a Gaussian distribution; let the ground-truth disparity value be gt and the true distribution be D_i; then:
where N is the number of enumerated disparity values, 0 < i < N, and v is a preset value.
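The Gaussian target appears only as a formula image in the source. One plausible reconstruction consistent with the stated symbols (ground truth gt, enumeration bound N, preset value v) is a discretized Gaussian centred on gt, renormalized over the N candidate disparities; this is a hypothetical reading, not the patent's exact formula.

```python
import numpy as np

def true_distribution(gt, n_disp, v):
    """Hypothetical reconstruction of the target: D_i proportional to
    exp(-(i - gt)^2 / (2 v^2)) for i = 0..N-1, renormalized to sum to 1."""
    i = np.arange(n_disp)
    d = np.exp(-((i - gt) ** 2) / (2.0 * v ** 2))
    return d / d.sum()
```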
Step 6: compute the cross entropy between the log-likelihood estimate and the true distribution as the loss, and train by back-propagation with this loss.
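The loss here is the standard cross-entropy between the target distribution D and the log-normalized estimate L; since L is already a log-probability, no further logarithm is needed. A minimal sketch:

```python
import numpy as np

def cross_entropy_loss(true_dist, log_likelihood):
    """Per-pixel cross-entropy H(D, L) = -sum_i D_i * L_i over the
    disparity axis (axis 0); its mean over pixels is the training loss."""
    return -(true_dist * log_likelihood).sum(axis=0)
```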
Step 7, local inference: after obtaining the disparity log-likelihood distribution of each pixel, convert it into probabilities to obtain the disparity probability distribution P_i, given by:
where N is the number of enumerated disparity values.
Step 8: in the obtained disparity probability distribution P_i, find the disparity value with the maximum probability, denote it d_max, and simultaneously set the left and right disparity values d_l and d_r as:
d_l = d_max - v, d_r = d_max + v.
Step 9: obtain the normalized probability distribution from the left and right disparity values and the disparity probability distribution; the calculation formula is as follows:
Step 10: obtain the final estimate of each pixel's disparity by a weighted average operation; the calculation formula is as follows:
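Steps 7 through 10 can be sketched end-to-end for a single pixel as follows. The clipping of the window [d_max - v, d_max + v] to the valid disparity range, and the exact form of the elided formulas, are assumptions of this illustration.

```python
import numpy as np

def infer_disparity(log_likelihood, v):
    """Local inference for one pixel's disparity log-likelihood vector."""
    p = np.exp(log_likelihood)                  # step 7: probabilities P_i
    p = p / p.sum()
    d_max = int(np.argmax(p))                   # step 8: most likely disparity
    lo = max(d_max - v, 0)                      # step 8: d_l (clipped to range)
    hi = min(d_max + v, len(p) - 1)             # step 8: d_r (clipped to range)
    window = p[lo:hi + 1] / p[lo:hi + 1].sum()  # step 9: renormalized window
    disps = np.arange(lo, hi + 1)
    return float((disps * window).sum())        # step 10: weighted average
```

Restricting the weighted average to the window around the mode, rather than averaging over all N disparities, keeps multi-modal tails from dragging the estimate away from the most likely value while still giving sub-pixel precision.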
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments; other variations and modifications will be apparent to persons skilled in the art in light of the above description, and it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims.
Claims (5)
1. A binocular disparity calculation method based on a 3D convolutional neural network is characterized by comprising the following steps:
S1, constructing a multi-scale feature extraction network structure and defining a multi-scale feature extraction method according to this structure; the multi-scale feature extraction network works as follows: each pass of the input image through a CNN sub-network yields features at one scale; each CNN sub-network has stride 2, and there are 4 CNN sub-networks in total, so sub-features are extracted at 4 scales, namely 1/2, 1/4, 1/8 and 1/16; the stacked features are fed into a further CNN sub-network to obtain a weight for the features at each scale, and finally the features of the 4 scales are combined by a weighted operation using these weights to obtain the final multi-scale features;
S2, extracting features from the input left and right images with the feature extraction network defined in step S1, and denoting the resulting features F_1 and F_2;
S3, stacking the extracted left and right image features F_1 and F_2 at the corresponding disparity positions of the left and right images to obtain a 4D cost volume;
S4, based on the constructed 4D cost volume, performing cost aggregation with a 3D CNN sub-network to obtain a log-likelihood estimate of the disparity value, upsampling to the resolution of the original image to obtain a log-likelihood estimate of each possible disparity value for each pixel, and applying a log-normalization operation to obtain a new log-likelihood estimate, defined as L;
S5, computing the assumed true distribution from the ground-truth disparity values of the training data; the true distribution is set to be a Gaussian distribution; let the ground-truth disparity value be gt and the true distribution be D_i; then:
where N is the number of enumerated disparity values, 0 < i < N, and v is a preset value;
S6, computing the cross entropy between the log-likelihood estimate and the true distribution as the loss, and training by back-propagation with this loss;
S7, local inference: after obtaining the disparity log-likelihood distribution of each pixel, converting it into probabilities to obtain the disparity probability distribution P_i;
S8, finding the disparity value with the maximum probability in the obtained disparity probability distribution P_i, and denoting it d_max;
S9, obtaining a normalized probability distribution from the left and right disparity values and the disparity probability distribution.
3. The binocular disparity calculation method based on a 3D convolutional neural network of claim 2, wherein in step S8 the left and right disparity values d_l and d_r are simultaneously set as:
d_l = d_max - v, d_r = d_max + v.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910195328.6A (CN110060290B) | 2019-03-14 | 2019-03-14 | Binocular parallax calculation method based on 3D convolutional neural network |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN110060290A | 2019-07-26 |
| CN110060290B | 2021-06-04 |
Family
- ID=67316963

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910195328.6A | CN110060290B (Active) | 2019-03-14 | 2019-03-14 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN110060290B |
Families Citing this family (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110517306B | 2019-08-30 | 2023-07-28 | 的卢技术有限公司 | Binocular depth vision estimation method and system based on deep learning |
| CN111260711B | 2020-01-10 | 2021-08-10 | 大连理工大学 | Parallax estimation method for weakly supervised trusted cost propagation |
Citations (5)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103763565A | 2014-01-24 | 2014-04-30 | 桂林电子科技大学 | Anaglyph coding method based on three-dimensional self-organizing mapping |
| CN106157307A | 2016-06-27 | 2016-11-23 | 浙江工商大学 | Monocular image depth estimation method based on multi-scale CNN and continuous CRF |
| CN106960415A | 2017-03-17 | 2017-07-18 | 深圳市唯特视科技有限公司 | Image restoration method based on a pixel-recursive super-resolution model |
| CN108140141A | 2015-08-15 | 2018-06-08 | 易享信息技术有限公司 | Three-dimensional (3D) convolution with 3D batch normalization |
| CN109308719A | 2018-08-31 | 2019-02-05 | 电子科技大学 | Binocular disparity estimation method based on three-dimensional convolution |

Family Cites Families (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10474160B2 | 2017-07-03 | 2019-11-12 | Baidu Usa Llc | High resolution 3D point clouds generation from downsampled low resolution LIDAR 3D point clouds and camera images |
Non-Patent Citations (4)

- Alex Kendall et al., "End-to-end learning of geometry and context for deep stereo regression", Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 66-75.
- Jia-Ren Chang et al., "Pyramid Stereo Matching Network", CVPR 2018, pp. 5410-5418.
- Zhu Junpeng et al., "Disparity map generation technology based on convolutional neural networks" (基于卷积神经网络的视差图生成技术), Journal of Computer Applications (计算机应用), vol. 38, no. 1, 2018, pp. 255-259.
- Xi Lu et al., "Stereo matching method based on multi-scale convolutional neural networks" (基于多尺度卷积神经网络的立体匹配方法), Computer Engineering and Design (计算机工程与设计), vol. 39, no. 9, 2018, pp. 2918-2922.
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |