CN105956597A - Binocular stereo matching method based on convolution neural network - Google Patents

Binocular stereo matching method based on convolution neural network

Info

Publication number
CN105956597A
CN105956597A (application CN201610296770.4A)
Authority
CN
China
Prior art keywords
dataset
training
neg
network
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610296770.4A
Other languages
Chinese (zh)
Inventor
刘云海 (Liu Yunhai)
白鹏 (Bai Peng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201610296770.4A
Publication of CN105956597A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a binocular stereo matching method based on a convolutional neural network. First, two convolutional neural sub-networks are used to extract features from the image patches to be matched; through the automatic learning capability of the convolutional neural network, robust and diverse features can be extracted automatically, avoiding the complex feature selection and hand-crafted feature extraction of traditional stereo matching methods. The output features are then concatenated and fed to fully connected layers to compute the matching cost, yielding a better matching cost than traditional stereo matching methods. Together with a disparity post-processing method, a high-precision disparity map can be obtained efficiently with good real-time performance.

Description

A binocular stereo matching method based on a convolutional neural network
Technical field
The present invention relates to the field of binocular stereo vision and image processing, and in particular to a binocular stereo matching method using a convolutional neural network.
Background technology
Since Marr established the computational theory of vision in the early 1980s, binocular stereo vision has been a research hotspot in machine vision, and it has been widely studied in fields such as aerial surveying and mapping, medical imaging, virtual reality, and industrial inspection. Based on the parallax principle, binocular stereo vision uses imaging devices to capture two images of a measured object from different positions and obtains the object's three-dimensional geometric information by computing the positional deviation between corresponding image points. A binocular stereo vision algorithm mainly comprises five parts: image acquisition, camera calibration, image rectification, stereo matching, and three-dimensional reconstruction. Stereo matching is the core of the whole algorithm, and the quality of the disparity map it produces directly affects the result of the three-dimensional reconstruction. Current stereo matching methods fall into three main categories: feature-based matching algorithms, local matching algorithms, and global matching algorithms. Feature-based matching algorithms yield sparse disparity maps; a dense disparity map must then be obtained by interpolation. Local matching algorithms are fast, but they perform poorly in low-texture regions and at depth discontinuities. Global matching algorithms achieve higher accuracy but are slow.
Summary of the invention
To obtain a high-precision dense disparity map with good real-time performance, the invention provides a binocular stereo matching method based on a convolutional neural network.
The object of the invention is achieved through the following technical solution: a binocular stereo matching method based on a convolutional neural network, comprising the following steps:
(1) Image preprocessing. Apply Z-score standardization separately to the left and right images of a stereo image pair that has a reference disparity map.
(2) Construct training examples. From the preprocessed left image, select a small patch P_L of size n × n centered at p = (x, y); from the preprocessed right image, select a small patch P_R of size n × n centered at q = (x - d, y). Together, P_L and P_R constitute one training example.
For each position in the left image with a known reference disparity value d, extract one correct (positive) training example and one incorrect (negative) training example.
To obtain a positive training example, set the center of the right patch P_R at:
q = (x - d + o_pos, y)
where o_pos is a random value in [-dataset_pos, dataset_pos] and dataset_pos is a positive integer.
To obtain a negative training example, set the center of the right patch P_R at:
q = (x - d + o_neg, y)
where o_neg is a random value in [-dataset_neg_high, -dataset_neg_low] ∪ [dataset_neg_low, dataset_neg_high], and dataset_neg_low and dataset_neg_high are positive integers.
(3) Construct the convolutional neural network used to compute the matching cost. First build two identical sub-networks, each consisting of two convolutional layers and one fully connected layer, with every layer followed by a ReLU layer. Then concatenate the outputs of the two sub-networks and connect two fully connected layers, each followed by a ReLU layer; the last fully connected layer is followed by a sigmoid transfer function. For each input pair (P_L, P_R), the output of the network is the matching cost, denoted C_CNN(p, d).
(4) Train the network. Following step (2), construct N/2 positive and N/2 negative training examples at a time and use them to train the network built in step (3) with the supervised back-propagation algorithm, where N is the size of the training set.
(5) Compute the disparity map. Take a stereo image pair from the test set and apply the preprocessing of step (1). Using the network trained in step (4), for each position p = (x, y) in the left image, compute its matching cost C_CNN(p, d) against the right image at position q = (x - d, y), where d ∈ (0, DISP_MAX) and DISP_MAX is the maximum possible disparity value.
For each position p = (x, y) in the left image, the required disparity D(p) is the d at which the matching cost is minimal:
D(p) = argmin_d C_CNN(p, d)
(6) Post-process the disparity map, comprising the following sub-steps:
(6.1) Sub-pixel disparity. Fit a quadratic curve to the matching costs obtained in step (5) and take its extremum to obtain the sub-pixel disparity map D_SE(p):
D_SE(p) = d - (C_+ - C_-) / (2(C_+ - 2C + C_-))
where d = D(p), C_- = C_CNN(p, d-1), C = C_CNN(p, d), C_+ = C_CNN(p, d+1);
(6.2) Apply median filtering and bilinear filtering to the sub-pixel disparity map D_SE(p) to obtain the final disparity map D_final(p).
Further, in step 1, the Z-score standardization process is as follows:
Compute the mean x_average and the standard deviation σ of all pixel values in image X:
x_average = (1 / (W × H)) Σ_{(i,j)∈W×H} x(i,j)
σ = sqrt( (1 / (W × H)) Σ_{(i,j)∈W×H} (x(i,j) - x_average)^2 )
where W × H is the size of image X.
Normalize each pixel value to obtain a new image X′ with pixel values:
x′(i,j) ← (x(i,j) - x_average) / σ.
Further, in step 4, the cost function of the network is the binary cross-entropy loss function:
-(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ]
where N is the size of the training set, y_i is the label of the i-th sample, and p_i is the predicted value for the i-th sample.
Further, in step 4, in the cost function of the network, the label value is 0 when the i-th example is a positive example and 1 when the i-th example is a negative example.
The beneficial effects of the invention are as follows. The invention first uses two convolutional neural sub-networks to extract features from the image patches to be matched; through the automatic learning capability of the convolutional neural network, robust and diverse features can be extracted automatically, avoiding the complex feature selection and hand-crafted feature extraction of traditional stereo matching methods. The output features are then concatenated and fed to fully connected layers to compute the matching cost, obtaining a better matching cost than traditional stereo matching methods. Combined with disparity post-processing, a high-precision disparity map is obtained efficiently with good real-time performance.
Brief description of the drawings
Fig. 1 is a schematic diagram of constructing a training example;
Fig. 2 is a schematic diagram of the convolutional neural network structure for computing the matching cost of points to be matched;
Fig. 3 is a schematic diagram of finding the extremum of the quadratic curve.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings and an embodiment.
The binocular stereo matching method based on a convolutional neural network provided by the invention comprises the following steps:
(1) Image preprocessing. Apply Z-score standardization to the left and right images of 10 stereo image pairs that have reference disparity maps: for each image, compute the mean x_average and the standard deviation σ of all pixel values. For example, for an image X of size 380 × 430:
x_average = (1 / (380 × 430)) Σ_{(i,j)∈380×430} x(i,j) = 165
σ = sqrt( (1 / (380 × 430)) Σ_{(i,j)∈380×430} (x(i,j) - 165)^2 ) = 1.23
Normalize each pixel value to obtain a new image X′ with pixel values:
x′(i,j) ← (x(i,j) - 165) / 1.23
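As a concrete illustration, the Z-score standardization of step (1) can be sketched in NumPy as follows (the function name and the toy image are illustrative choices, not part of the patent):

```python
import numpy as np

def zscore_normalize(image):
    # Step (1): x'(i,j) <- (x(i,j) - x_average) / sigma, where x_average and
    # sigma are the mean and standard deviation of all pixel values
    mean = image.mean()
    sigma = image.std()
    return (image - mean) / sigma

# A normalized image has zero mean and unit standard deviation
img = np.arange(12.0).reshape(3, 4)
norm = zscore_normalize(img)
```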
(2) Construct training examples. From the preprocessed left image, select a 9 × 9 patch P_L centered at p = (x, y); from the preprocessed right image, select a 9 × 9 patch P_R centered at q = (x - d, y). Together they constitute one training example, as shown in Fig. 1.
For each position with a known reference disparity value d, we extract one positive and one negative training example.
To obtain a positive training example, set the center of the right patch at:
q = (x - d + o_pos, y)
where o_pos is a random value in [-0.5, 0.5].
To obtain a negative training example, set the center of the right patch at:
q = (x - d + o_neg, y)
where o_neg is a random value in [-18, -1.5] ∪ [1.5, 18].
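The positive/negative offset sampling of step (2) can be sketched as follows (a minimal sketch; the function names are illustrative, and uniform draws stand in for the unspecified random distribution):

```python
import random

def sample_offsets(rng=random):
    # Positive example: o_pos drawn from [-0.5, 0.5]
    o_pos = rng.uniform(-0.5, 0.5)
    # Negative example: o_neg drawn from [-18, -1.5] U [1.5, 18]
    magnitude = rng.uniform(1.5, 18.0)
    o_neg = magnitude if rng.random() < 0.5 else -magnitude
    return o_pos, o_neg

def right_patch_centers(x, y, d, o_pos, o_neg):
    # Centers q of the right-image 9x9 patches for one positive and one
    # negative training example, q = (x - d + offset, y)
    return (x - d + o_pos, y), (x - d + o_neg, y)
```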
(3) Construct the convolutional neural network for computing the matching cost. First build two identical sub-networks, each consisting of two convolutional layers and one fully connected layer, with every layer followed by a ReLU layer. The convolution kernels are 3 × 3, each convolutional layer has 32 kernels, and the fully connected layer has 200 units. Concatenating the outputs of the two sub-networks yields a vector of length 400. Two further fully connected layers follow, each with 300 units and each followed by a ReLU layer. Finally, a fully connected layer with a single unit is connected, followed by a sigmoid transfer function; the sigmoid output is the output of the network, as shown in Fig. 2.
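A shape-level NumPy sketch of the forward pass of this architecture is given below (random weights, biases omitted for brevity, and all variable and function names illustrative; the patent specifies the layer sizes but not an implementation):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

def conv_relu(x, w):
    # "Valid" 3x3 convolution followed by ReLU.
    # x: (C_in, H, W); w: (C_out, C_in, 3, 3)
    win = sliding_window_view(x, (3, 3), axis=(1, 2))  # (C_in, H-2, W-2, 3, 3)
    return np.maximum(np.einsum('ihwkl,oikl->ohw', win, w), 0.0)

# Randomly initialised weights (biases omitted)
W1 = rng.standard_normal((32, 1, 3, 3)) * 0.1        # conv layer 1: 32 kernels
W2 = rng.standard_normal((32, 32, 3, 3)) * 0.1       # conv layer 2: 32 kernels
Wfc = rng.standard_normal((200, 32 * 5 * 5)) * 0.01  # sub-network FC: 200 units
Wh1 = rng.standard_normal((300, 400)) * 0.01         # first joint FC: 300 units
Wh2 = rng.standard_normal((300, 300)) * 0.01         # second joint FC: 300 units
Wout = rng.standard_normal((1, 300)) * 0.01          # final single-unit FC

def sub_network(patch):
    # One of the two identical sub-networks: a 9x9 patch becomes 7x7 then
    # 5x5 feature maps, then a 200-unit FC layer; every layer has a ReLU.
    x = conv_relu(patch[None], W1)            # (32, 7, 7)
    x = conv_relu(x, W2)                      # (32, 5, 5)
    return np.maximum(Wfc @ x.ravel(), 0.0)   # (200,)

def match_cost(left_patch, right_patch):
    # Concatenate the two 200-dim outputs into a length-400 vector, pass it
    # through two 300-unit FC+ReLU layers and a sigmoid output unit.
    v = np.concatenate([sub_network(left_patch), sub_network(right_patch)])
    h = np.maximum(Wh1 @ v, 0.0)
    h = np.maximum(Wh2 @ h, 0.0)
    z = (Wout @ h)[0]
    return 1.0 / (1.0 + np.exp(-z))           # sigmoid output in (0, 1)

cost = match_cost(rng.standard_normal((9, 9)), rng.standard_normal((9, 9)))
```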
(4) Train the network. Following step (2), construct 64 positive and 64 negative training examples at a time, with corresponding labels Y_label = [y_label(1), y_label(2), …, y_label(128)], where the label y_label(i) of the i-th training example is 0 for a positive example and 1 for a negative example.
Use these examples to train the network constructed in step (3) with the supervised back-propagation algorithm; the loss cost is computed with the binary cross-entropy loss function:
-(1/128) Σ_{i=1}^{128} [ y_label(i) log(y_i) + (1 - y_label(i)) log(1 - y_i) ]
where y_i is the network output for the i-th sample.
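The loss of step (4) can be sketched as follows (the clipping constant and function name are illustrative additions for numerical safety, not from the patent):

```python
import numpy as np

def binary_cross_entropy(outputs, labels):
    # -(1/128) * sum_i [ y_label(i)*log(y_i) + (1 - y_label(i))*log(1 - y_i) ]
    outputs = np.clip(outputs, 1e-12, 1.0 - 1e-12)  # numerical safety
    return -np.mean(labels * np.log(outputs)
                    + (1.0 - labels) * np.log(1.0 - outputs))

# One batch: 64 positive examples (label 0) followed by 64 negative (label 1)
labels = np.concatenate([np.zeros(64), np.ones(64)])
# An uninformative network that always outputs 0.5 incurs a loss of log(2)
chance_loss = binary_cross_entropy(np.full(128, 0.5), labels)
```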
(5) Compute the disparity map. Take an image pair from the test set and apply the preprocessing of step (1). Using the network trained in step (4), for each position p = (x, y) in the left image, compute its matching cost C(p, d) against the right image at position q = (x - d, y), where d ∈ (0, 30) and 30 is the maximum possible disparity value.
For each position p = (x, y) in the left image, the required disparity D(p) is the d at which the matching cost is minimal:
D(p) = argmin_d C(p, d)
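The winner-take-all selection above can be sketched as follows (the cost-volume layout and names are illustrative assumptions):

```python
import numpy as np

def disparity_map(cost_volume):
    # Step (5): D(p) = argmin_d C(p, d).
    # cost_volume has shape (DISP_MAX, H, W); entry [d, y, x] holds the
    # matching cost of left-image pixel (x, y) at candidate disparity d.
    return np.argmin(cost_volume, axis=0)

# Toy cost volume with DISP_MAX = 30 over a 4 x 5 image
rng = np.random.default_rng(0)
costs = rng.random((30, 4, 5))
D = disparity_map(costs)
```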
(6) Post-process the disparity map, comprising the following sub-steps:
(6.1) Sub-pixel disparity. Fit a quadratic curve to the matching costs obtained in step (5); taking its extremum, as shown in Fig. 3, yields the sub-pixel disparity D_SE(p):
D_SE(p) = d - (C_+ - C_-) / (2(C_+ - 2C + C_-))
where d = D(p), C_- = C_CNN(p, d-1), C = C_CNN(p, d), C_+ = C_CNN(p, d+1).
(6.2) Apply median filtering and bilinear filtering to the disparity map D_SE(p) to obtain the final disparity map D_final(p).
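The median-filtering part of this post-processing can be sketched as follows (a minimal NumPy sketch; the 3 × 3 window size is an illustrative assumption, and the second filtering step is omitted):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_3x3(disparity):
    # 3x3 median filtering of the disparity map; border pixels are kept
    # unchanged in this sketch.
    out = disparity.astype(float).copy()
    win = sliding_window_view(disparity, (3, 3))
    out[1:-1, 1:-1] = np.median(win, axis=(2, 3))
    return out

# A single outlier in a flat disparity map is removed by the median
D = np.ones((5, 5))
D[2, 2] = 100.0
D_filtered = median_filter_3x3(D)
```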
The foregoing is only a preferred embodiment of the invention, but the scope of protection is not limited thereto. Any person skilled in the art may, within the technical scope disclosed by the invention, make appropriate alterations or variations, and all such alterations or variations shall fall within the scope of protection of the invention.

Claims (4)

1. A binocular stereo matching method based on a convolutional neural network, characterized in that it comprises the following steps:
(1) Image preprocessing. Apply Z-score standardization separately to the left and right images of a stereo image pair that has a reference disparity map.
(2) Construct training examples. From the preprocessed left image, select a small patch P_L of size n × n centered at p = (x, y); from the preprocessed right image, select a small patch P_R of size n × n centered at q = (x - d, y). Together, P_L and P_R constitute one training example.
For each position in the left image with a known reference disparity value d, extract one correct (positive) training example and one incorrect (negative) training example.
To obtain a positive training example, set the center of the right patch P_R at:
q = (x - d + o_pos, y)
where o_pos is a random value in [-dataset_pos, dataset_pos] and dataset_pos is a positive integer.
To obtain a negative training example, set the center of the right patch P_R at:
q = (x - d + o_neg, y)
where o_neg is a random value in [-dataset_neg_high, -dataset_neg_low] ∪ [dataset_neg_low, dataset_neg_high], and dataset_neg_low and dataset_neg_high are positive integers.
(3) Construct the convolutional neural network used to compute the matching cost. First build two identical sub-networks, each consisting of two convolutional layers and one fully connected layer, with every layer followed by a ReLU layer. Then concatenate the outputs of the two sub-networks and connect two fully connected layers, each followed by a ReLU layer; the last fully connected layer is followed by a sigmoid transfer function. For each input pair (P_L, P_R), the output of the network is the matching cost, denoted C_CNN(p, d).
(4) Train the network. Following step (2), construct N/2 positive and N/2 negative training examples at a time and use them to train the network built in step (3) with the supervised back-propagation algorithm, where N is the size of the training set.
(5) Compute the disparity map. Take a stereo image pair from the test set and apply the preprocessing of step (1). Using the network trained in step (4), for each position p = (x, y) in the left image, compute its matching cost C_CNN(p, d) against the right image at position q = (x - d, y), where d ∈ (0, DISP_MAX) and DISP_MAX is the maximum possible disparity value.
For each position p = (x, y) in the left image, the required disparity D(p) is the d at which the matching cost is minimal:
D(p) = argmin_d C_CNN(p, d)
(6) Post-process the disparity map, comprising the following sub-steps:
(6.1) Sub-pixel disparity. Fit a quadratic curve to the matching costs obtained in step (5) and take its extremum to obtain the sub-pixel disparity map D_SE(p):
D_SE(p) = d - (C_+ - C_-) / (2(C_+ - 2C + C_-))
where d = D(p), C_- = C_CNN(p, d-1), C = C_CNN(p, d), C_+ = C_CNN(p, d+1);
(6.2) Apply median filtering and bilinear filtering to the sub-pixel disparity map D_SE(p) to obtain the final disparity map D_final(p).
2. The binocular stereo matching method based on a convolutional neural network according to claim 1, characterized in that in step 1 the Z-score standardization process is as follows:
Compute the mean x_average and the standard deviation σ of all pixel values in image X:
x_average = (1 / (W × H)) Σ_{(i,j)∈W×H} x(i,j)
σ = sqrt( (1 / (W × H)) Σ_{(i,j)∈W×H} (x(i,j) - x_average)^2 )
where W × H is the size of image X.
Normalize each pixel value to obtain a new image X′ with pixel values:
x′(i,j) ← (x(i,j) - x_average) / σ.
3. The binocular stereo matching method based on a convolutional neural network according to claim 1, characterized in that in step 4 the cost function of the network is the binary cross-entropy loss function:
-(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ]
where N is the size of the training set, y_i is the label of the i-th sample, and p_i is the predicted value for the i-th sample.
4. The binocular stereo matching method based on a convolutional neural network according to claim 1, characterized in that in step 4, in the cost function of the network, the label value is 0 when the i-th example is a positive example and 1 when the i-th example is a negative example.
CN201610296770.4A 2016-05-04 2016-05-04 Binocular stereo matching method based on convolution neural network Pending CN105956597A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610296770.4A CN105956597A (en) 2016-05-04 2016-05-04 Binocular stereo matching method based on convolution neural network


Publications (1)

Publication Number Publication Date
CN105956597A true CN105956597A (en) 2016-09-21

Family

ID=56914134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610296770.4A Pending CN105956597A (en) 2016-05-04 2016-05-04 Binocular stereo matching method based on convolution neural network

Country Status (1)

Country Link
CN (1) CN105956597A (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106600583A (en) * 2016-12-07 2017-04-26 西安电子科技大学 Disparity map acquiring method based on end-to-end neural network
CN106709948A (en) * 2016-12-21 2017-05-24 浙江大学 Quick binocular stereo matching method based on superpixel segmentation
CN106934765A (en) * 2017-03-14 2017-07-07 长沙全度影像科技有限公司 Panoramic picture fusion method based on depth convolutional neural networks Yu depth information
CN106952220A (en) * 2017-03-14 2017-07-14 长沙全度影像科技有限公司 A kind of panoramic picture fusion method based on deep learning
CN107146248A (en) * 2017-04-27 2017-09-08 杭州电子科技大学 A kind of solid matching method based on double-current convolutional neural networks
CN107392241A (en) * 2017-07-17 2017-11-24 北京邮电大学 A kind of image object sorting technique that sampling XGBoost is arranged based on weighting
CN107506711A (en) * 2017-08-15 2017-12-22 江苏科技大学 Binocular vision obstacle detection system and method based on convolutional neural networks
CN107992848A (en) * 2017-12-19 2018-05-04 北京小米移动软件有限公司 Obtain the method, apparatus and computer-readable recording medium of depth image
CN108364310A (en) * 2017-01-26 2018-08-03 三星电子株式会社 Solid matching method and equipment, image processing equipment and its training method
CN108648226A (en) * 2018-03-30 2018-10-12 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN108734693A (en) * 2018-03-30 2018-11-02 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN108846858A (en) * 2018-06-01 2018-11-20 南京邮电大学 A kind of Stereo Matching Algorithm of computer vision
CN109005398A (en) * 2018-07-27 2018-12-14 杭州电子科技大学 A kind of stereo image parallax matching process based on convolutional neural networks
CN109102798A (en) * 2018-06-29 2018-12-28 厦门快商通信息技术有限公司 A kind of finishing event detecting method, device, computer equipment and medium
CN109191511A (en) * 2018-07-27 2019-01-11 杭州电子科技大学 A kind of binocular solid matching process based on convolutional neural networks
CN109472819A (en) * 2018-09-06 2019-03-15 杭州电子科技大学 A kind of binocular parallax estimation method based on cascade geometry context neural network
CN109887019A (en) * 2019-02-19 2019-06-14 北京市商汤科技开发有限公司 A kind of binocular ranging method and device, equipment and storage medium
CN109919985A (en) * 2019-03-01 2019-06-21 北京市商汤科技开发有限公司 Data processing method and device, electronic equipment and computer storage medium
US10380753B1 (en) 2018-05-30 2019-08-13 Aimotive Kft. Method and apparatus for generating a displacement map of an input dataset pair
CN110213558A (en) * 2019-04-28 2019-09-06 航天智造(上海)科技有限责任公司 Sub-pix optical parallax acquiring method
CN110487216A (en) * 2019-09-20 2019-11-22 西安知象光电科技有限公司 A kind of fringe projection 3-D scanning method based on convolutional neural networks
CN110517309A (en) * 2019-07-19 2019-11-29 沈阳工业大学 A kind of monocular depth information acquisition method based on convolutional neural networks
CN111062900A (en) * 2019-11-21 2020-04-24 西北工业大学 Binocular disparity map enhancement method based on confidence fusion
CN111260711A (en) * 2020-01-10 2020-06-09 大连理工大学 Parallax estimation method for weakly supervised trusted cost propagation
CN111402129A (en) * 2020-02-21 2020-07-10 西安交通大学 Binocular stereo matching method based on joint up-sampling convolutional neural network
CN111489385A (en) * 2020-04-08 2020-08-04 北京市商汤科技开发有限公司 Binocular stereo matching network training method and device
CN111543982A (en) * 2020-04-01 2020-08-18 五邑大学 Fatigue driving detection method and device and storage medium
CN114445473A (en) * 2022-04-07 2022-05-06 北京中科慧眼科技有限公司 Stereo matching method and system based on deep learning operator

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408435A (en) * 2014-12-05 2015-03-11 浙江大学 Face identification method based on random pooling convolutional neural network
CN104809443A (en) * 2015-05-05 2015-07-29 上海交通大学 Convolutional neural network-based license plate detection method and system
CN105426914A (en) * 2015-11-19 2016-03-23 中国人民解放军信息工程大学 Image similarity detection method for position recognition


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jure Zbontar et al.: "Stereo matching by training a convolutional neural network to compare image patches", The Journal of Machine Learning Research *

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106600583A (en) * 2016-12-07 2017-04-26 西安电子科技大学 Disparity map acquiring method based on end-to-end neural network
CN106600583B (en) * 2016-12-07 2019-11-01 西安电子科技大学 Parallax picture capturing method based on end-to-end neural network
CN106709948A (en) * 2016-12-21 2017-05-24 浙江大学 Quick binocular stereo matching method based on superpixel segmentation
US11900628B2 (en) 2017-01-26 2024-02-13 Samsung Electronics Co., Ltd. Stereo matching method and apparatus, image processing apparatus, and training method therefor
CN108364310A (en) * 2017-01-26 2018-08-03 三星电子株式会社 Solid matching method and equipment, image processing equipment and its training method
CN106934765A (en) * 2017-03-14 2017-07-07 长沙全度影像科技有限公司 Panoramic picture fusion method based on depth convolutional neural networks Yu depth information
CN106952220A (en) * 2017-03-14 2017-07-14 长沙全度影像科技有限公司 A kind of panoramic picture fusion method based on deep learning
CN107146248A (en) * 2017-04-27 2017-09-08 杭州电子科技大学 A kind of solid matching method based on double-current convolutional neural networks
CN107392241A (en) * 2017-07-17 2017-11-24 北京邮电大学 A kind of image object sorting technique that sampling XGBoost is arranged based on weighting
CN107506711A (en) * 2017-08-15 2017-12-22 江苏科技大学 Binocular vision obstacle detection system and method based on convolutional neural networks
CN107992848B (en) * 2017-12-19 2020-09-25 北京小米移动软件有限公司 Method and device for acquiring depth image and computer readable storage medium
CN107992848A (en) * 2017-12-19 2018-05-04 北京小米移动软件有限公司 Obtain the method, apparatus and computer-readable recording medium of depth image
CN108734693A (en) * 2018-03-30 2018-11-02 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN108648226A (en) * 2018-03-30 2018-10-12 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN108648226B (en) * 2018-03-30 2019-10-22 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
US10380753B1 (en) 2018-05-30 2019-08-13 Aimotive Kft. Method and apparatus for generating a displacement map of an input dataset pair
WO2019229486A1 (en) 2018-05-30 2019-12-05 Almotive Kft Generating a displacement map of an input dataset pair of image or audio data
CN108846858A (en) * 2018-06-01 2018-11-20 南京邮电大学 A kind of Stereo Matching Algorithm of computer vision
CN109102798A (en) * 2018-06-29 2018-12-28 厦门快商通信息技术有限公司 A kind of finishing event detecting method, device, computer equipment and medium
CN109191511B (en) * 2018-07-27 2021-04-13 杭州电子科技大学 Binocular stereo matching method based on convolutional neural network
CN109005398A (en) * 2018-07-27 2018-12-14 杭州电子科技大学 A kind of stereo image parallax matching process based on convolutional neural networks
CN109191511A (en) * 2018-07-27 2019-01-11 杭州电子科技大学 A kind of binocular solid matching process based on convolutional neural networks
CN109472819A (en) * 2018-09-06 2019-03-15 杭州电子科技大学 A kind of binocular parallax estimation method based on cascade geometry context neural network
CN109472819B (en) * 2018-09-06 2021-12-28 杭州电子科技大学 Binocular parallax estimation method based on cascade geometric context neural network
CN109887019A (en) * 2019-02-19 2019-06-14 北京市商汤科技开发有限公司 A kind of binocular ranging method and device, equipment and storage medium
CN109919985A (en) * 2019-03-01 2019-06-21 北京市商汤科技开发有限公司 Data processing method and device, electronic equipment and computer storage medium
CN110213558A (en) * 2019-04-28 2019-09-06 航天智造(上海)科技有限责任公司 Sub-pix optical parallax acquiring method
CN110517309A (en) * 2019-07-19 2019-11-29 沈阳工业大学 A kind of monocular depth information acquisition method based on convolutional neural networks
CN110487216B (en) * 2019-09-20 2021-05-25 西安知象光电科技有限公司 Fringe projection three-dimensional scanning method based on convolutional neural network
CN110487216A (en) * 2019-09-20 2019-11-22 西安知象光电科技有限公司 A kind of fringe projection 3-D scanning method based on convolutional neural networks
CN111062900A (en) * 2019-11-21 2020-04-24 西北工业大学 Binocular disparity map enhancement method based on confidence fusion
CN111260711A (en) * 2020-01-10 2020-06-09 大连理工大学 Parallax estimation method for weakly supervised trusted cost propagation
CN111260711B (en) * 2020-01-10 2021-08-10 大连理工大学 Parallax estimation method for weakly supervised trusted cost propagation
CN111402129A (en) * 2020-02-21 2020-07-10 西安交通大学 Binocular stereo matching method based on joint up-sampling convolutional neural network
CN111543982A (en) * 2020-04-01 2020-08-18 五邑大学 Fatigue driving detection method and device and storage medium
CN111489385A (en) * 2020-04-08 2020-08-04 北京市商汤科技开发有限公司 Binocular stereo matching network training method and device
CN111489385B (en) * 2020-04-08 2021-12-07 北京市商汤科技开发有限公司 Binocular stereo matching network training method and device
CN114445473A (en) * 2022-04-07 2022-05-06 北京中科慧眼科技有限公司 Stereo matching method and system based on deep learning operator
CN114445473B (en) * 2022-04-07 2022-07-26 北京中科慧眼科技有限公司 Stereo matching method and system based on deep learning operator

Similar Documents

Publication Publication Date Title
CN105956597A (en) Binocular stereo matching method based on convolution neural network
CN107679522B (en) Multi-stream LSTM-based action identification method
US20210390339A1 (en) Depth estimation and color correction method for monocular underwater images based on deep neural network
CN104376552B (en) A kind of virtual combat method of 3D models and two dimensional image
CN109472819B (en) Binocular parallax estimation method based on cascade geometric context neural network
CN107204010A (en) A kind of monocular image depth estimation method and system
CN101877143B (en) Three-dimensional scene reconstruction method of two-dimensional image group
CN105976318A (en) Image super-resolution reconstruction method
CN103236082A (en) Quasi-three dimensional reconstruction method for acquiring two-dimensional videos of static scenes
CN108648161A (en) The binocular vision obstacle detection system and method for asymmetric nuclear convolutional neural networks
CN108596975A (en) A kind of Stereo Matching Algorithm for weak texture region
WO2020119620A1 (en) Pyramid binocular depth estimation model with self-improving capacity
CN103971366A (en) Stereoscopic matching method based on double-weight aggregation
CN113763446B (en) Three-dimensional matching method based on guide information
CN110443874B (en) Viewpoint data generation method and device based on convolutional neural network
CN109598732A (en) A kind of medical image cutting method based on three-dimensional space weighting
CN112990077A (en) Face action unit identification method and device based on joint learning and optical flow estimation
CN111985551A (en) Stereo matching algorithm based on multiple attention networks
Gao et al. Exploiting key points supervision and grouped feature fusion for multiview pedestrian detection
CN103308000A (en) Method for measuring curve object on basis of binocular vision
CN108681753A (en) A kind of image solid matching method and system based on semantic segmentation and neural network
CN103914835A (en) Non-reference quality evaluation method for fuzzy distortion three-dimensional images
Song et al. Accurate 3D reconstruction from circular light field using CNN-LSTM
CN110060290B (en) Binocular parallax calculation method based on 3D convolutional neural network
CN108805937B (en) Single-camera polarization information prediction method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160921