CN107506711B - Convolutional neural network-based binocular vision barrier detection system and method - Google Patents
- Publication number
- CN107506711B · CN201710697239.2A · CN201710697239A
- Authority
- CN
- China
- Prior art keywords
- image
- parallax
- disparity map
- neural network
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 46
- 238000013527 convolutional neural network Methods 0.000 title claims abstract description 36
- 238000000034 method Methods 0.000 title claims abstract description 30
- 230000004888 barrier function Effects 0.000 title claims abstract description 12
- 238000012545 processing Methods 0.000 claims abstract description 14
- 238000001914 filtration Methods 0.000 claims abstract description 5
- 238000013528 artificial neural network Methods 0.000 claims abstract description 3
- 238000000605 extraction Methods 0.000 claims description 10
- 238000012549 training Methods 0.000 claims description 9
- 230000003044 adaptive effect Effects 0.000 claims description 3
- 238000004364 calculation method Methods 0.000 claims description 3
- 230000001131 transforming effect Effects 0.000 claims description 3
- 238000007781 pre-processing Methods 0.000 claims description 2
- 230000004913 activation Effects 0.000 claims 2
- 238000010586 diagram Methods 0.000 description 4
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000007547 defect Effects 0.000 description 1
- 230000007123 defense Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 210000002569 neuron Anatomy 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000011895 specific detection Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a convolutional neural network-based binocular vision obstacle detection system and method. The system consists of an image acquisition module and an obstacle detection module: the image acquisition module captures binocular images and transmits them to the obstacle detection module, which performs the corresponding data processing on the acquired image data to obtain an accurate obstacle region. The detection method comprises the following steps: first, median filtering is applied to the acquired original images; the binocular images are then rectified according to the camera parameters; a newly designed convolution kernel is applied in a convolutional neural network structure to generate an accurate disparity map; finally, the precise obstacle region in the image is detected with an improved V-disparity method. The invention achieves good obstacle detection accuracy under complex lighting, small obstacles and similar conditions, and has good robustness.
Description
Technical Field
The invention relates to the technical field of binocular vision image processing, and in particular to a convolutional neural network-based binocular vision obstacle detection system and a detection method thereof.
Background
With the progress of computer technology, intelligent vehicles have developed rapidly and are widely applied in fields such as national defense, scientific research and daily life. Among them, obstacle detection is a core problem of intelligent vehicle navigation.
A convolutional neural network is a feed-forward network whose artificial neurons respond to units within their coverage range. It comprises convolutional layers and pooling layers and performs excellently on image processing tasks. In recent years it has attracted increasing attention, and its range of applications has grown ever wider.
At present, obstacle detection based on binocular vision has gradually become a hot research topic: the same scene points are located in the two binocular images through stereo matching, a disparity map of the spatial scene is generated, and a specific detection method is applied to the disparity map to obtain the corresponding obstacle detection region.
However, when complex lighting and small obstacles appear in the scene, it is difficult for existing methods to detect obstacle regions accurately while maintaining system robustness. Therefore, for the complex environments faced by intelligent vehicles, accurately detecting obstacle regions while maintaining robustness has become an urgent problem to be solved.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a convolutional neural network-based binocular vision obstacle detection system and a detection method thereof, so as to remedy the shortcomings of conventional binocular vision obstacle detection methods in obstacle detection accuracy, system robustness and other aspects.
In order to achieve the purpose, the invention adopts the following technical scheme:
a binocular vision obstacle detection system based on a convolutional neural network is composed of an image acquisition module and an obstacle detection module which are connected. The image acquisition module is used for acquiring a left image and a right image in a scene, wherein the left image and the right image are both optical images. The obstacle detection module is used for carrying out corresponding data processing on the acquired image data to acquire a final accurate obstacle area.
Furthermore, the image acquisition module consists of two Pike F-100 industrial cameras mounted horizontally in parallel; image data are transmitted through an IEEE-1394b interface and an acquisition card to a computer for subsequent processing.
Further, the obstacle detection module is a computer equipped with an NVIDIA GTX 1070 GPU.
In order to achieve the above object, the present invention further adopts the following technical solution.
A detection method of a binocular vision barrier detection system based on a convolutional neural network comprises the following steps:
(1) acquiring a binocular image from an image acquisition module, and preprocessing the binocular image to eliminate noise in the image;
(2) calibrating the image acquisition module, acquiring its intrinsic and distortion parameters, and rectifying the binocular images;
(3) designing a twin convolutional neural network to generate an accurate disparity map of the binocular images;
(4) processing the disparity map with an improved V-disparity method to detect the obstacle region in the image.
The specific method for designing the twin convolutional neural network in step (3) to generate an accurate disparity map of the binocular images comprises the following steps:
a) A twin convolutional neural network structure is designed, with the parameters of the left and right branches shared. The twin convolutional neural network consists of a feature extraction sub-network and a feature classification sub-network. The left and right branches of the feature extraction sub-network extract the corresponding feature descriptions from the input image patch and the input image strip respectively; the feature classification sub-network computes the dot product of the extracted left- and right-branch feature descriptions to obtain the similarity scores of the candidate pixels within the disparity search range, which are then used as the input of a softmax layer to obtain the disparity probability distribution.
The convolution kernel is designed to have the same (2n-1) × (2n-1) receptive field as a traditional convolution kernel while its (2n-1) × (2n-1) parameters are reduced to (2n-3) × (2n-3); the amount of computation is reduced, and the smaller number of parameters also helps to overcome overfitting.
b) Sample cropping. According to the ground-truth disparity data in the KITTI data set, a pixel p(x_i, y_i) with a true disparity is selected in the left image and an image block centered on that pixel is extracted; in the right image, the pixel q at the same coordinates (x_i, y_i) is selected and an image block of the same size centered on q is extracted. According to the disparity search range, the image strip extending to the left of the right boundary of the block centered on q is selected in the right image. Sample cropping is accomplished in this way.
c) Sample data standardization and training-set construction. Using formula (1), the gray values of the image blocks cropped in step b) are transformed into the range [-1, 1] and used as the input of the network:

U = (X - X̄) / s   (1)

where

X̄ = (1/n) Σ x_i   (2)

s = sqrt((1/n) Σ (x_i - X̄)²)   (3)

In the formulas, U is the standardized image block; X is the cropped original image block; X̄ is the mean of image block X; x_i is a pixel value in image block X; s is the standard deviation of image block X; and n is the number of pixels in the image block.
d) Twin convolutional neural network training. The twin convolutional neural network designed in step a) is trained with the stochastic gradient descent algorithm with adaptive moment estimation (Adam). For the application scenario of the invention, the cross-entropy loss function is modified as follows:

J(ω) = -Σ_i Σ_d p_GT(d, d_i^GT) · log p_i(d; ω)   (4)

where

p_GT(d, d_i^GT) = λ1 when |d - d_i^GT| = 0; λ2 when |d - d_i^GT| = 1; λ3 when |d - d_i^GT| = 2; and 0 otherwise   (5)

In the formulas, J(ω) represents the cross-entropy loss over a set of samples; p_i(d; ω) represents the predicted disparity probability distribution of the i-th sample; d is a candidate disparity value; d_i^GT is the true disparity value; and λ1, λ2, λ3 are preset values. In the present invention, λ1 = 0.5, λ2 = 0.2 and λ3 = 0.05.
e) Disparity map calculation. A binocular image pair is taken and standardized using step c). The 64-dimensional features of the pixels in the left and right images are extracted with the network model trained in step d) and denoted S_L(p) and S_R(q) respectively; the dot product of S_L(p) and S_R(q) yields the image-pair similarity score, whose negation is taken as the matching cost:

C_CNN(p, d) = -s(<S_L(p), S_R(q)>)   (6)

where s(<S_L(p), S_R(q)>) represents the image-pair similarity score. Finally, within the disparity search range, the point with the minimum matching cost is selected as the matching point for disparity selection, and the disparity map is thereby generated.
The specific method for detecting an obstacle region in an image by processing the disparity map obtained in step (3) with the improved V-disparity method in step (4) above includes:
f) Using the Prewitt operator, the gradient of the disparity map obtained in step (3) is computed in the column direction; the disparity is retained where the gradient is negative and set to 0 elsewhere, filtering out the obstacle pixels and generating the filtered disparity map.
g) For each row of pixels of the filtered disparity map generated in step f), the number of pixels sharing each gray value is counted, generating the V-disparity map I1.
h) Each row of the V-disparity map I1 is searched for its maximum value, which is retained while the gray value of the remaining pixels is set to 0, generating the maximum-value V-disparity map I2.
i) A threshold T is set; the gray value of the pixels of I2 greater than T is set to 1 and the rest to 0, generating the V-disparity binary image I3 that contains only road information. The threshold T is calculated as:

T = Σ x_i / Σ 1{x_i ≠ 0}, with both sums taken over i = 1, …, N   (7)

where x_i is the value of each pixel of I2; N is the total number of pixels in I2; and 1{x_i ≠ 0} takes 1 when x_i ≠ 0 and 0 otherwise.
j) The road line in I3 is extracted by the Hough line detection method.
k) Using the two-point equation of a line, the slope k and intercept b of the road line extracted in step j) are calculated in the image coordinate system of the V-disparity map.
l) The disparity map D(x, y, d) is scanned point by point, from bottom to top and from left to right, according to the raster-scan method, and f = kd + b is calculated for each pixel.
m) If (f - y) > T1, the pixel of the disparity map projects above the road line in the V-disparity map, i.e., lies higher than the road surface, and is taken as an obstacle point. The embodiment of the invention takes the threshold T1 = 5.
n) Steps l) and m) are repeated until the disparity map has been completely scanned, obtaining the obstacle region in the image.
The invention has the following advantages and beneficial effects:
(1) The method uses a convolutional neural network to calculate the disparity map; it remains robust under complex lighting, small obstacles and similar conditions and obtains an accurate disparity map, so that accurate obstacle regions and good robustness are likewise achieved by the improved V-disparity method.
(2) Compared with a traditional convolution kernel, the designed convolution kernel has the same receptive field while greatly reducing the number of parameters, which lowers the amount of computation; the smaller parameter count also helps to overcome overfitting.
(3) The invention improves the V-disparity method for obstacle region detection: the road line in the V-disparity map can be extracted effectively and stably, obstacle region detection in the image is thereby realized, and higher detection accuracy and detection robustness are achieved.
Drawings
Figure 1 is a schematic view of the obstacle detection system disclosed in the present invention,
figure 2 is a general flow chart of the obstacle detection method disclosed in the present invention,
figure 3 is a flow chart of a portion of a design twin convolutional neural network to generate a disparity map,
figure 4 is a flow chart of a portion of an improved V-parallax method for detecting obstacles,
figure 5 is a schematic diagram of the structure of a convolutional neural network designed by the present invention,
fig. 6 is a schematic diagram of a new convolution kernel structure designed by the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
Referring to fig. 1, a schematic structural diagram of the convolutional neural network-based binocular vision obstacle detection system of the present invention is shown. The system comprises an image acquisition module and an obstacle detection module. In the embodiment, the image acquisition module uses two Pike F-100 industrial cameras mounted horizontally in parallel to acquire the left and right optical images of the actual scene, and image data are transmitted through an IEEE-1394b interface and an acquisition card to a computer for subsequent processing. In the embodiment of the invention, the obstacle detection module is a computer equipped with an NVIDIA GTX 1070 GPU, which processes the acquired binocular image data to accomplish obstacle region detection.
The invention also discloses a convolutional neural network-based binocular vision obstacle detection method. FIG. 2 is a general flow chart of the disclosed obstacle detection method; FIG. 3 is a partial flow chart of designing the twin convolutional neural network to generate a disparity map; FIG. 4 is a partial flow chart of detecting obstacles by the improved V-disparity method. The method specifically comprises the following steps:
(1) acquiring an image pair of a scene to be detected by using a binocular camera, and then transmitting the acquired image to a computer for processing;
(2) median filtering with a 3 × 3 window is applied to the image pair, which effectively removes the salt-and-pepper noise and speckle noise in the images while preserving their contours and details;
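As an illustrative sketch of this step (the patent does not give an implementation, and the reflection padding at the image borders is an assumption), a 3 × 3 median filter can be written in plain NumPy:

```python
import numpy as np

def median_filter_3x3(img):
    """3x3 median filter; borders are handled by reflection padding."""
    padded = np.pad(img, 1, mode="reflect")
    # Stack the 9 shifted views that form each pixel's 3x3 neighborhood.
    windows = np.stack([padded[r:r + img.shape[0], c:c + img.shape[1]]
                        for r in range(3) for c in range(3)], axis=-1)
    return np.median(windows, axis=-1)

# A lone impulse ("salt") pixel in a flat region is removed entirely.
img = np.full((5, 5), 10.0)
img[2, 2] = 255.0
filtered = median_filter_3x3(img)
```

Unlike mean filtering, the median discards the outlier instead of smearing it, which is why contours and details survive this step.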
(3) the embodiment of the invention calibrates the binocular camera with the Zhang Zhengyou calibration method, obtains the intrinsic and distortion parameters, and rectifies the binocular image pair to obtain a coplanar, row-aligned image pair;
(4) designing a twin convolutional neural network to generate a disparity map of an accurate binocular image, which comprises the following specific steps:
a) The twin convolutional neural network structure shown in FIG. 5 is designed, with the parameters of the left and right branches shared. The twin convolutional neural network consists of a feature extraction sub-network (L1-L9) and a feature classification sub-network (L10-L11). The left and right branches of the feature extraction sub-network extract the corresponding feature descriptions from the input image patch and the input image strip respectively; the feature classification sub-network computes the dot product of the extracted left- and right-branch feature descriptions to obtain the similarity scores of the candidate pixels within the disparity search range, which are then used as the input of a softmax layer to obtain the disparity probability distribution.
The convolution kernel is designed to have the same receptive field as a traditional convolution kernel while the 25 parameters of the traditional kernel are reduced to 9, which reduces the amount of computation; the smaller number of parameters also reduces overfitting.
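The parameter saving can be made concrete with a small sketch. Assuming the kernel is a 3 × 3 kernel applied with dilation 2 (one plausible reading of "transmitting information at intervals"; the patent does not name the exact mechanism), it covers the same 5 × 5 input extent as a dense 5 × 5 kernel while holding only 9 weights:

```python
def kernel_taps(k, dilation):
    """Side length of the receptive field and the input offsets touched
    by a k x k kernel applied with the given dilation."""
    span = dilation * (k - 1) + 1
    offs = [i * dilation for i in range(k)]
    return span, [(r, c) for r in offs for c in offs]

# Dense 5x5 kernel: 25 weights, 5x5 receptive field.
dense_span, dense_taps = kernel_taps(5, 1)
# 3x3 kernel sampled at intervals (dilation 2): 9 weights, same 5x5 field.
sparse_span, sparse_taps = kernel_taps(3, 2)
```

Every offset the sparse kernel touches lies inside the dense kernel's 5 × 5 footprint, which is the sense in which the receptive field is unchanged.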
b) Sample cropping. According to the ground-truth disparity data in the KITTI data set, a pixel p(x_i, y_i) with a true disparity is selected in the left image and a 37 × 37 image block centered on that pixel is extracted; in the right image, the pixel q at the same coordinates (x_i, y_i) is selected and a 37 × 37 image block centered on q is extracted. According to the disparity search range, a 37 × 237 image strip extending to the left of the right boundary of the block centered on q is selected in the right image. Sample cropping is accomplished in this way.
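The cropping geometry can be sketched as follows. A disparity search range of 200 pixels is assumed here so that the strip width works out to 237 = 37 + 200; the patent states the sizes but not the range explicitly:

```python
import numpy as np

PATCH = 37            # patch side length from the embodiment
HALF = PATCH // 2
D_MAX = 200           # assumed disparity search range (237 = 37 + 200)

def crop_training_pair(left, right, x, y):
    """Crop the 37x37 left patch at (x, y) and the matching 37x237 right strip.

    The strip shares the patch's rows and extends D_MAX pixels further to the
    left, so every candidate disparity in [0, D_MAX] is covered.
    """
    patch = left[y - HALF:y + HALF + 1, x - HALF:x + HALF + 1]
    strip = right[y - HALF:y + HALF + 1, x - HALF - D_MAX:x + HALF + 1]
    return patch, strip

left = np.zeros((400, 500))
right = np.zeros((400, 500))
patch, strip = crop_training_pair(left, right, x=300, y=200)
```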
c) Sample data standardization and training-set construction. Using formula (1), the gray values of the image blocks cropped in step b) are transformed into the range [-1, 1] and used as the input of the network:

U = (X - X̄) / s   (1)

where

X̄ = (1/n) Σ x_i   (2)

s = sqrt((1/n) Σ (x_i - X̄)²)   (3)

In the formulas, U is the standardized image block; X is the cropped original image block; X̄ is the mean of image block X; x_i is a pixel value in image block X; s is the standard deviation of image block X; and n is the number of pixels in the image block.
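A minimal sketch of the standardization, assuming it is the usual zero-mean, unit-variance transform described by the variable list:

```python
import numpy as np

def standardize(block):
    """Subtract the block mean and divide by the block standard deviation."""
    return (block - block.mean()) / block.std()

x = np.random.default_rng(0).uniform(0, 255, size=(37, 37))
u = standardize(x)
```

Each block is normalized independently, so brightness differences between patches are removed before matching.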
d) Twin convolutional neural network training. The twin convolutional neural network designed in step a) is trained with the stochastic gradient descent algorithm with adaptive moment estimation (Adam). For the application scenario of the invention, the cross-entropy loss function is modified as follows:

J(ω) = -Σ_i Σ_d p_GT(d, d_i^GT) · log p_i(d; ω)   (4)

where

p_GT(d, d_i^GT) = λ1 when |d - d_i^GT| = 0; λ2 when |d - d_i^GT| = 1; λ3 when |d - d_i^GT| = 2; and 0 otherwise   (5)

In the formulas, J(ω) represents the cross-entropy loss over a set of samples; p_i(d; ω) represents the predicted disparity probability distribution of the i-th sample; d is a candidate disparity value; d_i^GT is the true disparity value; and λ1, λ2, λ3 are preset values. In the present invention, λ1 = 0.5, λ2 = 0.2 and λ3 = 0.05.
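One plausible reading of the modified cross-entropy loss, treating λ1 = 0.5, λ2 = 0.2, λ3 = 0.05 as a smoothed target distribution that puts weight λ1 on the true disparity, λ2 one pixel away and λ3 two pixels away (an assumption; the text gives only the constants), can be sketched as:

```python
import numpy as np

L1, L2, L3 = 0.5, 0.2, 0.05   # preset values from the text

def target_distribution(d_gt, d_range):
    """Smoothed ground-truth distribution over candidate disparities."""
    diff = np.abs(np.arange(d_range) - d_gt)
    target = np.zeros(d_range)
    target[diff == 0] = L1
    target[diff == 1] = L2
    target[diff == 2] = L3
    return target

def cross_entropy(pred, target):
    """J = -sum_d p_gt(d) * log p(d)."""
    return -np.sum(target * np.log(pred + 1e-12))

target = target_distribution(d_gt=50, d_range=128)
loss = cross_entropy(np.full(128, 1.0 / 128), target)  # uniform prediction
```

With these constants the target sums to 0.5 + 2·0.2 + 2·0.05 = 1, so it is a valid probability distribution.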
e) Disparity map calculation. A binocular image pair is taken and standardized using step c). The 64-dimensional features of the pixels in the left and right images are extracted with the network model trained in step d) and denoted S_L(p) and S_R(q) respectively; the dot product of S_L(p) and S_R(q) yields the image-pair similarity score, whose negation is taken as the matching cost:

C_CNN(p, d) = -s(<S_L(p), S_R(q)>)   (6)

where s(<S_L(p), S_R(q)>) represents the image-pair similarity score. Finally, within the disparity search range, the point with the minimum matching cost is selected as the matching point by the winner-take-all (WTA) strategy, and the disparity map is thereby generated.
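A sketch of the matching cost and the winner-take-all disparity selection of step e); the 64-dimensional features come from the text, while the random data below are illustrative only:

```python
import numpy as np

def wta_disparity(feat_left, feat_right_candidates):
    """C_CNN(p, d) = -<S_L(p), S_R(q_d)>; return the d with minimum cost.

    feat_left: (64,) feature of pixel p.
    feat_right_candidates: (D, 64) features of the D candidate pixels q_d
    inside the disparity search range.
    """
    scores = feat_right_candidates @ feat_left   # similarity per candidate
    costs = -scores                              # negated score = matching cost
    return int(np.argmin(costs))                 # winner-take-all

rng = np.random.default_rng(1)
f_left = rng.normal(size=64)
candidates = rng.normal(size=(60, 64))
candidates[17] = f_left                          # plant the true match at d = 17
d_hat = wta_disparity(f_left, candidates)
```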
(5) The disparity map is processed by the improved V-disparity method to detect the obstacle region in the image, through the following steps:
f) Using the Prewitt operator, the gradient of the disparity map generated in step (4) is computed in the column direction; the disparity is retained where the gradient is negative and set to 0 elsewhere, filtering out the obstacle pixels and generating the filtered disparity map.
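The column-direction Prewitt filtering can be sketched as below; the sign convention of the operator is an assumption, since the text states only that negative gradients are kept:

```python
import numpy as np

# Prewitt kernel for the vertical (column-direction) gradient.
PREWITT_Y = np.array([[-1., -1., -1.],
                      [ 0.,  0.,  0.],
                      [ 1.,  1.,  1.]])

def filter_road_pixels(disp):
    """Keep disparities whose column-direction gradient is negative, zero the rest."""
    h, w = disp.shape
    padded = np.pad(disp.astype(float), 1, mode="edge")
    grad = np.zeros((h, w))
    for dr in range(3):
        for dc in range(3):
            grad += PREWITT_Y[dr, dc] * padded[dr:dr + h, dc:dc + w]
    return np.where(grad < 0, disp, 0)

# A disparity ramp (negative column gradient under this sign convention)
# survives; an obstacle-like constant-disparity patch is zeroed out.
road = np.tile(np.arange(10, 0, -1).reshape(-1, 1), (1, 4)).astype(float)
flat = np.full((10, 4), 5.0)
kept = filter_road_pixels(road)
dropped = filter_road_pixels(flat)
```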
g) For each row of pixels of the filtered disparity map generated in step f), the number of pixels sharing each gray value is counted, generating the V-disparity map I1.
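Building the V-disparity map I1 amounts to a per-row histogram of disparity values, e.g.:

```python
import numpy as np

def v_disparity(disp, d_max):
    """I1[row, d] counts the pixels of that row whose disparity equals d."""
    h = disp.shape[0]
    v = np.zeros((h, d_max + 1), dtype=int)
    for r in range(h):
        vals, counts = np.unique(disp[r], return_counts=True)
        for d, c in zip(vals.astype(int), counts):
            if 0 < d <= d_max:   # disparity 0 carries no depth information
                v[r, d] = c
    return v

disp = np.array([[3, 3, 3, 7],
                 [2, 2, 7, 7],
                 [1, 1, 1, 1]])
v_map = v_disparity(disp, d_max=8)
```

In the V-disparity map the road surface appears as an oblique line, which is what the later Hough step extracts.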
h) Each row of the V-disparity map I1 is searched for its maximum value, which is retained while the gray value of the remaining pixels is set to 0, generating the maximum-value V-disparity map I2.
i) A threshold T is set; the gray value of the pixels of I2 greater than T is set to 1 and the rest to 0, generating the V-disparity binary image I3 that contains only road information. The threshold T is calculated as:

T = Σ x_i / Σ 1{x_i ≠ 0}, with both sums taken over i = 1, …, N   (7)

where x_i is the value of each pixel of I2; N is the total number of pixels in I2; and 1{x_i ≠ 0} takes 1 when x_i ≠ 0 and 0 otherwise.
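Assuming the threshold T is the mean of the nonzero pixels of I2 (consistent with the variable list), step i) can be sketched as:

```python
import numpy as np

def nonzero_mean_threshold(i2):
    """T = sum(x_i) / sum(1{x_i != 0}): mean over the nonzero pixels of I2."""
    nonzero = i2[i2 != 0]
    return nonzero.mean() if nonzero.size else 0.0

def binarize(i2):
    """I3: 1 where a pixel of I2 exceeds T, 0 elsewhere."""
    return (i2 > nonzero_mean_threshold(i2)).astype(int)

i2 = np.array([[0, 2, 0],
               [8, 0, 2]])
i3 = binarize(i2)   # T = (2 + 8 + 2) / 3 = 4
```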
j) The road line in I3 is extracted by the Hough line detection method.
k) Using the two-point equation of a line, the slope k and intercept b of the road line extracted in step j) are calculated in the image coordinate system of the V-disparity map.
l) The disparity map D(x, y, d) is scanned point by point, from bottom to top and from left to right, according to the raster-scan method, and f = kd + b is calculated for each pixel, where d is the disparity value and f is the vertical coordinate at which the disparity-map pixel projects in the V-disparity map.
m) If (f - y) > T1, the pixel of the disparity map projects above the road line in the V-disparity map, i.e., lies higher than the road surface, and is taken as an obstacle point. The embodiment of the invention takes the threshold T1 = 5.
n) Steps l) and m) are repeated until the disparity map has been completely scanned, obtaining the obstacle region in the image.
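Steps l)-n) can be sketched as a single vectorized pass over the disparity map; skipping zero-disparity pixels is an added assumption, since d = 0 usually marks an invalid disparity:

```python
import numpy as np

T1 = 5  # threshold from the embodiment

def obstacle_mask(disp, k, b):
    """Mark pixels that project above the road line f = k*d + b in V-disparity.

    For a pixel at row y with disparity d, the road at that disparity lies at
    row f = k*d + b; the pixel is an obstacle point when f - y > T1.
    """
    h, _ = disp.shape
    ys = np.arange(h).reshape(-1, 1)     # row coordinate of every pixel
    f = k * disp + b                     # road row predicted from disparity
    return ((f - ys) > T1) & (disp > 0)  # ignore invalid (zero) disparities

# Synthetic scene: road line f = 2*d + 10; a 10x10 patch keeps disparity 40
# (road row 2*40 + 10 = 90) but sits high in the image, so it is an obstacle.
disp = np.zeros((100, 60))
disp[20:30, 10:20] = 40.0
mask = obstacle_mask(disp, k=2.0, b=10.0)
```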
The foregoing is only a preferred embodiment of the present invention. The present invention may be embodied in other specific forms without departing from its spirit or essential attributes, and all equivalent changes and modifications that would be obvious to one skilled in the art are therefore intended to fall within the scope of the present invention as defined by the appended claims.
Claims (1)
1. A detection method of a binocular vision obstacle detection system based on a convolutional neural network, wherein the detection system consists of an image acquisition module and an obstacle detection module which are connected; the image acquisition module is used for acquiring a left image and a right image of a scene; the obstacle detection module is used for performing corresponding data processing on the acquired image data to obtain a final accurate obstacle region; the image acquisition module comprises two Pike F-100 industrial cameras mounted horizontally in parallel, and image data are transmitted through an IEEE-1394b interface and an acquisition card to a computer for subsequent processing; the obstacle detection module is a computer equipped with an NVIDIA GTX 1070 GPU; the detection method is characterized by comprising the following steps:
(1) acquiring a binocular image from an image acquisition module, and preprocessing the binocular image to eliminate noise in the image;
(2) calibrating the image acquisition module, acquiring parameters of internal parameters and distortion of the image acquisition module, and correcting the binocular image acquired in the step (1);
(3) designing a twin convolution neural network to generate a disparity map of an accurate binocular image;
(4) processing the disparity map acquired in the step (3) by using an improved V disparity method so as to detect an obstacle region in the image;
the specific method for designing the twin convolutional neural network to generate the disparity map of the accurate binocular image in the step (3) comprises the following steps of:
a) designing a twin convolutional neural network structure, wherein the parameters of the left and right branches are shared and the twin convolutional neural network consists of a feature extraction sub-network and a feature classification sub-network; the left and right branches of the feature extraction sub-network respectively extract the corresponding feature descriptions from an input image patch and an input image strip; the feature classification sub-network performs a dot product operation on the extracted left- and right-branch feature descriptions to obtain the similarity scores of the candidate pixels within the disparity search range, which are then used as the input of a softmax layer to obtain the disparity probability distribution;
the feature extraction sub-networks are composed entirely of convolutional layers, the convolution kernels in the convolutional layers transmit information at intervals, each layer uses the Batch Normalization technique and the PReLU activation function, and the last convolutional layer uses no activation function;
b) cutting a sample: selecting, according to the ground-truth disparity data in the KITTI data set, a pixel p(x_i, y_i) with a true disparity in the left image and extracting an image block centered on that pixel; selecting in the right image the pixel q at the same coordinates (x_i, y_i) and extracting an image block centered on q; selecting, according to the disparity search range, the image strip extending to the left of the right boundary of the image block centered on q in the right image, thereby finishing sample cutting;
c) standardizing sample data and constructing a training set: transforming the gray values of the image blocks cropped in step b) into the range [-1, 1] and using them as the input of the network;
d) training the twin convolutional neural network: training the twin convolutional neural network designed in step a) with the stochastic gradient descent algorithm with adaptive moment estimation (Adam);
e) calculating a disparity map: taking a binocular image pair and standardizing it using step c); extracting the feature descriptions of the pixels in the left and right images with the network model trained in step d), denoted S_L(p) and S_R(q) respectively; performing a dot product operation on S_L(p) and S_R(q) to obtain the image-pair similarity score and taking its negation as the matching cost:

C_CNN(p, d) = -s(<S_L(p), S_R(q)>)

where s(<S_L(p), S_R(q)>) represents the image-pair similarity score; finally, within the disparity search range, selecting the point with the minimum matching cost as the matching point for disparity selection, thereby generating the disparity map;
the specific method for detecting an obstacle region in an image by processing the disparity map acquired in step (3) by using the improved V-disparity method in step (4) includes:
f) calculating the gradient of the disparity map in the column direction with the Prewitt operator, retaining the disparity where the gradient is negative and setting it to 0 elsewhere, filtering out the obstacle pixels, and generating a filtered disparity map;
g) counting, for each row of pixels of the filtered disparity map generated in step f), the number of pixels sharing each gray value, to generate the V-disparity map I1;
h) searching each row of the V-disparity map I1 for its maximum value, retaining it and setting the gray value of the remaining pixels to 0, to generate the maximum-value V-disparity map I2;
i) setting a threshold T, setting the gray value of the pixels of I2 greater than T to 1 and the rest to 0, and generating a V-disparity binary image I3 containing only road information, the threshold T being calculated as:

T = Σ x_i / Σ 1{x_i ≠ 0}, with both sums taken over i = 1, …, N

where x_i is the value of each pixel of I2, N is the total number of pixels in I2, and 1{x_i ≠ 0} takes 1 when x_i ≠ 0 and 0 otherwise;
j) extracting the road line in I3 by the Hough line detection method;
k) calculating, with the two-point equation of a line, the slope k and the intercept b of the road line extracted in step j) in the image coordinate system of the V-disparity map;
l) scanning the disparity map D(x, y, d) point by point, from bottom to top and from left to right, in raster-scan order, and calculating f = kd + b for each pixel point, where d is the disparity value and f is the vertical-axis coordinate at which the disparity-map pixel projects onto the V-disparity map;
m) when (f − y) > T1, the pixel point in the disparity map projects above the road-surface line in the V-disparity map, i.e. higher than the road surface, and is judged to be an obstacle point;
n) repeating steps l) and m) until the disparity map is fully scanned, obtaining the obstacle region in the image.
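Steps l)-n) can be sketched as a single pass over the disparity map. This is an illustrative reading, assuming image row index y increases downward and zero disparity marks invalid pixels:

```python
import numpy as np

def obstacle_mask(disp, k, b, t1):
    """Steps l)-n): scan the disparity map bottom-to-top, left-to-right.
    A pixel (x, y) with disparity d projects onto the road line of the
    V-disparity map at row f = k*d + b; if f - y > T1 the pixel sits
    above the road surface and is marked as an obstacle point."""
    h, w = disp.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h - 1, -1, -1):        # bottom to top
        for x in range(w):                # left to right
            d = disp[y, x]
            if d <= 0:                    # invalid / filtered-out pixel
                continue
            f = k * d + b
            if f - y > t1:
                mask[y, x] = True
    return mask
```

The connected regions of `mask` are the obstacle regions the method reports.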
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710697239.2A CN107506711B (en) | 2017-08-15 | 2017-08-15 | Convolutional neural network-based binocular vision barrier detection system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710697239.2A CN107506711B (en) | 2017-08-15 | 2017-08-15 | Convolutional neural network-based binocular vision barrier detection system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107506711A CN107506711A (en) | 2017-12-22 |
CN107506711B true CN107506711B (en) | 2020-06-30 |
Family
ID=60690941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710697239.2A Active CN107506711B (en) | 2017-08-15 | 2017-08-15 | Convolutional neural network-based binocular vision barrier detection system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107506711B (en) |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108596009A (en) * | 2017-12-29 | 2018-09-28 | 西安智加科技有限公司 | A kind of obstacle detection method and system for agricultural machinery automatic Pilot |
CN108734693B (en) * | 2018-03-30 | 2019-10-25 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating information |
CN110443841B (en) * | 2018-05-02 | 2022-08-05 | 杭州海康威视数字技术股份有限公司 | Method, device and system for measuring ground depth |
CN108648161B (en) * | 2018-05-16 | 2020-09-01 | 江苏科技大学 | Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network |
CN109241855B (en) * | 2018-08-10 | 2022-02-11 | 西安交通大学 | Intelligent vehicle travelable area detection method based on stereoscopic vision |
JP7166108B2 (en) * | 2018-08-31 | 2022-11-07 | 株式会社小松製作所 | Image processing system, display device, image processing method, trained model generation method, and training data set |
CN109460709B (en) * | 2018-10-12 | 2020-08-04 | 南京大学 | RTG visual barrier detection method based on RGB and D information fusion |
CN111353331B (en) * | 2018-12-20 | 2023-09-05 | 浙江欣奕华智能科技有限公司 | Target object detection method, detection device and robot |
CN109631850B (en) * | 2019-01-03 | 2021-01-01 | 甘肃大禹九洲空间信息科技有限公司 | Inclined camera shooting relative positioning method based on deep learning |
CN109887019B (en) * | 2019-02-19 | 2022-05-24 | 北京市商汤科技开发有限公司 | Binocular matching method and device, equipment and storage medium |
CN111723926B (en) * | 2019-03-22 | 2023-09-12 | 北京地平线机器人技术研发有限公司 | Training method and training device for neural network model for determining image parallax |
CN111898396B (en) * | 2019-05-06 | 2024-08-09 | 北京四维图新科技股份有限公司 | Obstacle detection method and device |
CN112926368B (en) * | 2019-12-06 | 2024-01-16 | 北京京东乾石科技有限公司 | Method and device for identifying obstacle |
CN110989636B (en) * | 2020-02-26 | 2020-08-07 | 北京三快在线科技有限公司 | Method and device for predicting track of obstacle |
CN111399505B (en) * | 2020-03-13 | 2023-06-30 | 浙江工业大学 | Mobile robot obstacle avoidance method based on neural network |
CN112639821B (en) * | 2020-05-11 | 2021-12-28 | 华为技术有限公司 | Method and system for detecting vehicle travelable area and automatic driving vehicle adopting system |
CN112233136B (en) * | 2020-11-03 | 2021-10-22 | 上海西井信息科技有限公司 | Method, system, equipment and storage medium for alignment of container trucks based on binocular recognition |
CN112348293A (en) * | 2021-01-07 | 2021-02-09 | 北京三快在线科技有限公司 | Method and device for predicting track of obstacle |
CN112861976B (en) * | 2021-02-11 | 2024-01-12 | 温州大学 | Sensitive image identification method based on twin graph convolution hash network |
CN114638898A (en) * | 2022-05-23 | 2022-06-17 | 中国人民解放军国防科技大学 | Small-sized flight target detection method and device |
CN118447460A (en) * | 2024-07-08 | 2024-08-06 | 湖州丰源农业装备制造有限公司 | Automatic driving harvester monitoring management system and method based on artificial intelligence |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102164296A (en) * | 2011-06-16 | 2011-08-24 | 上海大学 | System and method for full-angular parallax stereoscopic imaging based on single DLP (digital light processing) projection |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103679127B (en) * | 2012-09-24 | 2017-08-04 | 株式会社理光 | The method and apparatus for detecting the wheeled region of pavement of road |
KR20150120805A (en) * | 2014-04-18 | 2015-10-28 | 한양대학교 산학협력단 | Method and system for detecting human in range image |
CN105335955B (en) * | 2014-07-17 | 2018-04-10 | 株式会社理光 | Method for checking object and object test equipment |
CN105550665B (en) * | 2016-01-15 | 2019-01-25 | 北京理工大学 | A kind of pilotless automobile based on binocular vision can lead to method for detecting area |
CN105956597A (en) * | 2016-05-04 | 2016-09-21 | 浙江大学 | Binocular stereo matching method based on convolution neural network |
CN106952274B (en) * | 2017-03-14 | 2019-06-21 | 西安电子科技大学 | Pedestrian detection and distance measuring method based on stereoscopic vision |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102164296A (en) * | 2011-06-16 | 2011-08-24 | 上海大学 | System and method for full-angular parallax stereoscopic imaging based on single DLP (digital light processing) projection |
Also Published As
Publication number | Publication date |
---|---|
CN107506711A (en) | 2017-12-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107506711B (en) | Convolutional neural network-based binocular vision barrier detection system and method | |
CN110569704B (en) | Multi-strategy self-adaptive lane line detection method based on stereoscopic vision | |
CN110487562B (en) | Driveway keeping capacity detection system and method for unmanned driving | |
CN108648161B (en) | Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network | |
CN110244322A (en) | Pavement construction robot environment sensory perceptual system and method based on Multiple Source Sensor | |
CN116258817B (en) | Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction | |
CN109034184B (en) | Grading ring detection and identification method based on deep learning | |
CN110246151B (en) | Underwater robot target tracking method based on deep learning and monocular vision | |
CN110021029B (en) | Real-time dynamic registration method and storage medium suitable for RGBD-SLAM | |
CN111738071B (en) | Inverse perspective transformation method based on motion change of monocular camera | |
CN111998862B (en) | BNN-based dense binocular SLAM method | |
CN111209840B (en) | 3D target detection method based on multi-sensor data fusion | |
CN116279592A (en) | Method for dividing travelable area of unmanned logistics vehicle | |
CN111832461A (en) | Non-motor vehicle riding personnel helmet wearing detection method based on video stream | |
CN114972968A (en) | Tray identification and pose estimation method based on multiple neural networks | |
CN107944350B (en) | Monocular vision road identification method based on appearance and geometric information fusion | |
CN112836573A (en) | Lane line image enhancement and completion method based on confrontation generation network | |
CN114120283A (en) | Method for distinguishing unknown obstacles in road scene three-dimensional semantic segmentation | |
CN115308732A (en) | Multi-target detection and tracking method integrating millimeter wave radar and depth vision | |
CN117284320A (en) | Vehicle feature recognition method and system for point cloud data | |
WO2021111482A1 (en) | Method to determine the depth from images by self-adaptive learning of a neural network and system thereof | |
CN113569803A (en) | Multi-mode data fusion lane target detection method and system based on multi-scale convolution | |
CN111104885A (en) | Vehicle identification method based on video deep learning | |
US20240062505A1 (en) | Image processing device, image processing method, and program | |
CN110705623B (en) | Sea-sky-line on-line detection method based on full convolution neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||