CN103279759A - Vehicle front trafficability analyzing method based on convolutional neural network - Google Patents
Vehicle front trafficability analyzing method based on convolutional neural network

- Publication number: CN103279759A (application); CN103279759B (grant)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Image Analysis (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention discloses a method for analyzing the trafficability of the area in front of a vehicle based on a convolutional neural network. The method comprises the following steps: first, a camera mounted at the front of the vehicle collects a large number of images of real driving environments; the images are preprocessed with a Gamma correction function; and a convolutional neural network is trained. The method preprocesses the images with a Gamma correction method based on superposed nonlinear functions, which avoids the influence of strongly varying illumination on target recognition and improves image resolution. The method adopts a geometric normalization method, which reduces the resolution differences caused by the varying distance between recognition targets and the camera. The convolutional neural network LeNet-5 adopted by the method can extract implicit features with class-discriminating ability, and the extraction process is simple. LeNet-5 combines local receptive fields, weight sharing, and subsampling, which guarantees robustness to simple geometric deformations, reduces the number of trainable network parameters, and simplifies the network structure.
Description
Technical field
The invention belongs to the fields of driver-assistance safety and intelligent transportation technology, and relates to a method for analyzing the trafficability of the area in front of a vehicle; in particular, to a trafficability analysis method based on a convolutional neural network that uses video images of the area in front of the vehicle collected by a camera.
Background technology
Analysis of the trafficability in front of a vehicle belongs to the environment-perception category of the intelligent transportation field. It refers to analyzing the driving safety of the perceived environment by advanced means such as sensor technology, computer technology, or communication technology, finding potential safety hazards, issuing prompts and early warnings to the driver, or laying the foundation for autonomous vehicle navigation. At present, research on trafficability analysis based on video images of the area in front of the vehicle collected by a camera and interpreted with visual image-understanding methods mainly covers obstacle detection, pedestrian detection, vehicle detection, road detection, traffic-sign detection, and terrain classification.
The visual image-understanding methods involved in trafficability analysis can be divided into reconstruction-based methods and recognition-based methods. Reconstruction-based methods rely on 3D or 2.5D reconstruction techniques and judge passability from a spatial perspective; they find it difficult to avoid problems inherent to 3D reconstruction, such as severe ambiguity, limited reconstruction range, and poor real-time performance. Recognition-based image-understanding methods mainly include algorithms based on modeling and template matching, general neural networks, support vector machines, self-supervised learning, and statistical learning. These methods need to extract explicit features of the target; the extraction process is complex, important information is easily lost, and adaptability to the environment is poor.
For structured road environments with strong illumination variation, if the original image is recognized directly, there is much interfering information and the explicit feature-extraction process is complex; furthermore, differences in the distance between targets and the camera cause differences in resolution. In addition, variations in illumination affect image quality and reduce image resolution.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes a method for analyzing the trafficability in front of a vehicle based on a convolutional neural network. The method can extract implicit features of a target through a simple extraction process, avoids reducing image resolution, reduces the influence of illumination, and is applicable to structured road environments with strong illumination variation.
The technical scheme of the present invention is a method for analyzing the trafficability in front of a vehicle based on a convolutional neural network, comprising the following steps:
A, image acquisition
First, a camera mounted at the front of the vehicle collects a large number of images of real driving environments, each with m × n pixels. The bottom three-fifths of each image is then cropped out as the region of interest. Finally, the cropped images are converted to grayscale.
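The acquisition step above can be sketched as follows. The BGR channel order and the standard luminance weights for the gray conversion are assumptions; the text only fixes the three-fifths cropping ratio.

```python
import numpy as np

def acquire_roi(frame):
    """Step A sketch: keep the bottom three-fifths of an m x n frame as the
    region of interest, then convert to grayscale.

    Assumes a BGR color frame and standard luminance weights (not specified
    in the text)."""
    m = frame.shape[0]
    roi = frame[m - (m * 3) // 5:, :]                    # bottom 3/5 rows
    gray = (0.114 * roi[..., 0] + 0.587 * roi[..., 1]    # BGR -> luminance
            + 0.299 * roi[..., 2])
    return gray.astype(np.uint8)
```

For the embodiment's 640 × 480 frames this yields a 640 × 288 region of interest.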
B, image preprocessing
B1. Construct a Gamma correction function by superposing nonlinear functions and correct the grayscale image obtained in step A. The function is as follows:
G(x) = 1 + f1(x) + f2(x) + f3(x)    (1)
f1(x) = a·cos(πx/255)    (2)
f2(x) = (K(x) + b)·cos β + x·sin α    (3)
K(x) = ρ·sin(4πx/255)    (4)
α = arctan(−2b/255)    (5)
f3(x) = R(x)·cos(3πx/255)    (6)
R(x) = c·|2x/255 − 1|    (7)
In the formulas, x is the gray value of a pixel and G(x) is the Gamma correction value corresponding to that gray value; a ∈ (0, 1) is a weighting coefficient, b is the maximum variation range of f2(x), ρ is the amplitude of K(x), α is the deflection angle of K(x), c is the amplitude of R(x), and a + b + c < 1.
The gray value after Gamma correction is computed as:

g(x) = 255·(x/255)^(1/G(x))    (8)

where g(x) is the gray value of a pixel after Gamma correction.
Gamma correction yields grayscale image P.
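A minimal sketch of Eqs. (1)–(8) applied to a grayscale image follows. The parameter values `rho` and `beta` are illustrative: the text leaves ρ free, and β is used in Eq. (3) but never defined, so the sketch defaults β to α (an assumption).

```python
import numpy as np

def gamma_correct(img, a=0.2, b=0.3, c=0.3, rho=0.1, beta=None):
    """Gamma correction built from superposed nonlinear functions, Eqs. (1)-(8).

    Assumptions: rho is a free choice, and beta (undefined in the text)
    defaults to alpha."""
    x = img.astype(np.float64)
    alpha = np.arctan(-2.0 * b / 255.0)              # Eq. (5)
    if beta is None:
        beta = alpha                                 # assumption: beta = alpha
    f1 = a * np.cos(np.pi * x / 255.0)               # Eq. (2)
    K = rho * np.sin(4.0 * np.pi * x / 255.0)        # Eq. (4)
    f2 = (K + b) * np.cos(beta) + x * np.sin(alpha)  # Eq. (3)
    R = c * np.abs(2.0 * x / 255.0 - 1.0)            # Eq. (7)
    f3 = R * np.cos(3.0 * np.pi * x / 255.0)         # Eq. (6)
    G = 1.0 + f1 + f2 + f3                           # Eq. (1)
    g = 255.0 * (x / 255.0) ** (1.0 / G)             # Eq. (8)
    return np.clip(g, 0, 255).astype(np.uint8)
```

With the defaults a = 0.2, b = 0.3, c = 0.3 from the embodiment, the correction leaves black (0) and white (255) fixed and remaps intermediate gray values.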
B2. For grayscale image P, change the gray values of certain pixels as follows:
In the image regions other than vehicles and road boundaries, change pixels with gray value 0 to 1 and pixels with gray value 255 to 254. Change the gray values of pixels in vehicle regions to 0 and the gray values of pixels in road-boundary regions to 255; the resulting image is grayscale image Q. The pixels of grayscale image Q then comprise three classes: the first class, pixels with gray value 0, represents vehicles; the second class, pixels with gray value 255, represents road boundaries; the third class, pixels with gray values other than 0 and 255, represents the road surface. Assign a label to each class: label "0" to the first class ("vehicle"), label "1" to the second class ("road boundary"), and label "2" to the third class ("road surface"). Finally, assign the label of each pixel in grayscale image Q to the corresponding pixel in grayscale image P.
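The relabeling in step B2 can be sketched as below. How the vehicle and road-boundary regions are delimited is not fixed by the text, so the sketch takes them as externally supplied boolean masks (an assumption; in practice they would come from manual annotation).

```python
import numpy as np

def relabel(P, vehicle_mask, boundary_mask):
    """Step B2 sketch: reserve gray values 0/255 for the two target classes
    and derive per-pixel labels (0 = vehicle, 1 = road boundary, 2 = road).

    vehicle_mask/boundary_mask are assumed boolean region masks."""
    Q = P.copy()
    other = ~(vehicle_mask | boundary_mask)
    Q[other & (Q == 0)] = 1      # free up gray value 0 outside target regions
    Q[other & (Q == 255)] = 254  # free up gray value 255 outside target regions
    Q[vehicle_mask] = 0          # class "vehicle"
    Q[boundary_mask] = 255       # class "road boundary"
    labels = np.full(P.shape, 2, dtype=np.uint8)  # default: road surface
    labels[Q == 0] = 0
    labels[Q == 255] = 1
    return Q, labels
```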
B3. Normalize the size of grayscale image P:
B31. Along the image-height direction, choose different pixel rows at intervals, denoted x; by actual sampling and measurement, obtain the pixel width and height of the targets located at the different pixel rows x.
B32. Take the image region with pixel height 0–32 as the reference image region, and take the pixel width W and height H of the target to be recognized in this region as the benchmark, i.e., set the horizontal and vertical cropping scale factors of the reference image region to 1. Divide the widths and heights of the targets on the remaining pixel rows by W and H, respectively, obtaining two groups of ratios denoted Y and Z.
B33. Finally, fit pixel row x against the two groups of ratios Y and Z separately, obtaining two fitted curves:
Y = k1·x + b1    (9)
Z = k2·x + b2    (10)

where Y is the horizontal cropping scale factor of the image, Z is the vertical cropping scale factor, x is a pixel row of the image, k1 and k2 are the slopes of the two fitted curves, and b1 and b2 are their intercepts.
B34. Set both the horizontal and vertical cropping ratios of the reference image region to 1, i.e., crop the reference image region into image samples of 32 × 32 pixels. As x increases, the horizontal and vertical cropping sizes obtained from formulas (9) and (10) increase correspondingly. Cropping yields a series of image samples of different sizes; finally, uniformly scale the cropped image samples to 32 × 32 pixels. Use the resulting 32 × 32 pixel images as training samples for the convolutional neural network.
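Steps B31–B33 amount to a straight-line fit of the measured size ratios against the pixel row. The sketch below uses ordinary least squares; the fitting method is an assumption, since the text only says the data are "fitted".

```python
import numpy as np

def fit_crop_scales(rows, widths, heights, W, H):
    """Steps B31-B33 sketch: least-squares fit of the crop-scale lines (9)-(10).

    rows: sampled pixel rows x; widths/heights: measured target sizes there;
    W, H: reference target width and height in the 0-32 pixel-height band.
    Least squares is an assumption (the text does not name the method)."""
    Y = np.asarray(widths, dtype=float) / W    # horizontal scale ratios
    Z = np.asarray(heights, dtype=float) / H   # vertical scale ratios
    k1, b1 = np.polyfit(rows, Y, 1)            # Y = k1*x + b1, Eq. (9)
    k2, b2 = np.polyfit(rows, Z, 1)            # Z = k2*x + b2, Eq. (10)
    return (k1, b1), (k2, b2)
```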
C, training of the convolutional neural network
The typical convolutional neural network LeNet-5 consists of 8 layers; the input layer takes a 32 × 32 pixel image. Network layers C1, C3, and C5 are convolutional layers, layers S2 and S4 are subsampling layers, and layer F6 is a fully connected layer; the number of output-layer neurons equals the number of target classes to be recognized and changes with the actual application environment. Each plane of a layer represents a feature map, the set of neurons within that layer that share weights. Each neuron of a layer is connected only to the neurons of a local receptive field in the previous layer.
The general form of a convolutional layer is:

x_j^l = f( Σ_{i ∈ M_j} x_i^(l−1) * k_ij^l + b_j^l )

where l ∈ {1, 2, 3, 4, 5, 6, 7, 8} is the layer index, k is the convolution kernel, M_j represents the selection of input feature maps, and b is the bias.
The general form of a subsampling layer is:

x_j^l = f( β_j^l · down(x_j^(l−1)) + b_j^l )

where down(·) denotes the subsampling function, generally a summation over each n × n region of the previous layer's feature map, β is the weight of the subsampling layer, and b is the bias.
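As a concrete reading of the two general forms, the following sketch implements one convolutional-layer and one subsampling-layer forward pass. The activation f = tanh, "valid" correlation, and taking M_j as all input maps are assumptions; the text leaves these unspecified.

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2D correlation of feature map x with kernel k."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv_layer(xs, kernels, biases, f=np.tanh):
    """x_j^l = f(sum_{i in M_j} x_i^{l-1} * k_ij^l + b_j^l); here M_j = all inputs."""
    return [f(sum(conv2d_valid(x, kernels[j][i]) for i, x in enumerate(xs)) + biases[j])
            for j in range(len(biases))]

def subsample_layer(xs, betas, biases, n=2, f=np.tanh):
    """x_j^l = f(beta_j^l * down(x_j^{l-1}) + b_j^l), down() summing n x n blocks."""
    out = []
    for x, beta, b in zip(xs, betas, biases):
        H, W = x.shape
        d = x[:H // n * n, :W // n * n].reshape(H // n, n, W // n, n).sum(axis=(1, 3))
        out.append(f(beta * d + b))
    return out
```

A 32 × 32 input map convolved with a single 5 × 5 kernel reproduces C1's 28 × 28 map size, and subsampling that map reproduces S2's 14 × 14.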
Adjust the number of output neurons of LeNet-5 according to the actual application environment, then train with the 32 × 32 pixel image samples obtained in step B. Through training, when the error between the output value of the convolutional neural network and the expected value falls within a tolerable range, a convolutional neural network usable for analyzing the trafficability in front of a vehicle is obtained.
Compared with the prior art, the effects and benefits of the present invention are:
1. The present invention preprocesses images with a Gamma correction method based on superposed nonlinear functions, which avoids the influence of strongly varying illumination on target recognition and improves image resolution.
2. The present invention adopts a geometric normalization method, which reduces the resolution differences caused by the varying distance between recognition targets and the camera.
3. The convolutional neural network LeNet-5 adopted by the present invention can extract implicit features with class-discriminating ability through a simple extraction process. LeNet-5 combines local receptive fields, weight sharing, and subsampling, which guarantees robustness to simple geometric deformations, reduces the number of trainable network parameters, and simplifies the network structure. The number of neurons in the LeNet-5 output layer can be adjusted to the actual application environment, giving strong environmental adaptability.
Description of drawings
The present invention has 3 accompanying drawings:
Fig. 1 is the flowchart of the method for analyzing the trafficability in front of a vehicle based on a convolutional neural network.
Fig. 2 is the structural diagram of the convolutional neural network LeNet-5.
Fig. 3 shows the sample set for training the convolutional neural network.
Embodiment
The present invention is further described below with reference to the accompanying drawings. Fig. 1 shows the flowchart of the method for analyzing the trafficability in front of a vehicle based on a convolutional neural network. The present invention takes the structured environment of a highway as an example and divides the environment in front of the vehicle into vehicles, road boundaries, and the road surface.
The analysis process of the present invention comprises image acquisition, image preprocessing, and training of the convolutional neural network.
A, image acquisition
A camera mounted at the front of the vehicle collects a large number of images of real highway driving environments (640 × 480 pixels); the bottom three-fifths of each image (640 × 288 pixels) is taken as the region of interest to reduce subsequent workload; finally, the cropped images are converted to grayscale.
B, image preprocessing
First step: the Gamma correction method has certain advantages in reducing the influence of illumination. Generally, when the Gamma value is greater than 1, the highlights of the image are compressed and the shadows are expanded; when the Gamma value is less than 1, the highlights are expanded and the shadows are compressed. Construct a Gamma correction function by superposing nonlinear functions and correct the grayscale image obtained in step A; the function is as follows:
G(x) = 1 + f1(x) + f2(x) + f3(x)    (1)
f1(x) = a·cos(πx/255)    (2)
f2(x) = (K(x) + b)·cos β + x·sin α    (3)
K(x) = ρ·sin(4πx/255)    (4)
α = arctan(−2b/255)    (5)
f3(x) = R(x)·cos(3πx/255)    (6)
R(x) = c·|2x/255 − 1|    (7)
In the formulas, x is the gray value of a pixel and G(x) is the Gamma correction value corresponding to that gray value; a ∈ (0, 1) is a weighting coefficient, b is the maximum variation range of f2(x), ρ is the amplitude of K(x), α is the deflection angle of K(x), c is the amplitude of R(x), and a + b + c < 1. Generally, a < b ≤ c; for example, a = 0.2, b = 0.3, c = 0.3.
The gray value after Gamma correction is computed as:

g(x) = 255·(x/255)^(1/G(x))    (8)

where g(x) is the gray value of a pixel after Gamma correction.
Gamma correction yields grayscale image P.
Second step: for the Gamma-corrected grayscale image P, select the pixels with gray value 0 outside the vehicle and road-boundary regions and programmatically change their gray values to 1; select the pixels with gray value 255 outside the vehicle and road-boundary regions and change their gray values to 254. Then change the gray values of pixels in vehicle regions to 0 and the gray values of pixels in road-boundary regions to 255, obtaining grayscale image Q. The pixels of grayscale image Q then comprise three classes: the first class, pixels with gray value 0, represents vehicles; the second class, pixels with gray value 255, represents road boundaries; the third class, pixels with gray values other than 0 and 255, represents the road surface. Programmatically assign a label to each class: label "0" to the first class ("vehicle"), label "1" to the second class ("road boundary"), and label "2" to the third class ("road surface"). Finally, programmatically assign the label of each pixel in image Q to the corresponding pixel in grayscale image P.
Third step: normalize the size of grayscale image P:
In an image, the number of pixels occupied by a given target is strongly affected by its distance from the camera: the number of pixels occupied by any target is inversely proportional to its distance from the camera. The present invention proposes a new geometric normalization method that normalizes the size of the grayscale image, avoids reducing image resolution, and reduces the resolution differences caused by the varying distance between recognition targets and the camera. First, choose different pixel rows at intervals along the image height, denoted by the vector x = {15, 32, 60, 65, 68, 72, 75, 82, 85, 87, 92, 96, 100, 108, 111, 113, 124, 130, 138, 143, 150, 160}; by actual sampling and measurement, obtain the pixel width and height of the targets at the different pixel rows. Second, take the image region with pixel height 0–32 as the reference, and take the pixel width W and height H of the target in this reference region as the benchmark, i.e., the horizontal and vertical cropping scale factors of the reference region are 1. Finally, divide the widths and heights of the targets on the remaining pixel rows by W and H, respectively, obtaining two groups of ratios denoted Y and Z; then programmatically fit the pixel-row data x against the Y and Z data separately, obtaining two fitted curves:
Y=0.0312x-0.8339 (9)
Z=0.0360x-1.0590 (10)
where Y is the horizontal cropping scale factor of the image, Z is the vertical cropping scale factor, and x is a pixel row of the image.
The horizontal and vertical cropping ratios of the reference image region are 1, i.e., the reference region is cropped into image samples of 32 × 32 pixels. As x increases, the horizontal and vertical cropping sizes obtained from formulas (9) and (10) increase correspondingly. Cropping yields a series of image samples of different sizes; finally, the cropped image samples are programmatically scaled to a uniform 32 × 32 pixels. The resulting 32 × 32 pixel images serve as training samples for the convolutional neural network.
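With the fitted coefficients of Eqs. (9) and (10), the crop window at each sampled row can be computed as below. Scaling the 32-pixel reference by Y and Z and rounding to whole pixels are assumptions about how the crop sizes are derived from the scale factors.

```python
def crop_size(x, base=32):
    """Crop window (width, height) in pixels at image row x, using the fitted
    curves Y = 0.0312x - 0.8339 and Z = 0.0360x - 1.0590.

    Assumption: crop size = reference size (32) scaled by Y/Z, rounded."""
    Y = 0.0312 * x - 0.8339   # horizontal crop scale, Eq. (9)
    Z = 0.0360 * x - 1.0590   # vertical crop scale, Eq. (10)
    return round(base * Y), round(base * Z)
```

Rows farther down the image (larger x, targets closer to the camera) get larger crop windows, which are then rescaled to 32 × 32 samples.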
C, training of the convolutional neural network
The present invention adopts the typical convolutional neural network LeNet-5, whose structure is shown in Fig. 2. LeNet-5 consists of 8 layers; the input image is 32 × 32 pixels. Layers C1, C3, and C5 are convolutional layers, layers S2 and S4 are subsampling layers, and layer F6 is a fully connected layer; the number of output-layer neurons equals the number of target classes to be recognized and can change with the actual application environment. Each plane of a layer represents a feature map, the set of neurons within that layer that share weights. Each neuron of a layer is connected only to the neurons of a local receptive field in the previous layer (counting from the input layer).
Convolutional layer C1 consists of 6 feature maps of size 28 × 28; each neuron of a feature map is connected to a 5 × 5 neighborhood of the input image. C1 has 156 trainable parameters and 122,304 trainable connections. Subsampling layer S2 consists of 6 feature maps of size 14 × 14; each neuron of a feature map is connected to a 2 × 2 neighborhood in C1. S2 has 12 trainable parameters and 5,880 trainable connections. Convolutional layer C3 consists of 16 feature maps of size 10 × 10; each neuron of a feature map is connected to a 5 × 5 neighborhood of S2. C3 has 1,516 trainable parameters and 151,600 trainable connections. Subsampling layer S4 consists of 16 feature maps of size 5 × 5; each neuron of a feature map is connected to a 2 × 2 neighborhood in C3. S4 has 32 trainable parameters and 2,000 trainable connections. Convolutional layer C5 consists of 120 feature maps; each neuron is connected to a 5 × 5 neighborhood of all feature maps of S4. C5 has 48,120 trainable parameters and 48,120 trainable connections. Layer F6 is fully connected to C5 and has 10,164 trainable parameters. The output layer consists of radial basis function units.
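The layer-by-layer parameter counts quoted above can be checked arithmetically. The C3 term assumes the sparse S2-to-C3 connection table of the original LeNet-5 (6 maps reading 3 inputs, 9 reading 4, 1 reading all 6), which the text does not spell out.

```python
# Recomputing LeNet-5's trainable-parameter counts from the layer sizes above.
c1 = 6 * (5 * 5 * 1 + 1)        # 6 maps, one 5x5 kernel on 1 input, + bias each
s2 = 6 * 2                       # one weight (beta) + one bias per map
c3 = 6 * (3 * 25 + 1) + 9 * (4 * 25 + 1) + 1 * (6 * 25 + 1)  # sparse table
s4 = 16 * 2
c5 = 120 * (16 * 25 + 1)         # fully connected to all 16 S4 maps
f6 = 84 * (120 + 1)              # 84 units, each with 120 weights + bias
print(c1, s2, c3, s4, c5, f6)    # -> 156 12 1516 32 48120 10164
```

The results match the counts stated in the description.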
The general form of a convolutional layer is:

x_j^l = f( Σ_{i ∈ M_j} x_i^(l−1) * k_ij^l + b_j^l )

where l is the layer index, k is the convolution kernel, M_j represents the selection of input feature maps, and b is the bias.
The general form of a subsampling layer is:

x_j^l = f( β_j^l · down(x_j^(l−1)) + b_j^l )

where down(·) denotes the subsampling function, generally a summation over each n × n region of the previous layer's feature map, β is the weight of the subsampling layer, and b is the bias.
The present invention takes the highway environment as an example and divides the environment in front of the vehicle into three parts: road surface, vehicles, and road boundaries. The number of output neurons of LeNet-5 is therefore set to 3: output 0 means the recognized target is a vehicle, output 1 means it is a road boundary, and output 2 means it is the road surface. The sample set for network training contains 5,000 samples; part of the sample set is shown in Fig. 3. The initial network weights are generated randomly. Training the convolutional network with these known classification patterns gives the network the mapping ability between input-output pairs. Through training, when the error between the output value of the convolutional neural network and the expected value falls within a tolerable range, a convolutional neural network usable for analyzing the trafficability in front of a vehicle is obtained.
Claims (1)
1. A method for analyzing the trafficability in front of a vehicle based on a convolutional neural network, characterized by comprising the following steps:
A, image acquisition
First, a camera mounted at the front of the vehicle collects a large number of images of real driving environments, each with m × n pixels; the bottom three-fifths of each image is then cropped out as the region of interest; finally, the cropped images are converted to grayscale;
B, image preprocessing
B1. Construct a Gamma correction function by superposing nonlinear functions and correct the grayscale image obtained in step A; the function is as follows:
G(x) = 1 + f1(x) + f2(x) + f3(x)    (1)
f1(x) = a·cos(πx/255)    (2)
f2(x) = (K(x) + b)·cos β + x·sin α    (3)
K(x) = ρ·sin(4πx/255)    (4)
α = arctan(−2b/255)    (5)
f3(x) = R(x)·cos(3πx/255)    (6)
R(x) = c·|2x/255 − 1|    (7)
In the formulas, x is the gray value of a pixel and G(x) is the Gamma correction value corresponding to that gray value; a ∈ (0, 1) is a weighting coefficient, b is the maximum variation range of f2(x), ρ is the amplitude of K(x), α is the deflection angle of K(x), c is the amplitude of R(x), and a + b + c < 1;
The gray value after Gamma correction is computed as:

g(x) = 255·(x/255)^(1/G(x))    (8)

where g(x) is the gray value of a pixel after Gamma correction;
Gamma correction yields grayscale image P;
B2. For grayscale image P, change the gray values of certain pixels as follows:
In the image regions other than vehicles and road boundaries, change pixels with gray value 0 to 1 and pixels with gray value 255 to 254; change the gray values of pixels in vehicle regions to 0 and the gray values of pixels in road-boundary regions to 255; the resulting image is grayscale image Q; the pixels of grayscale image Q then comprise three classes: the first class, pixels with gray value 0, represents vehicles; the second class, pixels with gray value 255, represents road boundaries; the third class, pixels with gray values other than 0 and 255, represents the road surface; assign a label to each class, i.e., label "0" to the first class ("vehicle"), label "1" to the second class ("road boundary"), and label "2" to the third class ("road surface"); finally, assign the label of each pixel in grayscale image Q to the corresponding pixel in grayscale image P;
B3. Normalize the size of grayscale image P:
B31. Along the image-height direction, choose different pixel rows at intervals, denoted x; by actual sampling and measurement, obtain the pixel width and height of the targets located at the different pixel rows x;
B32. Take the image region with pixel height 0–32 as the reference image region, and take the pixel width W and height H of the target to be recognized in this region as the benchmark, i.e., set the horizontal and vertical cropping scale factors of the reference image region to 1; divide the widths and heights of the targets on the remaining pixel rows by W and H, respectively, obtaining two groups of ratios denoted Y and Z;
B33. Finally, fit pixel row x against the two groups of ratios Y and Z separately, obtaining two fitted curves:

Y = k1·x + b1    (9)
Z = k2·x + b2    (10)

where Y is the horizontal cropping scale factor of the image, Z is the vertical cropping scale factor, x is a pixel row of the image, k1 and k2 are the slopes of the two fitted curves, and b1 and b2 are their intercepts;
B34. Set both the horizontal and vertical cropping ratios of the reference image region to 1, i.e., crop the reference image region into image samples of 32 × 32 pixels; as x increases, the horizontal and vertical cropping sizes obtained from formulas (9) and (10) increase correspondingly; cropping yields a series of image samples of different sizes; finally, uniformly scale the cropped image samples to 32 × 32 pixels; use the resulting 32 × 32 pixel images as training samples for the convolutional neural network;
C, training of the convolutional neural network
The typical convolutional neural network LeNet-5 consists of 8 layers; the input layer takes a 32 × 32 pixel image; network layers C1, C3, and C5 are convolutional layers, layers S2 and S4 are subsampling layers, and layer F6 is a fully connected layer; the number of output-layer neurons equals the number of target classes to be recognized and changes with the actual application environment; each plane of a layer represents a feature map, the set of neurons within that layer that share weights; each neuron of a layer is connected only to the neurons of a local receptive field in the previous layer;
The general form of a convolutional layer is:

x_j^l = f( Σ_{i ∈ M_j} x_i^(l−1) * k_ij^l + b_j^l )

where l ∈ {1, 2, 3, 4, 5, 6, 7, 8} is the layer index, k is the convolution kernel, M_j represents the selection of input feature maps, and b is the bias;
The general form of a subsampling layer is:

x_j^l = f( β_j^l · down(x_j^(l−1)) + b_j^l )

where down(·) denotes the subsampling function, generally a summation over each n × n region of the previous layer's feature map, β is the weight of the subsampling layer, and b is the bias;
Adjust the number of output neurons of LeNet-5 according to the actual application environment, then train with the 32 × 32 pixel image samples obtained in step B; through training, when the error between the output value of the convolutional neural network and the expected value falls within a tolerable range, a convolutional neural network usable for analyzing the trafficability in front of a vehicle is obtained.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310234126.0A (granted as CN103279759B) | 2013-06-09 | 2013-06-09 | A kind of vehicle front trafficability analytical procedure based on convolutional neural networks |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN103279759A (application) | 2013-09-04 |
| CN103279759B (grant) | 2016-06-01 |
Family
- ID: 49062274
- Family application: CN201310234126.0A, filed 2013-06-09, granted as CN103279759B (Expired - Fee Related)
- Country: CN
Family events: 2013-06-09: application CN201310234126.0A filed in China (granted as CN103279759B); current status: not active, Expired - Fee Related.
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007027452A1 (en) * | 2005-08-31 | 2007-03-08 | Microsoft Corporation | Training convolutional neural networks on graphics processing units |
CN101742341A (en) * | 2010-01-14 | 2010-06-16 | 中山大学 | Method and device for image processing |
CN102750544A (en) * | 2012-06-01 | 2012-10-24 | 浙江捷尚视觉科技有限公司 | Detection system and detection method of rule-breaking driving that safety belt is not fastened and based on plate number recognition |
Non-Patent Citations (1)
Title |
---|
彭国福 (Peng Guofu): "Research and Implementation of Gamma Correction in Image Processing", 《电子工程师》 (Electronic Engineer) *
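The non-patent citation above concerns Gamma correction, the image preprocessing step this patent's abstract relies on to suppress strong illumination changes. A minimal sketch of standard gamma correction on an 8-bit grayscale image follows; it is illustrative only and does not reproduce the patent's exact nonlinear superimposed correction function:

```python
import numpy as np

def gamma_correct(image, gamma=0.5):
    """Apply Gamma correction to an 8-bit grayscale image.

    Pixel values are normalized to [0, 1] and raised to the power
    `gamma` (gamma < 1 brightens dark regions, reducing the impact
    of strong illumination variation), then rescaled to [0, 255].
    """
    normalized = image.astype(np.float64) / 255.0
    corrected = np.power(normalized, gamma)
    return np.clip(corrected * 255.0, 0.0, 255.0).astype(np.uint8)

# Dark pixels are lifted more than bright ones; extremes stay fixed.
img = np.array([[0, 64, 128, 255]], dtype=np.uint8)
print(gamma_correct(img, gamma=0.5))  # values: 0, 127, 180, 255
```

With gamma = 0.5 the mapping is a square root on the normalized intensity, so mid-tones move up substantially while 0 and 255 are invariant, which is why such a curve helps under harsh or uneven lighting.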
Cited By (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544506A (en) * | 2013-10-12 | 2014-01-29 | Tcl集团股份有限公司 | Method and device for classifying images on basis of convolutional neural network |
CN103544506B (en) * | 2013-10-12 | 2017-08-08 | Tcl集团股份有限公司 | Image classification method and device based on a convolutional neural network |
CN104680508A (en) * | 2013-11-29 | 2015-06-03 | 华为技术有限公司 | Convolutional neural network and target object detection method based on convolutional neural network |
CN104809426A (en) * | 2014-01-27 | 2015-07-29 | 日本电气株式会社 | Convolutional neural network training method and target identification method and device |
CN104809426B (en) * | 2014-01-27 | 2019-04-05 | 日本电气株式会社 | Convolutional neural network training method, target recognition method and device |
CN105224963B (en) * | 2014-06-04 | 2019-06-07 | 华为技术有限公司 | Method and terminal for a changeable deep learning network structure |
CN105224963A (en) * | 2014-06-04 | 2016-01-06 | 华为技术有限公司 | Method and terminal for a changeable deep learning network structure |
CN104063719A (en) * | 2014-06-27 | 2014-09-24 | 深圳市赛为智能股份有限公司 | Method and device for pedestrian detection based on depth convolutional network |
CN104077577A (en) * | 2014-07-03 | 2014-10-01 | 浙江大学 | Trademark detection method based on convolutional neural network |
TWI635446B (en) * | 2014-07-22 | 2018-09-11 | 英特爾股份有限公司 | Weight-shifting apparatus, method, system and machine-accessible storage medium |
CN105320495A (en) * | 2014-07-22 | 2016-02-10 | 英特尔公司 | Weight-shifting mechanism for convolutional neural network |
CN104182756A (en) * | 2014-09-05 | 2014-12-03 | 大连理工大学 | Method for detecting barriers in front of vehicles on basis of monocular vision |
CN104182756B (en) * | 2014-09-05 | 2017-04-12 | 大连理工大学 | Method for detecting barriers in front of vehicles on basis of monocular vision |
US10275688B2 (en) | 2014-12-17 | 2019-04-30 | Nokia Technologies Oy | Object detection with neural network |
WO2016095117A1 (en) * | 2014-12-17 | 2016-06-23 | Nokia Technologies Oy | Object detection with neural network |
CN107004138A (en) * | 2014-12-17 | 2017-08-01 | 诺基亚技术有限公司 | Object detection using a neural network |
WO2016149881A1 (en) * | 2015-03-20 | 2016-09-29 | Intel Corporation | Object recognition based on boosting binary convolutional neural network features |
US10685262B2 (en) | 2015-03-20 | 2020-06-16 | Intel Corporation | Object recognition based on boosting binary convolutional neural network features |
CN104850864A (en) * | 2015-06-01 | 2015-08-19 | 深圳英智源智能系统有限公司 | Unsupervised image recognition method based on convolutional neural network |
CN104992179A (en) * | 2015-06-23 | 2015-10-21 | 浙江大学 | Fine-grained convolutional neural network-based clothes recommendation method |
CN106874296A (en) * | 2015-12-14 | 2017-06-20 | 阿里巴巴集团控股有限公司 | Commodity style recognition method and device |
CN108496178B (en) * | 2016-01-05 | 2023-08-08 | 御眼视觉技术有限公司 | System and method for estimating future path |
CN108496178A (en) * | 2016-01-05 | 2018-09-04 | 御眼视觉技术有限公司 | System and method for estimating Future Path |
CN105691381A (en) * | 2016-03-10 | 2016-06-22 | 大连理工大学 | Stability control method and system for electric automobile with four independently driven wheels |
CN105691381B (en) * | 2016-03-10 | 2018-04-27 | 大连理工大学 | Stability control method and system for a four-wheel independently driven electric vehicle |
CN105975915A (en) * | 2016-04-28 | 2016-09-28 | 大连理工大学 | Front vehicle parameter identification method based on multitask convolution nerve network |
CN105975915B (en) * | 2016-04-28 | 2019-05-21 | 大连理工大学 | Front vehicle parameter identification method based on a multitask convolutional neural network |
US11631005B2 (en) | 2016-05-31 | 2023-04-18 | Nokia Technologies Oy | Method and apparatus for detecting small objects with an enhanced deep neural network |
WO2017206066A1 (en) * | 2016-05-31 | 2017-12-07 | Nokia Technologies Oy | Method and apparatus for detecting small objects with an enhanced deep neural network |
CN106205126A (en) * | 2016-08-12 | 2016-12-07 | 北京航空航天大学 | Large-scale traffic network congestion forecasting method and device based on convolutional neural networks |
CN106407931A (en) * | 2016-09-19 | 2017-02-15 | 杭州电子科技大学 | Novel deep convolution neural network moving vehicle detection method |
CN106407931B (en) * | 2016-09-19 | 2019-11-22 | 杭州电子科技大学 | Deep convolutional neural network moving vehicle detection method |
TWI638332B (en) * | 2016-11-29 | 2018-10-11 | 財團法人車輛研究測試中心 | Hierarchical object detection system with parallel architecture and method thereof |
CN108122234A (en) * | 2016-11-29 | 2018-06-05 | 北京市商汤科技开发有限公司 | Convolutional neural networks training and method for processing video frequency, device and electronic equipment |
CN106599832A (en) * | 2016-12-09 | 2017-04-26 | 重庆邮电大学 | Method for detecting and recognizing various types of obstacles based on a convolutional neural network |
WO2019001346A1 (en) * | 2017-06-26 | 2019-01-03 | Huawei Technologies Co., Ltd. | System and methods for object filtering and uniform representation for autonomous systems |
WO2019047649A1 (en) * | 2017-09-05 | 2019-03-14 | 百度在线网络技术(北京)有限公司 | Method and device for determining driving behavior of unmanned vehicle |
WO2019047655A1 (en) * | 2017-09-05 | 2019-03-14 | 百度在线网络技术(北京)有限公司 | Method and apparatus for use in determining driving behavior of driverless vehicle |
CN107392189A (en) * | 2017-09-05 | 2017-11-24 | 百度在线网络技术(北京)有限公司 | Method and apparatus for determining driving behavior of an unmanned vehicle |
CN109726615A (en) * | 2017-10-30 | 2019-05-07 | 北京京东尚科信息技术有限公司 | Road boundary recognition method and device |
CN110399929A (en) * | 2017-11-01 | 2019-11-01 | 腾讯科技(深圳)有限公司 | Eye fundus image classification method, device and computer readable storage medium |
CN110399929B (en) * | 2017-11-01 | 2023-04-28 | 腾讯科技(深圳)有限公司 | Fundus image classification method, fundus image classification apparatus, and computer-readable storage medium |
CN108921003A (en) * | 2018-04-26 | 2018-11-30 | 东华大学 | Unmanned aerial vehicle obstacle detection method based on convolutional neural networks and image morphology |
CN108596115A (en) * | 2018-04-27 | 2018-09-28 | 济南浪潮高新科技投资发展有限公司 | Vehicle detection method, apparatus and system based on convolutional neural networks |
CN109657643A (en) * | 2018-12-29 | 2019-04-19 | 百度在线网络技术(北京)有限公司 | Image processing method and device |
CN109961509A (en) * | 2019-03-01 | 2019-07-02 | 北京三快在线科技有限公司 | Three-dimensional map generation and model training method, device and electronic equipment |
CN110084190A (en) * | 2019-04-25 | 2019-08-02 | 南开大学 | Real-time unstructured road detection method under severe illumination environments based on ANN |
CN110084190B (en) * | 2019-04-25 | 2024-02-06 | 南开大学 | Real-time unstructured road detection method under severe illumination environment based on ANN |
CN111222476A (en) * | 2020-01-10 | 2020-06-02 | 北京百度网讯科技有限公司 | Video time sequence action detection method and device, electronic equipment and storage medium |
US11600069B2 (en) | 2020-01-10 | 2023-03-07 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for detecting temporal action of video, electronic device and storage medium |
CN113325948B (en) * | 2020-02-28 | 2023-02-07 | 华为技术有限公司 | Air-isolated gesture adjusting method and terminal |
CN113325948A (en) * | 2020-02-28 | 2021-08-31 | 华为技术有限公司 | Air-isolated gesture adjusting method and terminal |
Also Published As
Publication number | Publication date |
---|---|
CN103279759B (en) | 2016-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103279759A (en) | Vehicle front trafficability analyzing method based on convolution nerve network | |
CN103034863B (en) | Remote sensing image road extraction method combining kernel Fisher discriminant analysis and multi-scale extraction | |
CN107316064B (en) | Asphalt pavement crack classification and identification method based on convolutional neural network | |
CN103093250B (en) | Adaboost face detection method based on a new Haar-like feature | |
CN104182985B (en) | Remote sensing image change detection method | |
CN103020605B (en) | Bridge identification method based on decision-making layer fusion | |
CN103729853B (en) | Building damage detection method for high-resolution remote sensing images assisted by three-dimensional GIS | |
CN105184271A (en) | Automatic vehicle detection method based on deep learning | |
CN102968790B (en) | Remote sensing image change detection method based on image fusion | |
CN103902976A (en) | Pedestrian detection method based on infrared image | |
CN103413151A (en) | Hyperspectral image classification method based on image regular low-rank expression dimensionality reduction | |
CN103679205B (en) | Front vehicle detection method based on shadow hypothesis and hierarchical HOG symmetric feature verification | |
CN104299006A (en) | Vehicle license plate recognition method based on deep neural network | |
CN103218831A (en) | Video moving target classification and identification method based on outline constraint | |
CN104867150A (en) | Band-correction change detection method and system based on remote sensing image fuzzy clustering | |
CN103632363A (en) | Object-level high-resolution remote sensing image change detection method based on multi-scale fusion | |
CN104361351A (en) | Synthetic aperture radar (SAR) image classification method on basis of range statistics similarity | |
CN105005989A (en) | Vehicle target segmentation method under weak contrast | |
CN102867183B (en) | Method and device for detecting littered objects of vehicle and intelligent traffic monitoring system | |
CN102629380B (en) | Remote sensing image change detection method based on multi-group filtering and dimension reduction | |
CN104751166A (en) | Spectral angle and Euclidean distance based remote-sensing image classification method | |
Yuan et al. | Learning to count buildings in diverse aerial scenes | |
CN106096655A (en) | Remote sensing image airplane detection method based on convolutional neural networks | |
CN104103070B (en) | Landing point selecting method based on optical images | |
CN103488993A (en) | Crowd abnormal behavior identification method based on FAST |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160601; Termination date: 20190609 |