CN110008848A

CN110008848A - A road drivable area recognition method based on binocular stereo vision

Info

Publication number: CN110008848A
Application number: CN201910187216.6A
Authority: CN
Inventors: 李巍华; 王航; 方卓琳
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2019-03-13
Filing date: 2019-03-13
Publication date: 2019-07-12

Abstract

The invention discloses a road drivable area recognition method based on binocular stereo vision, comprising the steps of: (1) acquiring and preprocessing images from authoritative datasets recognized in the autonomous driving field, obtaining a spatial depth map from the left and right monocular camera images, and synthesizing the spatial depth map with the original left-camera RGB image into an RGB-D image of binocular stereo vision to form training, test and validation datasets; (2) inputting the training set into a shallow network based on an improved AlexNet and training it into the final semantic segmentation model, then inputting the test data into the segmentation model to output an image containing the drivable area; (3) starting the binocular stereo vision camera set that has been installed and calibrated on the intelligent vehicle to acquire binocular stereo vision images, and inputting them into the trained model to recognize the drivable road area. The invention provides a more accurate and comprehensive source of perception data for intelligent driving.

Description

A road drivable area recognition method based on binocular stereo vision

Technical field

The present invention relates to the field of advanced driver assistance and autonomous driving for automobiles, and in particular to a road drivable area recognition method based on binocular stereo vision.

Background technique

The rapid development of the automotive industry, the computer industry and the internet is jointly driving an innovation revolution across the entire automotive sector. Vehicles equipped with advanced driver assistance systems, new cars packed with technology, and the progress of autonomous driving all suggest that a future that "does not need a driver" is within sight. From a technical standpoint, however, it is not hard to see that many problems must still be solved before intelligent driving can be realized, especially at Level 3 (LV3) and above, such as data extrapolation problems at the perception layer, logic problems at the decision layer and effectiveness problems at the control layer. Among these, accurately determining the vehicle's drivable area should be addressed first: only with an accurate prediction of the drivable area and reasonable, effective control of the vehicle can correct instructions be issued for the vehicle's further movement, driving safety be better guaranteed, and traffic accidents be avoided.

For an intelligent driving vehicle, environment perception is the basis on which the vehicle makes decisions and exercises control. Because the road traffic environment an intelligent vehicle faces is extremely complex and changeable, how to achieve, as far as possible, a "perfect perception" of the road environment and accurately recognize the drivable road area has become a field of significant research value.

The sensors commonly used in intelligent driving currently include vision sensors, ultrasonic sensors, lidar, GPS and IMU. Vision sensors are favored by many researchers for their low cost and good reliability, and are among the most widely used sensors in intelligent driving today. Traditional drivable area recognition methods are mostly based on monocular images and do not consider spatial geometric features, and their network models are slow to process. This patent therefore combines binocular stereo vision with a shallow semantic segmentation network and introduces the combination into road drivable area recognition, so that the method pays more attention to the geometric depth information of the drivable area.

Semantic segmentation has long been a difficult problem in machine vision, and it remains hard to find a segmentation model with good generalization and robustness. Its importance is nonetheless self-evident. The quality of an image semantic segmentation is judged against the following criteria:

(1) the interior of each segmented region is smooth and its features are consistent, satisfying the principles of similarity and wholeness;

(2) the properties on which the segmentation relies differ clearly between different image regions;

(3) the gradient changes sharply at the boundary between adjacent segmented regions, so that the boundary is distinct.

Although experts in computer vision have proposed many solutions and semantic segmentation has developed well, the complexity of its principles and its engineering characteristics mean that many problems remain to be solved.

A convolutional neural network is a kind of neural network, but unlike a general neural network it has a natural advantage in feature extraction: it contains a feature extractor composed of convolutional layers and subsampling layers, and can extract image features more effectively and intelligently, which traditional recognition algorithms based on pixels and boundaries cannot match. Previously, relatively deep neural networks (generally those with three or more layers) placed high demands on hardware computing power, but with the rapid development of deep learning algorithms and hardware these problems have been largely overcome, which also makes engineering implementation of the algorithms possible.

Summary of the invention

To solve the above problems in the prior art, the present invention provides a road drivable area recognition method based on binocular stereo vision. It aims to determine the drivable road area more accurately and reliably and to better perceive the vehicle's driving environment, so as to better guide the vehicle in intelligent assisted driving.

To achieve the above object, the technical solution of the present invention is as follows:

A road drivable area recognition method based on binocular stereo vision, comprising the steps of:

(1) acquiring and preprocessing images from authoritative datasets recognized in the autonomous driving field; using the left and right monocular camera images to obtain a spatial depth map by binocular stereo vision; synthesizing the spatial depth map and the original left-camera RGB image into an RGB-D image of binocular stereo vision; and forming training, test and validation datasets;

(2) inputting the training set into a shallow network based on an improved AlexNet for semantic segmentation, training iteratively and optimizing the parameters until a predetermined model convergence state is reached, then fixing the model parameters to complete training and obtain the final semantic segmentation model; and inputting the test data into the segmentation model to output an image containing the drivable area;

(3) starting the binocular stereo vision camera set that has been installed and calibrated on the intelligent vehicle, acquiring test images of the traffic environment to obtain binocular stereo vision images, and inputting them into the trained model to recognize the drivable road area.

Further, step (1) specifically comprises:

Stereo matching is performed on the left and right monocular images to obtain a disparity map, and a spatial depth map is then computed from the binocular vision model of the system. Finally, the spatial depth map, which represents spatial geometric information and features, is superimposed on the RGB image as a fourth channel to form an RGB-D image. In other words, a spatial geometric feature (depth) is added to the RGB image features, so that the network has more features to "remember".

Binocular stereo vision recovers depth information for the image pixels from the left and right monocular images, restores the three-dimensional geometric features of the road to a certain extent, and makes full use of the speed, stability and low cost of vision sensors.
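The fourth-channel superposition described above can be illustrated with a minimal sketch. It assumes the depth map is already aligned pixel for pixel with the left-camera image; the function name `stack_rgbd`, the nested-list image layout and the toy values are hypothetical and not taken from the patent.

```python
def stack_rgbd(rgb, depth):
    """Append a depth value to every RGB pixel, giving an H x W x 4 image.

    rgb:   H x W nested list of (r, g, b) tuples from the left camera.
    depth: H x W nested list of depth values aligned with the left image.
    """
    if len(rgb) != len(depth):
        raise ValueError("row count mismatch between RGB image and depth map")
    rgbd = []
    for rgb_row, depth_row in zip(rgb, depth):
        if len(rgb_row) != len(depth_row):
            raise ValueError("column count mismatch between RGB image and depth map")
        # Each output pixel carries four channels: R, G, B and depth.
        rgbd.append([(r, g, b, d) for (r, g, b), d in zip(rgb_row, depth_row)])
    return rgbd


# Toy 1 x 2 image; the depth values and their units are illustrative only.
rgbd = stack_rgbd([[(10, 20, 30), (40, 50, 60)]], [[12.5, 30.0]])
```

In practice the depth channel would probably be normalized to the same value range as the color channels before training, but the patent does not specify a normalization.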

Further, the encoder in step (2) performs feature extraction by means of a constructed convolutional neural network, wherein the improvements to the AlexNet shallow network mainly comprise:

(1) removing the fully connected layers and retaining only the convolutional and pooling layers of the network;

(2) changing all pooling layers of the original AlexNet to 2x2, eliminating all 3x3 pooling layers;

(3) changing the kernel size of the first convolutional layer from 11x11 to 3x3, changing its stride to 1, and setting padding=1;

(4) adding, between the second convolutional layer and the second pooling layer, a convolutional layer identical to the second convolutional layer, which increases the depth of the network so that the new network has 6 convolutional layers while the number of pooling layers remains unchanged.
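The effect of modifications (2) and (3) on feature-map resolution can be checked with the usual output-size formula for a convolution or pooling window. The sketch below is illustrative only: the 227-pixel and 224-pixel input sizes, the stride of 4 for the original 11x11 layer and the stride of 2 for the 2x2 pooling are assumptions, since the patent states only the kernel sizes, the new stride of 1 and padding=1.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Original AlexNet first layer: 11x11 kernel, stride 4 (assumed), no padding.
original_first = conv_out(227, kernel=11, stride=4)

# Modified first layer from item (3): 3x3 kernel, stride 1, padding 1.
modified_first = conv_out(227, kernel=3, stride=1, padding=1)

# 2x2 pooling from item (2), with an assumed stride of 2.
pooled = conv_out(224, kernel=2, stride=2)
```

Under these assumptions the original first layer shrinks a 227-pixel axis to 55, while the modified layer keeps it at 227, so fine spatial detail survives into the later layers; each 2x2 pooling stage then halves the resolution.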

Further, in step (2), the training process of the improved AlexNet shallow network comprises:

(1) synthesizing the spatial depth map and the original left-camera RGB image into RGB-D images and organizing them into a training dataset;

(2) inputting the labeled training data into the designed semantic segmentation network, where each image passes through several convolutional and pooling layers for feature extraction;

(3) upsampling the pooled feature maps by deconvolution, restoring them through deconvolution to the size of the whole image and mapping them to the pixel level, and performing probability prediction for the pixels of the intelligent vehicle's drivable area;

(4) producing the semantic segmentation result and outputting the image of the drivable area.
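How pooling and deconvolution cancel out in steps (2) and (3) can be traced with a short sketch. It assumes three 2x2 pooling stages with stride 2 followed by three transposed convolutions with a 2x2 kernel and stride 2; the kernel and stride of the deconvolution layers are not given in the patent, so these values and the helper names are hypothetical.

```python
def pool_out(size, kernel=2, stride=2):
    """Axis length after one pooling stage (no padding)."""
    return (size - kernel) // stride + 1


def deconv_out(size, kernel=2, stride=2, padding=0):
    """Axis length after one transposed convolution (deconvolution)."""
    return (size - 1) * stride - 2 * padding + kernel


def trace(size, stages=3):
    """Record the axis length after each pooling and each deconvolution stage."""
    sizes = [size]
    for _ in range(stages):
        sizes.append(pool_out(sizes[-1]))
    for _ in range(stages):
        sizes.append(deconv_out(sizes[-1]))
    return sizes


sizes = trace(224)
```

With these assumed settings the decoder returns the feature map to the input resolution, which is what allows one class probability to be predicted per input pixel.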

Further, acquiring and preprocessing images from the authoritative datasets recognized in the autonomous driving field specifically comprises:

obtaining training images and evaluation images from the open-source KITTI and CITYSCAPES datasets;

performing data augmentation on the training images and evaluation images through mirroring, flipping, translation and rotation, to obtain a training dataset that is larger and has more complete coverage.
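Two of the augmentation operations named above, mirroring and translation, can be sketched as follows. The sketch assumes that every geometric transform is applied identically to an image and to its label mask so that the pixel labels stay aligned, and that pixels shifted in from outside are filled with 0; both points are assumptions for illustration, as are the function names.

```python
def hflip(grid):
    """Mirror a 2-D grid (one image channel or a label mask) left to right."""
    return [row[::-1] for row in grid]


def translate_right(grid, shift, fill=0):
    """Shift a 2-D grid right by `shift` columns, padding the left with `fill`."""
    shifted = []
    for row in grid:
        k = min(max(shift, 0), len(row))  # clamp the shift to the row width
        shifted.append([fill] * k + row[:len(row) - k])
    return shifted


def augment(image, label):
    """Return the original pair plus a mirrored pair and a shifted pair."""
    return [
        (image, label),
        (hflip(image), hflip(label)),
        (translate_right(image, 1), translate_right(label, 1)),
    ]


image = [[1, 2, 3], [4, 5, 6]]
label = [[0, 1, 1], [0, 0, 1]]  # 1 marks a drivable pixel in this toy mask
pairs = augment(image, label)
```

Each source image yields three training pairs here; rotation would be added in the same way, with the same transform applied to the image and its mask.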

Further, the semantic segmentation model of the improved AlexNet shallow network comprises 12 convolutional layers, 3 pooling layers, 3 deconvolution (upsampling) layers and one Softmax layer used to predict pixel probabilities.

Further, the probability prediction for the pixels of the intelligent vehicle's drivable area is specifically carried out by training the Softmax regression layer of the output layer with stochastic gradient descent (mini-batch SGD) until the regression loss function reaches a predetermined convergence and the predetermined segmentation and recognition performance is achieved.

Further, the drivable area specifically refers to all road areas that are free of obstacles and on which vehicles are allowed to travel.

Further, in terms of physical structure, all the road areas on which vehicles are allowed to travel include structured, semi-structured and unstructured road surfaces.

Further, a structured road surface has regular road edges and a uniform, orderly surface structure, with clear lane lines and other man-made markings; its structural layers are built to a defined standard, and the color and material of its surface course are uniform. A semi-structured road surface is a generally non-standardized surface whose surface course varies in color and material, such as parking lots and squares, as well as some other branch roads. An unstructured road surface is a natural road scene with no structural layer.

Compared with the prior art, the present invention provides a method by which a vehicle recognizes its drivable area in an intelligent driving environment. Addressing the key role and necessity of road recognition while an intelligent vehicle is travelling, the method recognizes the drivable road area on the basis of binocular stereo vision. A spatial geometric feature (depth) is added to the R, G and B features of the monocular image, so that binocular stereo vision is applied to image semantic segmentation. A convolutional neural network then extracts image features, and a deconvolution neural network maps the extracted features directly back to the whole image to classify every pixel. The semantic segmentation model composed of the improved AlexNet convolutional neural network and the deconvolution neural network is continually improved and optimized, and is finally applied to recognizing the vehicle's drivable area. Recognition performance is verified on recognized standard datasets, segmentation accuracy is compared, and the model is further optimized, providing a more accurate and comprehensive source of perception data for intelligent driving.

Description of the drawings

The accompanying drawings are provided for a further understanding of the disclosure and form part of this application. They illustrate only some non-limiting examples embodying the inventive concept and are not intended to impose any limitation.

Fig. 1 is a flowchart of the road drivable area recognition method based on binocular stereo vision according to some example embodiments of the present invention.

Fig. 2 is a structural diagram of the convolution and upsampling network according to some example embodiments of the present invention.

Fig. 3 is a schematic diagram of binocular stereo vision, illustrating the principle by which the geometric spatial depth map is obtained from the left and right monocular views.

Fig. 4 is a schematic diagram of the drivable area, shown as the black region, obtained by the method of the present invention.

Specific embodiment

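The present invention is described in detail below with reference to the accompanying drawings and embodiments.

As shown in Fig. 1, a road drivable area recognition method based on binocular stereo vision comprises steps (1) to (3) as set out in the Summary of the invention.

Specifically, step (1) obtains a disparity map by stereo matching of the left and right monocular images, derives the spatial depth map from the binocular vision model of the system, and superimposes the depth map on the RGB image as a fourth channel to form an RGB-D image.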

Here, as shown in Fig. 3, the spatial geometric depth map is derived from the principle of binocular stereo vision as follows. By similar triangles:

$$\frac{z - f}{z} = \frac{b - u_L + u_R}{b}$$

where $f$ is the camera focal length, $b$ is the baseline of the binocular vision system, $z$ is the depth (distance) of the object, $u_R$ is the object's pixel coordinate in the right camera, and $u_L$ is the object's pixel coordinate in the left camera.

Rearranging gives:

$$z = \frac{f b}{d}, \qquad d = u_L - u_R$$

Here $d$ is the difference between the left and right pixel coordinates, measured in pixels, and is defined as the disparity.
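The role of the disparity can be made concrete with a small numeric sketch. It assumes a rectified pinhole stereo pair, for which depth follows the standard relation z = f·b/d with d = u_L - u_R; the function name, the focal length of 700 pixels and the 0.5 m baseline are illustrative values, not parameters from the patent.

```python
def disparity_to_depth(u_left, u_right, focal_px, baseline_m):
    """Depth of a point from its column coordinates in the left and right images.

    Returns None when the disparity is not positive, because such a match
    cannot correspond to a point in front of a rectified camera pair.
    """
    d = u_left - u_right  # disparity in pixels
    if d <= 0:
        return None
    return focal_px * baseline_m / d


# Illustrative values: a 20-pixel disparity with f = 700 px and b = 0.5 m.
depth = disparity_to_depth(u_left=640.0, u_right=620.0, focal_px=700.0, baseline_m=0.5)
```

Because depth is inversely proportional to disparity, distant road surface produces small disparities, and a one-pixel matching error there changes the estimated depth much more than it does for nearby pixels.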

As shown in Fig. 2, this embodiment uses the deep learning operations of convolution, pooling and upsampling, and builds a shallow semantic segmentation network from an improved AlexNet. The network segments the drivable area of a road scene well and so better recognizes the drivable area of a given region under intelligent driving conditions. It has relatively high recognition accuracy and efficiency and can provide good data support for intelligent driving algorithms.

Specifically, the encoder in step (2) performs feature extraction by means of a constructed convolutional neural network, with the improvements (1) to (4) to the AlexNet shallow network listed in the Summary of the invention.

Specifically, in step (2), the training process of the improved AlexNet shallow network follows steps (1) to (4) listed in the Summary of the invention.

Specifically, acquiring and preprocessing images from the authoritative datasets recognized in the autonomous driving field comprises:

obtaining training images and evaluation images from the open-source KITTI and CITYSCAPES datasets;

performing data augmentation on the training images and evaluation images through mirroring, flipping, translation and rotation, to obtain a training dataset that is larger and has more complete coverage, which also helps prevent over-fitting.

Specifically, the semantic segmentation model of the improved AlexNet shallow network comprises 12 convolutional layers, 3 pooling layers, 3 deconvolution (upsampling) layers and one Softmax layer used to predict pixel probabilities.

In general, convolution and deconvolution are used to classify the image at pixel level, thereby solving the image segmentation problem at the semantic level. A classical CNN usually follows its convolutional layers with fully connected layers to obtain a fixed-length feature vector, and then applies Softmax classification. The improved AlexNet, by contrast, has better feature extraction; it uses deconvolution layers to upsample the feature maps of selected convolutional layers back to the size of the input image, and can even fuse the upsampled results of different feature maps. While preserving the spatial information of the image, it thus produces a prediction for every pixel, and classification is finally performed pixel by pixel on the upsampled feature map. The Softmax classification loss is then computed pixel by pixel, so that each pixel corresponds to one training sample.
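The pixel-by-pixel Softmax loss described above can be written out in a few lines. The sketch assumes two classes (index 0 for non-drivable, index 1 for drivable) and uses the mean cross-entropy over all pixels as the loss; the patent names Softmax classification but does not state the exact loss form, so the averaging and the class indices are assumptions.

```python
import math


def softmax(logits):
    """Convert raw class scores for one pixel into probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_loss(logit_map, label_map):
    """Mean cross-entropy over all pixels; each pixel is one training sample."""
    losses = []
    for logit_row, label_row in zip(logit_map, label_map):
        for logits, label in zip(logit_row, label_row):
            losses.append(-math.log(softmax(logits)[label]))
    return sum(losses) / len(losses)


# Toy 1 x 2 score map with two class scores per pixel.
logit_map = [[[0.0, 0.0], [2.0, -1.0]]]
label_map = [[1, 0]]
loss = pixelwise_loss(logit_map, label_map)
```

A pixel whose two scores are equal contributes ln 2 to the loss, which is the value an untrained two-class network produces on average.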

Preferably, the inherent advantage of a convolutional neural network (CNN) is that its multilayer structure can learn image features at multiple levels. Shallow convolutional layers have smaller receptive fields and mainly learn low-dimensional local features; the present invention widens the network by increasing the number of convolution kernels. Deeper convolutional layers have larger receptive fields and mainly learn higher-dimensional, more abstract features. Different features have different weight parameters and different sensitivities, which helps improve network performance.

Specifically, the probability prediction for the pixels of the intelligent vehicle's drivable area is carried out by training the Softmax regression layer of the output layer with stochastic gradient descent (mini-batch SGD) until the regression loss function reaches a predetermined convergence and the predetermined segmentation and recognition performance is achieved.
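Mini-batch SGD training of a Softmax output layer can be sketched on toy data. Everything below is a hypothetical illustration: each sample stands for one pixel described by two made-up features, the batch size of 2, the learning rate of 0.5 and the 200 epochs are arbitrary, and a real network would back-propagate the same gradient through its convolutional layers as well.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, bias, x):
    """Class probabilities of a linear Softmax layer for one feature vector."""
    logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


def mean_loss(weights, bias, data):
    return sum(-math.log(predict(weights, bias, x)[y]) for x, y in data) / len(data)


def sgd_epoch(weights, bias, data, lr=0.5, batch_size=2):
    """One pass over the data, updating the parameters after every mini-batch."""
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        grad_w = [[0.0] * len(weights[0]) for _ in weights]
        grad_b = [0.0] * len(bias)
        for x, y in batch:
            p = predict(weights, bias, x)
            for c in range(len(bias)):
                # Gradient of cross-entropy w.r.t. logit c is p_c - onehot_c.
                err = (p[c] - (1.0 if c == y else 0.0)) / len(batch)
                grad_b[c] += err
                for j, v in enumerate(x):
                    grad_w[c][j] += err * v
        for c in range(len(bias)):
            bias[c] -= lr * grad_b[c]
            for j in range(len(weights[c])):
                weights[c][j] -= lr * grad_w[c][j]


# Toy samples: (features, label), with label 1 meaning "drivable".
data = [([0.1, 0.2], 0), ([0.8, 0.9], 1), ([0.2, 0.1], 0), ([0.9, 0.7], 1)]
weights = [[0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0]
loss_before = mean_loss(weights, bias, data)
for _ in range(200):
    sgd_epoch(weights, bias, data)
loss_after = mean_loss(weights, bias, data)
```

Training would normally stop once the loss change between epochs falls below a chosen threshold, which corresponds to the "predetermined convergence" in the text; the fixed epoch count here only keeps the sketch short.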

Specifically, the drivable area refers to all road areas that are free of obstacles (such as vehicles, pedestrians and median strips) and on which vehicles are allowed to travel. In terms of physical structure, these road areas include structured, semi-structured and unstructured road surfaces. A structured road surface has regular road edges and a uniform, orderly surface structure, with clear lane lines and other man-made markings; its structural layers are built to a defined standard, and the color and material of its surface course are uniform. A semi-structured road surface is a generally non-standardized surface whose surface course varies in color and material, such as parking lots and squares, as well as some other branch roads. An unstructured road surface is a natural road scene with no structural layer. In structured, semi-structured and unstructured driving scenes alike, advanced driver assistance or autonomous driving must recognize the drivable area, with perception as close to "perfect" as possible, before path planning can be carried out.

The present invention loads the weight parameters obtained by training the network into the saved network structure, recognizes the intelligent vehicle's drivable area while driving relatively clearly, and marks the drivable area in a clearly contrasting color. As shown in Fig. 4, the black portions of the two schematic image semantic segmentation results based on the present invention clearly show the intelligent vehicle's drivable area.

To solve the above problems in the prior art, the present invention proposes recognizing the drivable area in an intelligent driving environment by means of binocular stereo vision. Features of the binocular stereo vision image are extracted by a convolutional neural network, and the extracted features are then mapped directly back to the whole image by a deconvolution neural network to classify every pixel. The research is extended from monocular images to binocular stereo vision so that binocular stereo vision is applied to image semantic segmentation; that is, a spatial geometric feature (depth) is added to the R, G and B features of the monocular image, giving the semantic segmentation network a more comprehensive set of features to learn. The semantic segmentation model composed of the improved AlexNet convolutional neural network and the deconvolution neural network is continually improved and optimized to raise its accuracy and efficiency. The model is applied to recognizing the vehicle's drivable area, its recognition performance is verified on recognized standard datasets, segmentation accuracy is compared, and the model is further optimized, providing a more accurate and comprehensive source of perception data for intelligent driving.

The present invention aims to combine traditional algorithms to improve and construct a suitable, widely applicable and adaptable deep convolutional neural network, and to apply it specifically to image semantic segmentation in the field of advanced driver assistance or autonomous driving, so as to recognize and detect the vehicle's drivable area and to improve information acquisition and analysis at the environment perception level of autonomous driving. The method combines binocular stereo vision with a convolutional neural network to perform image semantic segmentation of road scenes, and recognizes the drivable road area more accurately from vision sensor data and a deep learning algorithm, which is of great significance for the intelligent vehicle's subsequent decision-making and control.

Some of the method steps and processes of the present invention may need to be executed by a computer, and may therefore be implemented in hardware, software, firmware or any combination thereof.

The above is only a preferred embodiment of the present invention and is not intended to limit the implementation or the scope of rights of the invention. All equivalent changes, modifications and substitutions made in accordance with the content of the scope of protection of this patent application shall fall within the scope of this patent application. Those skilled in the art will appreciate that changes and modifications may be made in broader aspects without departing from the scope and spirit of the present invention.

Claims

1. A road drivable area recognition method based on binocular stereo vision, characterized by comprising the steps of:

(1) acquiring and preprocessing images from authoritative datasets recognized in the autonomous driving field; using the left and right monocular camera images to obtain a spatial depth map by binocular stereo vision; synthesizing the spatial depth map and the original left-camera RGB image into an RGB-D image of binocular stereo vision; and forming training, test and validation datasets;

(2) inputting the training set into a shallow network based on an improved AlexNet for semantic segmentation, training iteratively and optimizing the parameters until a predetermined model convergence state is reached, then fixing the model parameters to complete training and obtain the final semantic segmentation model; and inputting the test data into the segmentation model to output an image containing the drivable area;

(3) starting the binocular stereo vision camera set that has been installed and calibrated on the intelligent vehicle, acquiring test images of the traffic environment to obtain binocular stereo vision images, and inputting them into the trained model to recognize the drivable road area.

2. The road drivable area recognition method based on binocular stereo vision according to claim 1, characterized in that step (1) specifically comprises:

performing stereo matching on the left and right monocular images to obtain a disparity map, then computing a spatial depth map from the binocular vision model of the system, and finally superimposing the spatial depth map on the RGB image as a fourth channel to form an RGB-D image.

3. The road drivable area recognition method based on binocular stereo vision according to claim 1, characterized in that the encoder in step (2) performs feature extraction by means of a constructed convolutional neural network, wherein the improvements to the AlexNet shallow network mainly comprise:

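(1) removing the fully connected layers and retaining only the convolutional and pooling layers of the network;

(2) changing all pooling layers of the original AlexNet to 2x2, eliminating all 3x3 pooling layers;

(3) changing the kernel size of the first convolutional layer from 11x11 to 3x3, changing its stride to 1, and setting padding=1;

(4) adding, between the second convolutional layer and the second pooling layer, a convolutional layer identical to the second convolutional layer, which increases the depth of the network so that the new network has 6 convolutional layers while the number of pooling layers remains unchanged.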

4. The road drivable area recognition method based on binocular stereo vision according to claim 1, characterized in that, in step (2), the training process of the improved AlexNet shallow network comprises:

(1) synthesizing the spatial depth map and the original left-camera RGB image into RGB-D images and organizing them into a training dataset;

(2) inputting the labeled training data into the designed semantic segmentation network, where each image passes through several convolutional and pooling layers for feature extraction;

(3) upsampling the pooled feature maps by deconvolution, restoring them through deconvolution to the size of the whole image and mapping them to the pixel level, and performing probability prediction for the pixels of the intelligent vehicle's drivable area;

(4) producing the semantic segmentation result and outputting the image of the drivable area.

5. The road drivable area recognition method based on binocular stereo vision according to claim 1, characterized in that acquiring and preprocessing images from the authoritative datasets recognized in the autonomous driving field specifically comprises:

obtaining training images and evaluation images from the KITTI and CITYSCAPES datasets;

performing data augmentation on the training images and evaluation images through mirroring, flipping, translation and rotation, to obtain a training dataset that is larger and has more complete coverage.

6. The road drivable area recognition method based on binocular stereo vision according to claim 1, characterized in that the semantic segmentation model of the improved AlexNet shallow network comprises 12 convolutional layers, 3 pooling layers, 3 deconvolution (upsampling) layers and one Softmax layer used to predict pixel probabilities.

7. The road drivable area recognition method based on binocular stereo vision according to claim 4, characterized in that the probability prediction for the pixels of the intelligent vehicle's drivable area is specifically carried out by training the Softmax regression layer of the output layer with stochastic gradient descent until the regression loss function reaches a predetermined convergence and the predetermined segmentation and recognition performance is achieved.

8. The road drivable area recognition method based on binocular stereo vision according to claim 1, characterized in that the drivable area specifically refers to all road areas that are free of obstacles and on which vehicles are allowed to travel.

9. The road drivable area recognition method based on binocular stereo vision according to claim 8, characterized in that, in terms of physical structure, all the road areas on which vehicles are allowed to travel include structured, semi-structured and unstructured road surfaces.

10. The road drivable area recognition method based on binocular stereo vision according to claim 9, characterized in that the structured road surface has regular road edges and a uniform, orderly surface structure, with clear lane lines and other man-made markings, its structural layers being built to a defined standard and the color and material of its surface course being uniform; the semi-structured road surface is a generally non-standardized surface whose surface course varies in color and material; and the unstructured road surface is a natural road scene with no structural layer.