AU2019100967A4 - An environment perception system for unmanned driving vehicles based on deep learning - Google Patents

An environment perception system for unmanned driving vehicles based on deep learning

Info

Publication number
AU2019100967A4
Authority
AU
Australia
Prior art keywords
deep learning
perception system
training
environment perception
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2019100967A
Inventor
Fuming Jiang
Huifeng JIN
Shiwen Zhou
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to AU2019100967A priority Critical patent/AU2019100967A4/en
Application granted granted Critical
Publication of AU2019100967A4 publication Critical patent/AU2019100967A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

This application lies in the field of digital image processing; it is an environment perception system for unmanned driving vehicles based on deep learning. First, the image data to be identified, which includes cars, trucks and motorcycles, is preprocessed, and the processed images are divided into a training set and a test set. The training set is used to train the parameters of the neural network, which are then saved; during testing, these optimized parameters are loaded for identification. The invention has the following advantage: it needs no human participation to achieve environmental sensing, providing a reliable, high-performance environment perception system based on deep learning.

Description

FIELD OF THE INVENTION
This invention is in the field of digital image processing and performs classification of different types of vehicles using deep learning.
BACKGROUND OF THE INVENTION
The environmental perception of unmanned driving is the fundamental premise of vehicle navigation and positioning, road planning and motion control. It senses the surrounding environment of the vehicle, constructs local maps based on the road lane information, vehicle position and status information, and obstacle information obtained by the sensing system, plans local routes, and controls the steering and speed of the vehicle in real time, so that the vehicle can drive safely and reliably on the road. It therefore plays an important part in the safety and stability of unmanned driving.
The data processing of environment perception is mainly realized through deep learning. The concept of deep learning is derived from research on artificial neural networks and is a relatively new field in machine learning. Its motivation is to establish neural networks that simulate the human brain for analytical learning, combining low-level features to form more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of data. It mimics the mechanisms of the human brain to interpret data such as images, sounds and text [1]. Early deep learning was mainly centered on the deep belief network (also translated as "deep confidence network"), which consists of multiple restricted Boltzmann machines. Its deep abstract feature extraction method is a probability density distribution function learned from the data [2]; the probability of each category of the classification object is obtained by evaluating this probability distribution function.
The deep belief network was gradually replaced by the stacked autoencoder network. An autoencoder has a multi-layer, artificially constructed neuron structure [3]. As the name suggests, applying it involves an encoding process followed by a decoding process, and a feature vector can be obtained by decoding. The feature vectors are thus produced by these two structural stages within the multi-layer structure, whose basic constituent units are stacked self-encoding networks with dimensionality reduction.
In 2012, Alex Krizhevsky et al. [4] applied a convolutional neural network to the image classification task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and reduced the image classification error rate from 26% to 15%. The network, named AlexNet, is a good proof of the effectiveness of CNNs on complex models, and it attracted great attention to CNNs. Since then, CNNs have been widely used in the fields of image recognition and image segmentation and have also begun to be applied to other areas such as speech recognition and natural language processing, greatly promoting the development of deep learning.
Convolutional neural networks are more powerful than the two networks above in extracting data characteristics. Features can be extracted effectively mainly because of three mechanisms. The first is local perception, a special visual mode that selects only a small region of interest at a time. The second is weight sharing, which means that neurons of the same kind share the same parameters. The third is down-sampling, which sharply reduces the amount of data. By incorporating these three mechanisms into the network structure, the performance of the network is greatly improved: it not only overcomes the influence of displacement, but also exhibits superior performance when the size or shape of the input image changes [5].
Sermanet et al. [6] applied deep learning to the recognition of traffic signs, using a Convolutional Neural Network (CNN) to learn the characteristics of traffic signs. The efficiency of the algorithm was verified on the German Traffic Sign Recognition Benchmark (GTSRB) and the German Traffic Sign Detection Benchmark (GTSDB). Yang et al. [7] proposed a fast traffic sign recognition algorithm in which image features were preprocessed with traditional machine learning algorithms and the images were then further classified by a CNN, which reduced the computational complexity of the algorithm. Sun et al. [8] proposed a traffic sign recognition algorithm based on the extreme learning machine, which greatly reduced the complexity of the algorithm; its effectiveness was likewise verified on GTSRB.
Zhang et al. [9] applied deep convolutional neural networks to instance segmentation under monocular vision, realizing the segmentation of different objects in actual scenes with good results. John et al. [10] applied semantic segmentation based on deep learning to actual driving scenes, realizing the segmentation of different objects at the same time. Audebert et al. [11] applied semantic segmentation and object recognition based on deep learning simultaneously to vehicle detection, which improves the robustness of the whole system.
In this invention, we use TensorFlow, a tool widely used for machine learning applications, as the deep learning framework to implement the model. The environment perception system we designed is unique in two respects. On the one hand, the data we use was collected and screened by ourselves, and we selected the best processing method after extensive experiments. On the other hand, our architecture is also designed and optimized by ourselves.
SUMMARY OF THE INVENTION
The invention utilizes multi-layer Convolutional Neural Networks together with fully-connected neural networks to analyze and classify images based on TensorFlow. This method exploits the advantages of automatic feature extraction to the full to produce a precise description of the features in an image. The invention improves training precision and speed while mitigating technical difficulties such as over-fitting.
The whole process includes five steps, as shown in Figure 1.
Data Collection
A total of M kinds of data related to automated vehicles are collected from websites, including trucks driven on the road, cars, etc.; the total number of pictures is N.
Data Preprocessing
The procedure of preprocessing is shown in Figure 2.
First, analyze and filter the collected data, deleting data that is weakly correlated with its category. Next, all pictures are restricted to a fixed size of m*m pixels. The whole dataset is then proportionally divided into a training set and a testing set. After that, the pictures are shuffled together with their labels and stored into a matrix-format file through MATLAB. Before training, the labels of both the training and testing sets are converted to one-hot encoding, and the image matrices have their color channels transformed and their values normalized to a fixed range to simplify calculations during the training and testing procedure.
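As a minimal sketch of this step, assuming the images are already loaded as a NumPy array with integer class labels (the array names, 80/20 split ratio and [-1, 1] normalization range are illustrative assumptions, not values fixed by the invention):

```python
# Illustrative sketch of the preprocessing step: shuffle, one-hot encode,
# normalize, and split proportionally into training and testing sets.
import numpy as np

def preprocess(images, labels, num_classes, train_ratio=0.8):
    # Shuffle the pictures together with their labels
    order = np.random.permutation(len(images))
    images, labels = images[order], labels[order]

    # Convert integer labels to one-hot encoding
    one_hot = np.eye(num_classes)[labels]

    # Normalize pixel values from [0, 255] into [-1, 1]
    images = images.astype(np.float32) / 127.5 - 1.0

    # Proportional split into training and testing sets
    split = int(train_ratio * len(images))
    return (images[:split], one_hot[:split]), (images[split:], one_hot[split:])
```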
Network Architecture
This part constructs the network used for training and testing. The whole Convolutional Neural Network architecture contains the layers shown in Figure 3.
Convolutional layers are utilized to extract features from input images. In these layers, filters slide over the image, dividing it into several small regions. The filter matrices act as weights and, together with biases, are used as parameters to compute a pattern value for each region matrix through a dot product.
Following each convolutional layer and fully-connected layer, there is a ReLU acting as the activation function, processing the input from the preceding layer.
The aim of pooling layers is to reduce the large spatial patterns received from convolutional layers to smaller ones through down-sampling, which helps to avoid over-fitting.
Fully-connected layers combine all local features into a global feature used to perform classification.
When an image matrix enters the network, it first passes through W convolutional layers and ReLUs, where lower-level features such as dots and lines are captured, and is then sent to a max-pooling layer to reduce the dimension of these features. This procedure is repeated J times. Then, through S fully-connected layers and their following ReLUs, a global feature is gathered to classify the input image into the corresponding class.
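This flow can be sketched in TensorFlow/Keras as below, with W, J and S left as parameters; the filter count and fully-connected width are illustrative assumptions (the preferred embodiment later fixes its own sizes):

```python
# Hedged sketch of the abstract architecture: (W conv+ReLU layers followed
# by max pooling) repeated J times, then S fully-connected layers.
import tensorflow as tf

def build_network(input_shape, num_classes, W=2, J=2, S=2):
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for _ in range(J):                      # repeat the conv/pool stage J times
        for _ in range(W):                  # W convolutional layers with ReLU
            model.add(tf.keras.layers.Conv2D(32, 3, padding="same",
                                             activation="relu"))
        model.add(tf.keras.layers.MaxPooling2D(2))   # down-sample the features
    model.add(tf.keras.layers.Flatten())
    for _ in range(S - 1):                  # S fully-connected layers in total
        model.add(tf.keras.layers.Dense(128, activation="relu"))
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    return model
```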
Structure Optimization
In this section, four methods, namely regularization, dropout, learning rate optimization and the Adam algorithm, are employed to avoid over-fitting and accelerate training.
Regularization is added based on the L2 loss, as in the following formula:

loss = \sum_{i=0}^{n} \left(y_i - h(x_i)\right)^2 + \lambda \|w\|_2^2 + \lambda \|b\|_2^2

In the formula, y_i is the actual value of the i-th input x_i, h(x_i) is the predicted value for the i-th input, w is the weight value and b is the bias value of the current layer, and \lambda is a fixed parameter.
The summation part is the L2 loss function; compared with the L1 loss, its solution is unique. The following terms are the weights and biases multiplied by a parameter, acting as regularization, whose goal is to avoid over-fitting.
Dropout is a method used between the pooling layer and the fully-connected layer to randomly drop nodes in the network according to a fixed proportion during training, in order to avoid over-fitting. At test time, all the dropped nodes are restored to the network and contribute to the prediction.
Learning rate optimization performs a fixed decay on a given learning rate according to the training step. Its aim is to reach the converged state of training as quickly and precisely as possible.
The Adam algorithm is used to update the parameters of the deep learning network model. Compared with the gradient descent algorithm and momentum update, the Adam algorithm achieves a better convergence time under the same learning rate.
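These four techniques can be expressed in TensorFlow/Keras roughly as follows; the regularization factor, dropout rate and decay schedule here are placeholder values, not the tuned settings reported later:

```python
# Hedged sketch of the four optimization methods: L2 regularization,
# dropout, learning-rate decay and the Adam optimizer. All numeric
# values are illustrative assumptions.
import tensorflow as tf

# L2 regularization on the weights and biases of a layer
dense = tf.keras.layers.Dense(
    128, activation="relu",
    kernel_regularizer=tf.keras.regularizers.l2(1e-4),
    bias_regularizer=tf.keras.regularizers.l2(1e-4))

# Dropout: randomly drops nodes during training only
drop = tf.keras.layers.Dropout(rate=0.5)

# Fixed decay on a given initial learning rate, driven by the training step
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=100, decay_rate=0.99)

# Adam: adaptive per-parameter updates for faster convergence
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```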
Train & Test
After the structure optimization step, the training dataset is sent to the network for training, and the test dataset is used to check whether the generated model satisfies the requirement. If not, parameters including batch size, initial learning rate, decay rate, etc., are adjusted manually until a satisfactory model is formed.
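In code, this train-then-check cycle might look like the sketch below, reusing build_network and optimizer from the sketches above; the data arrays, batch size, epoch count and accuracy target are all assumed for illustration:

```python
# Hedged sketch of the train & test cycle: train, evaluate, and adjust
# hyperparameters manually until the accuracy requirement is met.
# Assumes preprocessed arrays train_images/train_labels/test_images/
# test_labels are already loaded; the 0.90 target is a placeholder.
model = build_network(input_shape=(32, 32, 1), num_classes=3)
model.compile(optimizer=optimizer,
              loss="categorical_crossentropy", metrics=["accuracy"])

model.fit(train_images, train_labels, batch_size=100, epochs=10)
_, test_acc = model.evaluate(test_images, test_labels)
if test_acc < 0.90:
    print("model not satisfactory; adjust batch size, learning rate, "
          "decay rate, etc., and retrain")
```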
DESCRIPTION OF DRAWINGS
Figure 1 illustrates the procedure of the invention;
Figure 2 illustrates the procedure of data preprocessing;
Figure 3 illustrates the abstract architecture of the Convolutional Neural Network;
Figure 4 illustrates the detailed CNN architecture;
Table 1 illustrates the results of training and testing in the network.
DESCRIPTION OF PREFERRED EMBODIMENT
The total procedure of completing the model includes five steps, as shown in Figure 1, and each is described in detail in the following sections.
Data Collection
We use images of three types of vehicles in the MIOvision Traffic Camera Dataset (MIO-TCD): cars, trucks and motorcycles. MIO-TCD is a dataset consisting of more than half a million images acquired at different times of day and different periods of the year by 8,000 traffic cameras deployed all over Canada and the United States. These images have been selected to cover a wide range of localization challenges and are representative of typical visual data captured today in urban traffic scenarios.
In order to get a better data distribution, we use data augmentation, including rotation and zooming, to obtain 5,000 images of each type of vehicle: 4,000 for training and 1,000 for testing. In total, the training set includes 12,000 images and the testing set includes 3,000 images.
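A rotation-and-zoom augmentation pass of this kind could be sketched with OpenCV as follows; the angle and zoom ranges are assumptions, since the text does not specify them:

```python
# Hedged sketch of the rotation + zoom augmentation described above.
# The angle and zoom ranges are illustrative assumptions.
import cv2
import numpy as np

def augment(image, max_angle=15.0, zoom_range=(0.9, 1.1)):
    h, w = image.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)
    zoom = np.random.uniform(*zoom_range)
    # Rotation about the image center, combined with scaling (zoom)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, zoom)
    return cv2.warpAffine(image, M, (w, h))
```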
Data Preprocessing
Firstly, for every image, we resize it to 32*32 pixels by bilinear interpolation with OpenCV. Then, with the aim of increasing calculation speed, we reduce each image from three channels to a single channel: specifically, we take the average of the values of the three channels and convert it into the interval between -1 and 1. Finally, we tag each image with a one-hot encoded label and store the data and labels into files through MATLAB.
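The per-image pipeline can be written concretely as below; the 32*32 bilinear resize, channel averaging and [-1, 1] scaling follow the text, while the function name and the assumption of a three-channel input are illustrative:

```python
# Hedged sketch of the per-image preprocessing: bilinear resize to 32x32,
# average the three channels into one, then scale into [-1, 1].
import cv2
import numpy as np

def preprocess_image(color_image):
    resized = cv2.resize(color_image, (32, 32), interpolation=cv2.INTER_LINEAR)
    gray = resized.astype(np.float32).mean(axis=2)  # average of the 3 channels
    return gray / 127.5 - 1.0                       # map [0, 255] into [-1, 1]
```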
Network Architecture
As shown in Figure 4, the network consists of four convolution layers followed by two fully connected layers. The input is a single-channel image whose height and width are both 32 pixels. For every convolution layer, the kernel size is 3*3 and it moves with a stride of 1 in the vertical and horizontal directions. Meanwhile, we pad the matrices with zeros so that the output images keep the same size as the input images. We initialize the weights from a normal distribution with standard deviation 0.1 and set the biases to 0.1.
Each convolution layer outputs 32 feature maps, and the results go through the nonlinear activation function ReLU, which is a pixel-wise operation. As shown in formula (1), ReLU is used to alleviate the vanishing-gradient problem and speed up the convergence rate.
\mathrm{ReLU}(x) = \max(0, x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases} \quad (1)

There is a max-pooling layer behind Convolutional Layer 2 and Convolutional Layer 4 respectively. Pooling is used to reduce the size of the feature maps and decrease the number of parameters in the network. In our work, we use filters of size 2*2 to down-sample, with a stride of 2. In max-pooling, the maximum among the four values in the filter window is kept and the others are discarded. Thus, the first max-pooling layer changes the image size from 32*32 (height*width) to 16*16 and the second one changes the size from 16*16 to 8*8.
After the operation of Convolutional Layer 4, we reshape the image matrix from 8*8*32 to 2048*1. For the first fully connected layer, the number of input nodes is 2048 and the number of output nodes is 128, and the results again go through the activation function ReLU. After the second fully connected layer, the output is the high-level feature of the input image, and it is classified by the softmax function of formula (2), where y_i denotes the value of the i-th element. The output is the probability of each category, and the probabilities sum to 1.

\mathrm{Softmax}(y_i) = \frac{\exp(y_i)}{\sum_{j} \exp(y_j)} \quad (2)
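Under the dimensions just given, one plausible TensorFlow/Keras rendering of the Figure 4 architecture is the sketch below; the layer sizes and initializers follow the text, while the Keras-specific plumbing is an assumption:

```python
# Hedged sketch of the Figure 4 architecture: four 3x3 conv layers with
# 32 feature maps each, max pooling after conv2 and conv4, then
# 2048 -> 128 -> 3 fully connected layers with a softmax output.
import tensorflow as tf

init_w = tf.keras.initializers.RandomNormal(stddev=0.1)  # normal, stddev 0.1
init_b = tf.keras.initializers.Constant(0.1)             # biases set to 0.1

def conv():
    # 3x3 kernel, stride 1, zero padding to preserve the image size
    return tf.keras.layers.Conv2D(32, 3, strides=1, padding="same",
                                  activation="relu",
                                  kernel_initializer=init_w,
                                  bias_initializer=init_b)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),
    conv(), conv(),                                  # Convolutional Layers 1-2
    tf.keras.layers.MaxPooling2D(2, strides=2),      # 32x32 -> 16x16
    conv(), conv(),                                  # Convolutional Layers 3-4
    tf.keras.layers.MaxPooling2D(2, strides=2),      # 16x16 -> 8x8
    tf.keras.layers.Flatten(),                       # 8*8*32 = 2048 features
    tf.keras.layers.Dense(128, activation="relu",
                          kernel_initializer=init_w, bias_initializer=init_b),
    tf.keras.layers.Dense(3, activation="softmax"),  # car / truck / motorcycle
])
```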
Structure Optimization
In the optimization part, we adopt the Adam optimization algorithm for our invention, which calculates independent adaptive learning rates for different parameters from first-order and second-order moment estimates of the gradient. Meanwhile, we adopt the L2 loss function, shown in formula (3), for our model because it is very sensitive to outliers in the dataset.
\mathrm{loss} = \sum_{i=0}^{n} \left(y_i - h(x_i)\right)^2 \quad (3)

In the formula, y_i is the actual value of the i-th input x_i and h(x_i) is the predicted value for the i-th input.
To avoid over-fitting, we apply the dropout method to the fully connected layer in the training stage. Dropout discards some nodes randomly with a given probability and prevents the model from over-fitting. We set the dropout rate to 0.99, which gives the best result. At the same time, we set the initial learning rate to 0.001, and it decreases with a decay rate of 0.99.
Train & Test
Note that the batch size refers to the training period, while the testing batch size is fixed at 500; the accuracy is the testing accuracy, and the learning rate is the initial learning rate pre-defined at the beginning of the training period.
Batch size is the number of images processed in parallel to improve the training speed. The initial learning rate is a manually defined value which varies automatically during the training period to achieve better accuracy.
In the loop of training and testing, the number of iteration steps was increased from 300 to 2,000, the initial learning rate was increased from 0.001 to 0.01, and the dropout rate was decreased from 0.99 to 0.8. By changing these parameters of the network and comparing the test accuracies, the best combination of parameters is discovered, and the model trained with these parameters is the one required.
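This manual search amounts to a small grid sweep over the three parameters; a hedged sketch, where train_and_test is a hypothetical helper standing in for the full pipeline above and the grids are drawn from the ranges just quoted:

```python
# Hedged sketch of the parameter sweep: try combinations of iteration
# steps, initial learning rate and dropout rate, and keep the best.
import itertools

def train_and_test(steps, learning_rate, dropout):
    """Hypothetical helper: train with these settings and return the test
    accuracy. Placeholder body; a real run would invoke the pipeline above."""
    return 0.0

grid = itertools.product([300, 1000, 2000],   # iteration steps
                         [0.001, 0.01],       # initial learning rate
                         [0.99, 0.8])         # dropout rate
best = max(grid, key=lambda p: train_and_test(*p))
print("best (steps, learning rate, dropout):", best)
```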
From Table 1, it can be found that the best recognition model is obtained when the number of iteration steps is 1,000 and the initial learning rate is 0.001, giving the highest accuracy of 93.267%.
REFERENCES
[1] Sun Haotian. Application of Deep Learning in Unmanned Vehicles [J]. Computer Knowledge and Technology, 2015, 11(24): 121-123.
[2] Fan Jialue, Xu Wei, Wu Ying, et al. Human tracking using convolutional neural networks [J]. IEEE Transactions on Neural Networks, 2010, 21(10).
[3] Xiang Zhan. Research on LeNet-5 Convolutional Neural Network Optimization Based on Particle Swarm Optimization [D]. Huazhong University of Science and Technology, 2016.
[4] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks [C]//International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012: 1097-1105.
[5] Cong Bowen. Research on binocular perception method of vehicle driving three-dimensional environment based on deep learning [D]. Xi'an University of Science and Technology, 2018.
[6] Sermanet P, LeCun Y. Traffic sign recognition with multi-scale convolutional networks [C]//IEEE International Joint Conference on Neural Networks, 2011: 2809-2813.
[7] Yang Y, Luo H, Xu H, et al. Towards real-time traffic sign detection and classification [J]. IEEE Transactions on Intelligent Transportation Systems, 2015: 1-10.
[8] Sun Z L, Wang H, Lau W S, et al. Application of BW-ELM model on traffic sign recognition [J]. Neurocomputing, 2014, 128(1): 153-159.
[9] Zhang Z, Schwing A G, Fidler S, et al. Monocular object instance segmentation and depth ordering with CNNs [C]//IEEE International Conference on Computer Vision. IEEE, 2015: 2614-2622.
[10] John V, Kidono K, Guo C, et al. Fast road scene segmentation using deep learning and scene-based models [C]//2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016: 3763-3768.
[11] Audebert N, Le Saux B, Lefevre S. Segment-before-detect: Vehicle detection and classification through semantic segmentation of aerial images [J]. Remote Sensing, 2017, 9(4): 368.

Claims (1)

1. An environment perception system for unmanned driving vehicles based on deep learning, using small-size image detection, characterized in that:
the method can achieve 93.267% accuracy in 1,000 iterations when detecting 32*32-pixel images, wherein, on an i7-6700HQ CPU, the total running time of training on 12,000 images together with testing on 3,000 images is within 10 minutes.
AU2019100967A 2019-08-29 2019-08-29 An environment perception system for unmanned driving vehicles based on deep learning Ceased AU2019100967A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2019100967A AU2019100967A4 (en) 2019-08-29 2019-08-29 An environment perception system for unmanned driving vehicles based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2019100967A AU2019100967A4 (en) 2019-08-29 2019-08-29 An environment perception system for unmanned driving vehicles based on deep learning

Publications (1)

Publication Number Publication Date
AU2019100967A4 true AU2019100967A4 (en) 2019-10-03

Family

ID=68063090

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2019100967A Ceased AU2019100967A4 (en) 2019-08-29 2019-08-29 An environment perception system for unmanned driving vehicles based on deep learning

Country Status (1)

Country Link
AU (1) AU2019100967A4 (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509017A (en) * 2020-11-18 2021-03-16 西北工业大学 Remote sensing image change detection method based on learnable difference algorithm
CN112509017B (en) * 2020-11-18 2024-06-28 西北工业大学 Remote sensing image change detection method based on learnable differential algorithm
CN114782907A (en) * 2022-03-29 2022-07-22 智道网联科技(北京)有限公司 Unmanned vehicle driving environment recognition method, device, equipment and computer readable storage medium
CN114782907B (en) * 2022-03-29 2024-07-26 智道网联科技(北京)有限公司 Unmanned vehicle driving environment recognition method, device, equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
JP7289918B2 (en) Object recognition method and device
AU2019101133A4 (en) Fast vehicle detection using augmented dataset based on RetinaNet
CN110163187B (en) F-RCNN-based remote traffic sign detection and identification method
US20220215227A1 (en) Neural Architecture Search Method, Image Processing Method And Apparatus, And Storage Medium
WO2021147325A1 (en) Object detection method and apparatus, and storage medium
CN111401517B (en) Method and device for searching perceived network structure
CN111460919B (en) Monocular vision road target detection and distance estimation method based on improved YOLOv3
Haider et al. Human detection in aerial thermal imaging using a fully convolutional regression network
Nguyen et al. Hybrid deep learning-Gaussian process network for pedestrian lane detection in unstructured scenes
US20230070439A1 (en) Managing occlusion in siamese tracking using structured dropouts
CN112417973A (en) Unmanned system based on car networking
Behera et al. Superpixel-based multiscale CNN approach toward multiclass object segmentation from UAV-captured aerial images
AU2019100967A4 (en) An environment perception system for unmanned driving vehicles based on deep learning
CN117157679A (en) Perception network, training method of perception network, object recognition method and device
CN112084897A (en) Rapid traffic large-scene vehicle target detection method of GS-SSD
Rahman et al. Real-Time Object Detection using Machine Learning
CN116740516A (en) Target detection method and system based on multi-scale fusion feature extraction
CN116863260A (en) Data processing method and device
Chen et al. Research on object detection algorithm based on multilayer information fusion
CN112805723B (en) Image processing system and method and automatic driving vehicle comprising system
He et al. Real-time pedestrian warning system on highway using deep learning methods
CN113313091B (en) Density estimation method based on multiple attention and topological constraints under warehouse logistics
Jabeen et al. Weather classification on roads for drivers assistance using deep transferred features
Wang et al. An Improved Deeplabv3+ Model for Semantic Segmentation of Urban Environments Targeting Autonomous Driving.
CN115115016A (en) Method and device for training neural network

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry