AU2018102037A4

AU2018102037A4 - A method of recognition of vehicle type based on deep learning

Info

Publication number: AU2018102037A4
Application number: AU2018102037A
Authority: AU
Inventors: Jiahao Ge; Yu Han; Sitong Jin; Huancheng Song; Mingchen Yang; Hao Zeng
Original assignee: Individual
Current assignee: Individual
Priority date: 2018-12-09
Filing date: 2018-12-09
Publication date: 2019-01-17
Anticipated expiration: 2026-12-09

Abstract

This patent describes an image recognition method based on deep learning. The image-based vehicle recognition system first captures the vehicle image from the camera's video stream, and then employs pattern recognition and computer vision to extract the useful information needed for classification and recognition. The invention first preprocesses images of various vehicles and then divides them into a training set and a testing set. The training set images are used to train a Deep Convolutional Neural Network, during which the parameters between the layers of the network are adjusted so that they can then be applied to the testing set. After training is completed, the neural network used for testing, which has the same structure as the one used in training, is initialized with the trained parameters. The testing images are then fed into the Deep Neural Network to identify and classify the vehicle images. The invention does not require human participation or adjustment; it performs feature extraction and classification automatically, providing a reliable, high-performance image recognition technology based on deep learning.

[Figure 5: Standard Neural Net; After Applying Dropout]

Description

TITLE

A method of recognition of vehicle type based on deep learning

FIELD OF THE INVENTION

The invention belongs to the technical field of digital image processing, and in particular relates to a vehicle image recognition method based on deep learning.

BACKGROUND OF THE INVENTION

With the improvement of living standards, the car, as the most important means of transport in the world, has come into wide use. The popularity of cars has brought great convenience to people's travel; however, various traffic problems have become increasingly prominent, seriously endangering travel safety. Under these circumstances, automatic identification and tracking of escaping vehicles through intelligent transportation systems is urgently needed. The travel environment has become more complicated, and various types of traffic problems have come to public attention. Cases such as hit-and-run accidents, vehicle theft, and the use of fake license plates occur from time to time. Vehicle identification technology is an important tool in maintaining the stability of the intelligent transportation system. It is a comprehensive application of design analysis, visual development, artificial intelligence, software development and other technologies. Every year, crimes such as vehicle theft and robbery seriously disturb public security and cause serious economic losses. Because vehicle-related cases are generally concealed and the investigation workload is huge, the clear-up rate of such cases is not high. Facing these problems, the Chinese government has introduced corresponding policies to relieve the situation. Hence, an effective recognition technology is indispensable.

Due to the lack of advanced technology, researchers have had to observe and select the useful information by themselves, which also brings inconvenience to the public. In addition, they need to extract the features manually and then use an algorithm to recognize them. For these reasons, a method that improves precision is highly necessary. One way to do this is to use deep learning, whose advantage is that it extracts features automatically. The method also ensures the speed of recognition. In summary, this patent uses image recognition technology to identify vehicle type, further solving problems related to statistics, security and vehicle tracking. Compared to other methods, this approach is not only efficient but also easy to control, and is worth exploring in depth.

2018102037 09 Dec 2018

SUMMARY OF THE INVENTION

In order to overcome the deficiencies of modern technology, we propose a vehicle image recognition method based on deep learning, which connects a multi-layer Convolutional Neural Network and a Fully Connected Network in series so that the automatic feature extraction of deep learning can be fully exploited, effectively solving problems such as the difficulty of feature extraction and of automatic, simultaneous identification.

The technical solution of the present invention is implemented as follows:

A vehicle image recognition method based on deep learning includes:

The construction of the vehicle model image dataset,

Parameter training and optimization of deep learning network,

The design of network structure of deep learning,

The real-time identification of deep learning network.

The construction of the vehicle model image dataset.

1. Use a crawler program to collect enough pictures of different vehicle categories from a specific website and classify them as car, bus, truck, or moto.

2. Pre-process the image data through image filtering, rotation, compression, the tomat and load steps, etc., to convert image data of different sizes into image data of the same size, with the same number of color channels, as the input of the deep learning network. The data should be divided into training and testing sets in an appropriate proportion. In this invention, the uniform size is 32x32x1 and the ratio of training to testing data is 4:1.

Parameter training and optimization of deep learning network

1. Feeding data into the network in batches

Feeding the training images in batches keeps each training step small enough for the computer to handle, and avoids problems such as the system halting when all of the data is loaded for training at once.

2. Choice of optimizer

This invention applies Adam as the optimization algorithm. Adam is a first-order optimization algorithm that can replace traditional stochastic gradient descent. It updates the weights of the neural network at each iteration and passes them on to the next. The difference between Adam and stochastic gradient descent is that the latter maintains a single learning rate for updating all weights, whereas the former sets a separate, adaptive learning rate for each parameter by calculating estimates of the first and second moments of the gradients. Adam also uses little memory, is computationally efficient, and is easy to implement.
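The per-parameter step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the patent's training code: the function name `adam_step`, the toy gradients, and the values of `beta1`, `beta2` and `eps` are assumptions (they are the commonly used defaults and are not stated in the text); only the learning rate 0.001 comes from the document.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place; m and v hold the running moment estimates."""
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g        # first moment (mean of gradients)
        v[i] = beta2 * v[i] + (1 - beta2) * g * g    # second moment (mean of squared gradients)
        m_hat = m[i] / (1 - beta1 ** t)              # bias correction for early steps
        v_hat = v[i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


# Toy example (hypothetical): two weights whose gradients differ by a factor of 10.
w = [1.0, 1.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
grads = [2 * w[0], 20 * w[1]]
adam_step(w, grads, m, v, t=1)
```

After this first step both weights have moved by roughly the same amount (about 0.001) even though their gradients differ tenfold, which is the per-parameter scaling that distinguishes Adam from a single global learning rate.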

The design of network structure of deep learning

Figure 2 shows the Neural Network structure of the present invention.

This framework is divided into two parts:

Convolutional Neural Network and Fully Connected Neural Network.

1. Convolutional Neural Network:

Using structure like:

[[Convolution -> Relu] * M -> MaxPooling] * N

Through different sliding window filters preset in the Convolutional Neural Network, different sets of feature points are obtained from each source image, and the pooled feature set forms the input of the next layer of the Neural Network. In the pooling process, the combination function performs maximum pooling, which strengthens the general feature points and reduces the amount of data to facilitate the calculation of feature points in the next layer. In the Convolutional layer, there are three main nodes: Convolution, Relu, and MaxPooling.

Convolution is the convolution node. In this node, the convolution operation on the image data is carried out by a preset number of N*N Convolution kernels: each kernel is multiplied element by element with the corresponding pixels of the M*M image, and the products are summed. Different kernels thereby capture different features of the image, and together their outputs form the feature value set of the image data. The values in the Convolution kernels are learned dynamically during training. After a Convolution operation is completed, its M*M result set is taken as the input of the Relu node. The Relu node is the excitation layer, which applies a nonlinear mapping to the Convolution output: negative feature values in the input result set are set to 0 in order to enhance the features, and a result set of the same size is output.

With Relu as the activation function of the excitation layer, convergence is fast and the gradient is simple to compute, which speeds up the calculation and reduces the computational difficulty.
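The Convolution and Relu nodes described above can be illustrated with a small NumPy sketch. It is a simplified single-channel version under stated assumptions: the helper names `conv2d_same` and `relu` and the 3x3 example kernel are hypothetical, zero padding is applied before sliding the window so that the output keeps the input size, and the kernel is applied without flipping, as is usual in CNN implementations.

```python
import numpy as np


def conv2d_same(image, kernel):
    """Slide a k*k kernel over a 2-D image; zero padding keeps the output size."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad, mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = padded[r:r + k, c:c + k]
            out[r, c] = np.sum(window * kernel)  # multiply element-wise, then sum
    return out


def relu(x):
    """Set negative feature values to 0 and keep positive ones unchanged."""
    return np.maximum(x, 0.0)


# Hypothetical 4x4 "image" and a kernel that takes right neighbour minus left neighbour.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[0, 0, 0],
                   [-1, 0, 1],
                   [0, 0, 0]], dtype=float)
features = relu(conv2d_same(image, kernel))
```

In the trained network the kernel values are learned rather than fixed by hand; the fixed kernel here only makes the arithmetic easy to follow.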


The MaxPooling node is used as the pooling layer in the Convolutional Neural Network. The pooling layer is sandwiched between successive Convolutional layers to compress the amount of data and parameters and reduce over-fitting. In short, if the input is an image, the primary role of the pooling layer is to compress the image and reduce the dimensionality of the image data. Since the image data is still large after the Convolution operation is completed, downsampling is performed to reduce the data dimension. This is possible because, even after a lot of data has been discarded, the statistical properties of the features can still describe the image, and the reduced data dimension effectively limits over-fitting. Data compression is performed using the MaxPooling maximum downsampling function while maintaining the data characteristics. An input N*N data set can be compressed into a Y*Y data set (N>Y) by MaxPooling.
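As a minimal sketch of the compression from N*N to Y*Y, the following NumPy function takes the maximum over non-overlapping windows. The function name `max_pool` and the example values are hypothetical; a 2x2 window is assumed, so Y = N/2.

```python
import numpy as np


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size*size window."""
    h, w = feature_map.shape
    if h % size or w % size:
        raise ValueError("feature map dimensions must be divisible by the window size")
    # Split rows and columns into (block index, offset inside block), then reduce the offsets.
    blocks = feature_map.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [7, 8, 2, 3]], dtype=float)
pooled = max_pool(fm)
```

Each 2x2 block of the 4x4 input is reduced to its largest value, giving a 2x2 output with a quarter of the data.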

2. Fully connected Neural Network:

Using structure like:

[Full Connected -> Relu] * P -> Full Connected -> SoftMax


A fully connected Neural Network is a plurality of neurons connected in accordance with certain rules. Each neuron contains an activation function. The neurons are laid out in layers. The leftmost layer is called the input layer and is responsible for receiving input data; the rightmost layer is called the output layer, from which the Neural Network output data can be obtained. The layers between the input layer and the output layer are called hidden layers, and each neuron in the Nth layer is connected to all the neurons in the (N-1)th layer. The output of the (N-1)th layer neurons is the input of the Nth layer neurons. Each connection has a weight. The compressed image feature set produced by the Convolutional Neural Network is used as the input layer of the fully connected Neural Network, and the final output, which is the category indicating the specific type of the vehicle, is obtained step by step through the weights of each layer and the activation functions of the neurons. The connection weights between neurons are learned dynamically through back propagation during model training. By training to improve the weights, the fully connected Neural Network model gains the ability to identify data by its feature values. We also added an excitation layer to the fully connected Neural Network, using the Relu excitation function to activate the output of the output layer of the fully connected Neural Network, clarify the output classification, and simplify the output. The SoftMax function is added after the Relu activation layer. It produces the probability distribution over the classes for each data sample: the classification output of the Relu activation layer is used as the input of the SoftMax function, which outputs the predicted label and the predicted classification probability of each sample, refining the data classification results.

The real-time identification of deep learning network

After model training has ended, the present invention inputs the picture data reserved for testing into the trained network for identification and classification. The accuracy of each batch of test results is printed to the console, the average accuracy and standard deviation over all test batches are output, and a confusion matrix of the test results is plotted. For each of the four vehicle types, the matrix shows how many test images were identified correctly and how many were identified as other types, and the accuracy of each type is computed.


The above recognition results reflect the level of model training well. By comparing the accuracy of each type with the rate at which it is identified as another type, these results help us understand the shortcomings in data preparation and data optimization.
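The confusion matrix and per-type accuracy mentioned above can be computed as in the following plain-Python sketch. The label lists are hypothetical toy values, not results from the experiment, and the function names are illustrative; the class order car, bus, truck, moto with labels 0 to 3 follows the numbering used in this document.

```python
CLASSES = ["car", "bus", "truck", "moto"]


def confusion_matrix(true_labels, predicted_labels, num_classes=4):
    """matrix[t][p] counts samples of true class t that were predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each true class that lands on the diagonal."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical labels for eight test images (two per class).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]
cm = confusion_matrix(y_true, y_pred)
acc = per_class_accuracy(cm)
```

Reading a row of `cm` shows which other types a given vehicle type is confused with, which is the information the text uses to diagnose weaknesses in the data.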

The final accuracy of this experiment is 91%, and the specific results analysis will be shown in detail below.

DESCRIPTION OF THE DRAWINGS

Figure 1 Flowchart of the program

Figure 2 Neural Network structure

Figure 3 Full Connected Network structure

Figure 4 Flowchart of SoftMax

Figure 5 Flowchart of dropout operation

Figure 6 Sketch map of depth model

Figure 7 The structure diagram of training and testing

Figures 8-19 show the results of the invention

Figure 8 The running results of the best parameters

Figure 9 Select the best number of iterations

Figure 10 Select the best number of iterations- line chart

Figure 11 Average accuracy under different iterations

Figure 12 Optimum decay rate parameter

Figure 13 Optimum decay rate parameter-line chart-1350 times


Figure 14 Optimum decay rate parameter-line chart-1400 times

Figure 15 Optimum decay rate parameter-line chart-1600 times

Figure 16 Optimum decay rate

Figure 17 Optimum decay rate-line chart-1350 times

Figure 18 Optimum decay rate-line chart-1400 times

Figure 19 Optimum decay rate-line chart-1600 times

DESCRIPTION OF PREFERRED EMBODIMENT

STEP1: Use the CrawPic.py crawler program, written in Python, to crawl four types of vehicle images from http://image.baidu.com, namely car, bus, truck and moto, with 5000 images of each type to ensure a sufficient data volume.

STEP2: Use the PreprocessingPic.py program to preprocess the images obtained from the network: select the images suitable for the experiment and delete those that are ambiguous, unrelated, or show only a small part of the vehicle, leaving only clear pictures that are useful for the invention.

Because the sizes and resolutions of the downloaded images differ, we need to pre-process the images to unify their size. We use the Python OpenCV library to process the images, calling cv2.resize() to convert each image to the final required size, which is uniform and suitable as the input of the Neural Network structure.

During the compression process, images whose original dimensions differ greatly from the target size can be seriously distorted, so special attention should be paid during downloading and filtering. When the number of images is not sufficient, we can use cv2.getRotationMatrix2D() to rotate the images to the left or right by 5-15 degrees to obtain a sufficient amount of data.
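To make the rotation step concrete without depending on OpenCV, the sketch below builds the same kind of 2x3 affine matrix in plain Python, following the formula given in the OpenCV documentation for cv2.getRotationMatrix2D() (as I understand it, positive angles rotate counter-clockwise about the given center). The helper names, the center (15.5, 15.5) of a 32x32 image, and the specific angles are assumptions chosen for illustration.

```python
import math


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 affine matrix for a rotation by angle_deg about center."""
    cx, cy = center
    alpha = scale * math.cos(math.radians(angle_deg))
    beta = scale * math.sin(math.radians(angle_deg))
    return [[alpha, beta, (1 - alpha) * cx - beta * cy],
            [-beta, alpha, beta * cx + (1 - alpha) * cy]]


def apply_affine(matrix, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# One matrix per augmentation angle, left and right, within the 5-15 degree range.
center = (15.5, 15.5)
angles = (-15, -10, -5, 5, 10, 15)
matrices = [rotation_matrix_2d(center, a) for a in angles]
fixed_point = apply_affine(matrices[0], center)
quarter_turn = apply_affine(rotation_matrix_2d((0.0, 0.0), 90), (1.0, 0.0))
```

With OpenCV installed, a matrix of this form would typically be passed to cv2.warpAffine() together with the image and the output size to produce the rotated copy.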

STEP3: Use the tomat.py program to label the image data according to its classification, and save it to the train.mat and test.mat files in a fixed ratio, so that later programs can read the data easily.

The four data types car, bus, truck and moto are numbered sequentially as 0, 1, 2, and 3. All the image data is stored in the X array and the label data in the y array.

Use sklearn.model_selection in the Python program to divide all image data and the corresponding labels into 4 groups (training and testing images, training and testing labels) at a ratio of 4:1, and store them in train.mat and test.mat.
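The labelling and 4:1 split can be sketched with the standard library alone. This is a stand-in for the sklearn.model_selection call named in the text, not a reproduction of tomat.py: the function `split_4_to_1`, the fixed seed, and the file-name placeholders used instead of real image arrays are all assumptions.

```python
import random

LABELS = {"car": 0, "bus": 1, "truck": 2, "moto": 3}


def split_4_to_1(samples, labels, seed=0):
    """Shuffle, then split into four lists: X_train, X_test, y_train, y_test (4:1)."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = len(indices) * 4 // 5
    train_idx, test_idx = indices[:cut], indices[cut:]
    return ([samples[i] for i in train_idx], [samples[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


# Hypothetical placeholders: file names stand in for the image arrays.
X = [f"{kind}_{i}.jpg" for kind in LABELS for i in range(5)]
y = [LABELS[name.split("_")[0]] for name in X]
X_train, X_test, y_train, y_test = split_4_to_1(X, y)
```

With 20 placeholder samples this yields 16 training and 4 testing items, and each label stays paired with its sample after shuffling.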

STEP4: Use the load.py program to read the generated mat files and perform a series of operations on the loaded data to convert it into the format we need. In the reformat method, np.transpose().astype() is used to change the shape of the original data to the format (number of pictures, picture height, picture width, number of channels), and the labels are then converted to one-hot encoding. (The label corresponding to 0 is [1, 0, 0, 0].)
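A possible shape of the reformat method is sketched below with NumPy. The text does not state the layout of the arrays stored in the mat files, so the input layout (height, width, channels, number of pictures) is an assumption, as are the function signature and the dummy data.

```python
import numpy as np


def reformat(samples, labels, num_classes=4):
    """Reorder to (pictures, height, width, channels) and one-hot encode the labels."""
    # Assumed input layout: (height, width, channels, pictures).
    images = np.transpose(samples, (3, 0, 1, 2)).astype(np.float32)
    flat = np.asarray(labels).reshape(-1).astype(int)
    one_hot = np.eye(num_classes, dtype=np.float32)[flat]  # row i of the identity encodes class i
    return images, one_hot


# Dummy data: five blank 32x32 colour pictures with labels 0, 1, 2, 3, 0.
raw = np.zeros((32, 32, 3, 5), dtype=np.uint8)
raw_labels = np.array([[0], [1], [2], [3], [0]])
images, one_hot = reformat(raw, raw_labels)
```

Here label 0 becomes [1, 0, 0, 0], matching the example given in the text.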

In the normalize method, the data is converted to grayscale, changing it from three color channels to a single channel, to save memory and speed up training.

Finally, each image is fed to the deep learning network as a 32x32x1 input.
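The channel reduction can be sketched as follows. The text does not say how the grayscale value is computed or whether pixel values are rescaled, so averaging the three channels and dividing by 255 are both assumptions made for this illustration.

```python
import numpy as np


def normalize(images):
    """(pictures, H, W, 3) colour images -> (pictures, H, W, 1) grayscale in [0, 1]."""
    gray = images.astype(np.float32).mean(axis=3, keepdims=True)  # assumed: plain channel average
    return gray / 255.0                                           # assumed: scale to [0, 1]


# Dummy batch of two all-white 32x32 colour pictures.
batch = np.full((2, 32, 32, 3), 255, dtype=np.uint8)
gray_batch = normalize(batch)
```

Keeping the trailing axis of length 1 gives each picture the 32x32x1 shape expected by the network input.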

STEP5: Define the optimization method, using Adam as the optimizer.

Set the training batch size and the test batch size to 64 and 500 respectively. That is, each training batch contains 64 samples and each test batch contains 500 samples.
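Batching with these sizes might look like the generator below. The names `iter_batches`, `TRAIN_BATCH_SIZE` and `TEST_BATCH_SIZE` are illustrative, the sample list is a placeholder, and keeping the final partial batch (rather than dropping it) is an assumption.

```python
TRAIN_BATCH_SIZE = 64
TEST_BATCH_SIZE = 500


def iter_batches(samples, labels, batch_size):
    """Yield consecutive (samples, labels) slices of at most batch_size items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size], labels[start:start + batch_size]


# Placeholder data: 200 integers stand in for 200 preprocessed images.
samples = list(range(200))
labels = [i % 4 for i in samples]
sizes = [len(xb) for xb, _ in iter_batches(samples, labels, TRAIN_BATCH_SIZE)]
```

For 200 samples this produces three full batches of 64 and a final batch of 8.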

STEP6: Extract image features using a Convolutional Neural Network, in preparation for the fully connected Neural Network. In this invention, the input is 32x32x1, that is, a 32x32 matrix, and the final output is 8x8x32.

The feature extraction layer includes the [Convolution -> Relu]*M step, where M=2 is set.

Set parameters:

Filters=32: Each Convolution operation extracts 32 features, generating an output with a depth of 32 layers.

Patch size=3: Set the size of the sliding window to 3x3.

Padding=1: Add 1 layer of zeros around the outermost edge of the result matrix after each Convolution operation, so that its size remains the same.

Activation='Relu': Set the activation function to Relu, which sets every value less than 0 in each result matrix to 0 to eliminate unnecessary influencing factors.

In the pooling layer, the MaxPooling operation is included to reduce the complexity of the feature.

Parameters:

Patch size=2: The window of MaxPooling is 2x2, and each time the pooling reduces the width and height of the matrix by half.

Set N=2, that is, perform a total of MxN=4 Convolution operations, with a MaxPooling operation after every two Convolution operations.
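The stated sizes (32x32x1 in, 8x8x32 out) can be checked with a short shape trace. The helper names are hypothetical, and a stride of 1 for the convolutions is an assumption; the patch sizes, padding, filter count, M and N are the values given above.

```python
def conv_output_size(size, patch=3, padding=1, stride=1):
    """Spatial size after a convolution with the given patch, padding and stride."""
    return (size + 2 * padding - patch) // stride + 1


def pool_output_size(size, patch=2):
    """Spatial size after non-overlapping max pooling."""
    return size // patch


def trace_shapes(size=32, filters=32, m=2, n=2):
    """List the (height, width, depth) after every layer of [[Conv -> Relu]*m -> Pool]*n."""
    shapes = [(size, size, 1)]
    for _ in range(n):
        for _ in range(m):
            size = conv_output_size(size)
            shapes.append((size, size, filters))
        size = pool_output_size(size)
        shapes.append((size, size, filters))
    return shapes


shapes = trace_shapes()
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
```

The trace ends at (8, 8, 32), so flattening gives 8 x 8 x 32 = 2048 values for the fully connected part.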

STEP7: Use fully connected layers, with Relu as the activation function, to integrate the information from the Convolutional and pooling layers. The output of the last fully connected layer is passed to the output and classified using SoftMax.

Figure 3 shows the structure of our Full Connected Network

In the present invention, we use 3 layers: one input layer, one hidden layer and one output layer. The number of input nodes is 8x8x32=2048. The final number of output nodes is 4, corresponding to the four categories of images. The value of P is set to 1, which means the network contains one hidden layer, and the number of hidden layer nodes is set to 128. The 2048 input nodes are therefore reduced to 128 nodes in the hidden layer by the weights, the biases and the activation function. These 128 nodes are used as the input of the output layer. In the output layer, the 128 nodes are reduced to 4 nodes, SoftMax is then applied to these 4 nodes, and the output of the fully connected Neural Network is obtained.
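The reduction from 2048 to 128 to 4 nodes can be sketched in NumPy up to the four output values that SoftMax then normalizes. The weights here are small random numbers rather than trained values, and the function name, the batch of three samples, and the initialization scale are assumptions made for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def fully_connected_forward(features, w1, b1, w2, b2):
    """2048 inputs -> 128 hidden nodes (Relu) -> 4 output nodes."""
    hidden = relu(features @ w1 + b1)   # weights, bias, then activation
    logits = hidden @ w2 + b2           # four values, one per vehicle type
    return hidden, logits


rng = np.random.default_rng(0)
w1 = rng.normal(0.0, 0.01, size=(2048, 128))   # untrained, hypothetical weights
b1 = np.zeros(128)
w2 = rng.normal(0.0, 0.01, size=(128, 4))
b2 = np.zeros(4)

conv_output = rng.normal(size=(3, 8, 8, 32))   # stand-in for three 8x8x32 feature sets
features = conv_output.reshape(3, -1)          # flatten each to 2048 values
hidden, logits = fully_connected_forward(features, w1, b1, w2, b2)
```

Each row of `logits` holds the four values for one image; applying SoftMax to a row turns them into class probabilities.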

So far, the design of the Neural Network has been completed. The corresponding figure shows in detail the parameters, inputs and outputs required for each layer in the Neural Network structure. Figure 7 shows the path of the image data from input to output in the Neural Network, in which the data is displayed in the form of cuboids: the area of a cuboid represents the product of the length and width of the matrix of nodes in the data, and the thickness represents the quantity of nodes in a set of data.

STEP8: After the training of the Neural Network model is completed and the results are obtained through testing, the parameters in the network structure need to be changed according to the feedback of the results. The parameters that need to be changed are:

Base learning rate, dropout rate, decay rate, and iteration steps.

Finally, we can get the best results by adjusting the above parameters.

Explanation

Relu

Relu is a piecewise function: if the input is less than or equal to zero, the output is zero, and if the input is greater than zero, the output equals the input. Formula: f(x) = max(0, x). The formula forces some of the data to zero. This increases the sparsity of the trained network, which is more consistent with the principle of neuron excitation. It not only reduces the interdependence of parameters but also greatly alleviates the over-fitting problem, enhancing the capability of the Neural Network.

MaxPooling

Pooling is a basic operation used in Convolutional Neural Networks. Usually, a pooling operation follows the Convolution layer, and nowadays the mainstream choice is MaxPooling. The pooling layer reduces the dimension of the features produced by the filter layer, which contributes to forming the final classification result. There are two kinds of pooling: MaxPooling and Average Pooling. MaxPooling takes the feature values within a filter window and keeps the greatest one as the retained value, discarding the rest. Average Pooling averages these values. Normally, MaxPooling is used in Convolutional Neural Networks. Because MaxPooling performs feature selection, it picks out the features that give a better classification and identification rate, and it also provides nonlinearity.

Full Connected Neural Network

A Full Connected Neural Network is a plurality of neurons connected according to certain rules. Each neuron contains an activation function, and the activation function is often chosen to be a sigmoid or tanh function.

In a Full Connected Neural Network, neurons are laid out in layers. The leftmost layer is called the input layer and is responsible for receiving input data; the rightmost layer is called the output layer and is used to obtain the Neural Network output data. The layers between the input layer and the output layer are called hidden layers, because they are not visible from outside. In a Full Connected Neural Network, there is no connection between neurons in the same layer. Each neuron in the Nth layer is connected to all neurons in the (N-1)th layer. The output of the (N-1)th layer neurons is the input of the Nth layer neurons. The connections between neurons in different layers have different weights. The basic computation in the Neural Network has the form f(x)=ax+b, where a represents the weight and b represents the offset. To calculate the output of the Neural Network from the input, the value of each element x of the input vector X is first assigned to the corresponding neuron of the input layer, and then the value of each neuron in each layer is calculated in sequence according to the activation function, until the values of all neurons in the output layer have been calculated. Finally, the values of the neurons in the output layer are concatenated to obtain the target output vector.


SoftMax

SoftMax is a normalized exponential function, a generalization of the logistic function. SoftMax is widely used in machine learning and deep learning, especially for multi-class classification problems. The final output unit of the classifier needs the SoftMax function to process its values. The SoftMax function is in fact a normalization into a discrete probability distribution over a finite number of terms. It is therefore widely applied in probability-based multi-class methods such as multinomial logistic regression, multinomial linear discriminant analysis, the naive Bayes classifier, and artificial Neural Networks. In Figure 4, the part to the left of the equals sign shows how the Full Connected layer works. W is the parameter of the Full Connected layer, which we call the weight. X is the input of the Full Connected layer, which we call the feature. X is a feature vector of size N*1. It results from the processing by the multiple Convolutional and pooling layers in front of the Full Connected layer. Suppose the Full Connected layer is preceded by a Convolutional layer whose output has 32 channels, meaning 32 feature maps, each of size 8*8; then N = 32*8*8 = 2048. W is a matrix of size T*N, where N corresponds to the N of X and T indicates the number of classes. For example, if there are 5 classes, then T is 5. Training the network therefore means, for the Full Connected layer, finding the most suitable W matrix. The Full Connected layer computes W*X to obtain a T*1 vector. As for the definition of the SoftMax function: its input is a T*1 vector and its output is also a T*1 vector, in which each value represents the probability of the sample belonging to the corresponding class, so each value of the output vector lies between 0 and 1.

The input of SoftMax is W*X. Suppose the input sample of the model is I and consider a 4-class problem (categories 1, 2, 3, 4), where the true category of sample I is 3. Before reaching the SoftMax layer, sample I passes through the network layers and yields W*X, a 4*1 vector. In the SoftMax formula, $S_j = e^{a_j} / \sum_{k=1}^{T} e^{a_k}$, $a_j$ denotes the j-th value of this 4*1 vector. The $a_k$ in the denominator ranges over the 4 values of the vector, hence the summation sign (the sum runs over k from 1 to T, where T is the same T as in the figure above, and j likewise ranges from 1 to T). Because $e^x$ is always greater than 0, both the numerator and the denominator are positive, so $S_j$ is positive and lies in the range (0, 1). When the model is used for testing rather than training, a sample passing through the SoftMax layer produces a T*1 vector, and the index of the largest value in this vector is taken as the predicted label of the sample.
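The SoftMax computation and the test-time prediction rule can be written in a few lines of plain Python. The input scores are hypothetical values standing in for a 4*1 vector W*X, and subtracting the maximum before exponentiating is a common numerical-stability step that is not part of the text; it leaves the result unchanged.

```python
import math


def softmax(a):
    """Return e^(a_j) / sum_k e^(a_k) for every entry of a."""
    shift = max(a)                                   # stability shift; cancels in the ratio
    exps = [math.exp(x - shift) for x in a]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 0.5, 3.0]                        # hypothetical W*X for one sample
probs = softmax(scores)
predicted_label = probs.index(max(probs))            # index of the largest probability
```

Every entry of `probs` lies strictly between 0 and 1, the entries sum to 1, and the predicted label is the position of the largest entry.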

Back Propagation

Iterative parameter training is performed for each batch of training data. In order to adjust the parameter weights during training, the Back Propagation algorithm is used. The algorithm first calculates the error term δ of each node, based on the output value y of each node and the target value Y. After the error term is obtained, the connection weights of the node are updated: the new weight is calculated from the error term of the node, the learning rate, and the current weight. This calculation is iterated layer by layer, from layer N back to layer N-1.

$$\delta_j = y_j (1 - y_j) \sum_{k \in \text{outputs}} w_{kj} \delta_k \qquad (2)$$

$$w_{ji} \leftarrow w_{ji} + \eta \, \delta_j \, x_{ji} \qquad (3)$$

$w_{ji}$: The weight of the edge

$x_{ji}$: Input passed to j by i node

$\delta$: The error between the actual value of the node and the expected value

$y$: The output value of the node

$\eta$: Learning rate
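Equations (2) and (3) can be traced numerically with the small sketch below. All numbers are hypothetical, and the function names are illustrative; the factor y(1 - y) is the derivative of a sigmoid output, so the sketch implicitly assumes a sigmoid activation for the node in question.

```python
def hidden_delta(y_j, downstream_weights, downstream_deltas):
    """Equation (2): delta_j = y_j * (1 - y_j) * sum_k(w_kj * delta_k)."""
    weighted = sum(w * d for w, d in zip(downstream_weights, downstream_deltas))
    return y_j * (1 - y_j) * weighted


def update_weight(w_ji, eta, delta_j, x_ji):
    """Equation (3): w_ji <- w_ji + eta * delta_j * x_ji."""
    return w_ji + eta * delta_j * x_ji


# Hypothetical node with output 0.5 feeding two downstream nodes.
delta_j = hidden_delta(0.5, downstream_weights=[0.2, 0.4], downstream_deltas=[0.1, 0.05])
new_w = update_weight(w_ji=0.3, eta=0.001, delta_j=delta_j, x_ji=1.0)
```

Here the weighted downstream error is 0.2*0.1 + 0.4*0.05 = 0.04, so the node's error term is 0.25*0.04 = 0.01 and the weight moves from 0.3 to 0.30001.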

Regularized:


Over-fitting is common. The main reason is that the amount of data is not enough to support a model with high complexity. Therefore, the most straightforward solution to solve the over-fitting problem is to increase the amount of training data. In our experiment, the DNN is regularized by dropout.

Dropout rate:

Figure 5 represents the flowchart of dropout operation

In each training batch, over-fitting is reduced by randomly omitting a certain proportion (dropout_rate) of the hidden layer nodes, and the importance of some hidden layer nodes is highlighted in order to increase accuracy. Dropout can be regarded as a form of model averaging, which averages the estimates from different models with certain weights. In each training batch, because hidden layer nodes are omitted at random, the network being trained is different each time and can be regarded as a new model. Hidden nodes appear randomly with a certain probability, so there is no guarantee that any two hidden nodes appear together every time; the weight updates therefore no longer depend on the joint action of hidden nodes with fixed relationships.
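A minimal sketch of random node omission is given below. It follows the wording above and treats the rate as the probability of omitting a node; some frameworks parameterize dropout by the probability of keeping a node instead, and this sketch does not settle which reading the values reported in this document follow. Scaling the surviving activations by 1/keep probability (inverted dropout) is also an assumption.

```python
import random


def dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob and rescale the survivors."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() >= drop_prob else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 128                    # stand-in for 128 hidden node outputs
dropped = dropout(hidden, 0.5, rng)     # a different random subset is zeroed on each call
unchanged = dropout(hidden, 0.0, rng)   # with probability 0 nothing is omitted
```

Because a fresh random mask is drawn for every batch, each batch effectively trains a slightly different network, which is the model-averaging view described in the text.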

Base learning rate:


The learning rate (step size) controls the learning progress of the model. During training, a dynamically changing learning rate is generally set according to the number of training rounds. At the beginning of training, the learning rate is preferably between 0.01 and 0.001. After a certain number of rounds, the learning rate is gradually reduced. Near the end of training, the learning rate should have decayed by a factor of more than 100. The initial learning rate chosen for this experiment is 0.001.

Decay rate:

Given the characteristics of the different stages of the optimization process, the general idea is to use a large learning rate to accelerate convergence in the early stage and a small learning rate to ensure stability later. In this experiment, the step size and attenuation coefficient are empirical values. Learning rate decay mechanism: in this experiment the learning rate is decayed according to the number of rounds, being multiplied by 0.7 every 500 rounds.
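The decay rule can be written as a one-line schedule. The function name is illustrative, and applying the factor in discrete jumps every 500 rounds (a staircase schedule) rather than continuously is an assumption; the base rate 0.001, the factor 0.7 and the interval of 500 rounds come from the text.

```python
def learning_rate(step, base_lr=0.001, decay_rate=0.7, decay_steps=500):
    """Multiply the base rate by decay_rate once for every full decay_steps rounds."""
    return base_lr * decay_rate ** (step // decay_steps)


schedule = {step: learning_rate(step) for step in (0, 499, 500, 1000, 1400)}
```

Under this reading the rate stays at 0.001 for the first 500 rounds, drops to 0.0007, and is 0.00049 from round 1000 onwards, including at round 1400.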

Iteration steps:

If the number of iterations is too small, the deep learning model does not learn enough. If the number of iterations is too large, the model over-learns and the learning result is unsatisfactory.


RESULT ANALYSIS

The initial parameters of this experiment are preset to a learning rate of 0.0001, a decay rate of 0.9, a dropout rate of 0.9, and a number of iterations of 2000.

The best parameters of this experiment are a learning rate of 0.001, a decay rate of 0.7, a dropout rate of 0.9999, and 1350 or 1400 iterations.

In Fig. 9 and Fig. 10, we use the best parameters of this experiment to determine the optimal number of iterations. It can be seen from the table and line graph that the results are better at 1700 and 1400 iterations. Because the amount of data is small, the optimal number of iterations cannot be determined accurately.

In order to study the optimal number of iterations more carefully, we performed another experiment. Since the test set already reaches 100% accuracy at about 1000 iterations, in order to ensure that the model does not show undesirable phenomena such as over-fitting while still maintaining good recognition accuracy, we selected 1300, 1350, 1400, 1450 and so on up to 1800 iterations as the test settings of this experiment, increasing the number of iterations in steps of 50. Since the accuracy of machine learning varies from run to run, we measure 15 sets of data for each setting and calculate the average accuracy of these 15 sets. As shown in Fig. 11, the accuracy is quite high at 1350, 1400 and 1600 iterations, so in the next few experiments we used these three iteration counts as test settings.

As shown in Fig. 12, 13, 14, and 15, in the fourth experiment we first varied the learning rate, because the learning rate of the previous experiment was too low. As can be seen from the table and the three line charts, the accuracy is highest when the learning rate is 0.001, so we use 0.001 as the best learning rate.

As shown in Fig. 16, 17, 18, and 19, for the three different iteration counts, the model has the best accuracy when the decay rate is 0.7.

Finally, we need to determine the optimal dropout rate. The higher the dropout rate, the more nodes are removed and the more random our data becomes. Therefore, starting from the initial dropout rate of 0.9, we chose 0.9999 as the best dropout rate. With these parameters, the final results are also closer to ideal.

In conclusion, in the above multiple sets of comparative experiments, we finally determine the learning rate of 0.001, the decay rate of 0.7, the dropout rate of 0.9999, and the number of iterations of 1350 or 1400 times as the best parameters.


EDITORIAL NOTE

There is one page in the claims only .

Claims

1. A method of recognition of vehicle type based on deep learning, in which using structure

Input -> [[CONV -> Relu] * M -> MaxPooling] * N -> [FC -> Relu] * P -> FC -> SoftMax as the Convolutional Neural Network construction mode and Full Connected Neural Network construction mode, M, N and P depends on the situation.

2. A method of recognition of vehicle type based on deep learning mentioned as claim 1, in which selection of model parameters: M=4 in the convolution, filters=32, size/strides = 2x2, Drop rate = 0.99, Base learning rate = 0.001, Decay rate = 0.7, iteration steps = 1600, full Connected Neural Network: the node of hidden layer is node=128, the above parameters make the training results more accurate so that good model can be obtained.