AU2018102037A4 - A method of recognition of vehicle type based on deep learning - Google Patents
A method of recognition of vehicle type based on deep learning
- Publication number
- AU2018102037A4
- Authority
- AU
- Australia
- Prior art keywords
- layer
- neural network
- training
- image
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/584—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
Abstract
This patent describes an image recognition method based on deep learning. The image-based vehicle recognition system first captures the vehicle image from the camera's video stream, and then applies pattern recognition and computer vision to extract the useful information needed for classification and recognition. The invention preprocesses images of various vehicles and then divides all of the images into a training set and a testing set. The training set images are used to train a Deep Convolutional Neural Network, during which the parameters between the layers of the Convolutional Neural Network are adjusted so that they can be applied to the testing set. After the training is completed, the neural network for testing, which has the same structure as the training network, is initialized with the trained parameters. The testing images are then fed into the Deep Neural Network to identify and classify the vehicle images. The invention does not require human participation or adjustment, and automatically performs the extraction and classification of features, providing a reliable high-performance image recognition technology based on deep learning.
Figure 5: Standard Neural Net; After Applying Dropout
Description
TITLE
A method of recognition of vehicle type based on deep learning
FIELD OF THE INVENTION
The invention belongs to the technical field of digital image processing, and in particular relates to a vehicle image recognition method based on deep learning.
BACKGROUND OF THE INVENTION
With the improvement of living standards, the car has become the most important and most widely used vehicle in the world. The popularity of cars has brought great convenience to people's travel; however, various traffic problems have become increasingly prominent, seriously endangering people's travel safety. Under these circumstances, automatic identification and tracking of fleeing vehicles through intelligent transportation systems is urgently needed. The travel environment has become more complicated, and various types of traffic problems have come to public attention. Cases such as hit-and-run accidents, vehicle theft, and the use of fake license plates occur from time to time. Vehicle identification technology is an important tool for maintaining the stability of intelligent transportation systems. It is a comprehensive application of design analysis, visual development, artificial intelligence, software development and other technologies.
2018102037 09 Dec 2018
Every year, crimes such as vehicle theft and robbery seriously disturb public security and cause serious economic losses. Because vehicle-related cases are generally hard to detect and the investigation workload is huge, the clear-up rate of such cases is not high. Facing these problems, the Chinese government has introduced corresponding policies to relieve the situation. Hence, an effective recognition technology is indispensable.
Due to the lack of advanced technology, researchers have had to observe and select the useful information by themselves, which also brings inconvenience to the public. In addition, they need to extract the features manually and then use an algorithm to recognize them. For these reasons, a method that improves precision is highly necessary. One way to do this is to use deep learning, whose advantage is that it extracts features automatically. The method also ensures the speed of recognition. In summary, this patent uses image recognition technology to identify vehicle type, helping to solve problems related to statistics, security and car tracking. Compared to other methods, this approach is not only efficient but also easy to control, and is worth exploring in depth.
SUMMARY OF THE INVENTION
In order to address the deficiencies of modern technology, we propose a vehicle image recognition method based on deep learning, which connects a multi-layer Convolutional Neural Network and a Full Connected Network in series so that the advantages of automatic feature extraction in deep learning can be fully exploited, effectively solving problems such as the difficulty of feature extraction and of automatic, simultaneous identification.
The technical solution of the present invention is implemented as follows:
A vehicle image recognition method based on deep learning includes:
The construction of the vehicle model image dataset,
Parameter training and optimization of deep learning network,
The design of network structure of deep learning,
The real-time identification of deep learning network.
The construction of the vehicle model image dataset.
1. Using a crawler program to collect enough pictures of different vehicle categories from a specific website and classifying them as car, bus, truck, or moto.
2. Pre-processing the image data (image filtering, rotation, compression, and the tomat and load steps) to convert images of different sizes into images of the same size with the same number of color channels, which serve as the input of the deep learning network. The data is divided into training and testing sets in an appropriate proportion. In this invention, the uniform size is 32x32x1 and the ratio of training to testing data is 4:1.
Parameter training and optimization of deep learning network
1. Putting data into the network in batches
The advantage of batching the training images is that the computer can train on one batch at a time, avoiding problems such as the system halting when all of the data is trained at once.
2. Choosing the optimizer
This invention applies Adam as the optimizer. Adam is a first-order optimization algorithm that can replace traditional stochastic gradient descent, and it updates the weights of the neural network iteratively. The difference between Adam and stochastic gradient descent is that the latter maintains a single learning rate for updating all weights, whereas the former sets a different, adaptive learning rate for each parameter by calculating first and second moment estimates of the gradients. Adam also requires little memory, is computationally efficient, and is easy to implement.
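The per-parameter update described above can be sketched as follows. This is a minimal illustration of the standard Adam rule, not the patent's own code; the function name `adam_step`, the hyperparameter defaults, and the toy objective are assumptions made for the example.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter array w at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad           # first moment estimate of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment estimate of the gradient
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # each parameter gets its own step size
    return w, m, v

# Toy objective (an assumption for illustration): f(w) = sum(w**2), gradient 2*w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
```

Only the two moment arrays `m` and `v` are stored per parameter, which is why the memory overhead stays small.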
The design of network structure of deep learning
Figure 2 shows the Neural Network structure of the present invention.
This framework is divided into two parts:
Convolutional Neural Network and Fully Connected Neural Network.
1. Convolutional Neural Network:
Using structure like:
[[Convolution —>Relu]*M—> MaxPooling]* N
Through the different sliding-window filters preset in the Convolutional Neural Network, different sets of feature points are obtained from each source image, and a pooled feature set is formed as the input of the next layer of the Neural Network. In the pooling process, the combination function performs max pooling, which strengthens the general feature points and reduces the amount of data to facilitate the calculation of feature points in the next layer of the network. In the Convolutional layer, there are three main nodes: Convolution, Relu, and MaxPooling.
Convolution is the convolution node. In this kind of node, the convolution operation on the image data is performed by a preset number of N*N Convolution kernels: each kernel is multiplied with the corresponding pixels of the M*M image, and the products are summed. The different N*N kernels thereby extract different features of the image, which together form the feature value set of the image data. The values in the Convolution kernels are learned dynamically during training. After a Convolution operation is completed, the Convolution result set is taken as the input of the Relu node. The Relu node is the excitation layer, which applies a nonlinear mapping to the Convolution output. Negative feature values in the input result set are set to 0 in order to enhance the feature values, and a result set of the same size is output.
With Relu as the excitation function of the excitation layer, convergence is fast and the gradient is simple, which speeds up the calculation and reduces the computational difficulty.
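As a rough illustration of the Convolution and Relu nodes, the sketch below slides a single 3x3 kernel over a small image and then zeroes the negative results. The kernel values here are fixed by hand purely for the example (in the method they are learned), and the helper names `conv2d_same` and `relu` are hypothetical.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide one k x k kernel over a 2-D image with zero padding and stride 1."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad, mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + k, j:j + k]
            out[i, j] = np.sum(window * kernel)  # multiply element-wise, then sum
    return out

def relu(x):
    """Set negative values to 0 and leave positive values unchanged."""
    return np.maximum(0, x)

image = np.arange(16, dtype=float).reshape(4, 4)
# Hand-picked horizontal-difference kernel (illustrative only).
kernel = np.array([[0.0, 0.0, 0.0],
                   [-1.0, 0.0, 1.0],
                   [0.0, 0.0, 0.0]])
feature_map = relu(conv2d_same(image, kernel))
```

As in most deep learning libraries, the kernel is applied without flipping, and the zero padding keeps the output the same size as the input.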
The MaxPooling node is used as the pooling layer in the Convolutional Neural Network. The pooling layer is sandwiched between successive Convolutional layers to compress the amount of data and parameters and to reduce overfitting. In short, if the input is an image, the primary role of the pooling layer is to compress the image and reduce the dimensionality of the image data. Since the image data is still large after the Convolution operation, downsampling is performed to reduce the data dimension. This is possible because, even after much of the data is discarded, the statistical properties of the features can still describe the image, and the reduced data dimension helps to avoid over-fitting. Data compression is performed with the MaxPooling (maximum downsampling) function while the data characteristics are preserved. An input N*N data set can be compressed into a Y*Y data set (N>Y) by MaxPooling.
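The compression from N*N to Y*Y can be illustrated with a small sketch of 2x2 max pooling, assuming a stride of 2 and an even-sized input; the function name `max_pool_2x2` is made up for this example.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array; H and W must be even."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)  # group pixels into 2x2 windows
    return blocks.max(axis=(1, 3))            # keep the largest value per window

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 2, 4, 6]])
pooled = max_pool_2x2(x)  # 4x4 input becomes a 2x2 output
```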
2. Fully connected Neural Network:
Using structure like:
[Full Connected —>Relu]*P —> Full Connected—> SoftMax
A fully connected Neural Network is a plurality of neurons connected in accordance with certain rules. Each neuron contains an activation function. The neurons are laid out in layers. The leftmost layer is called the input layer and is responsible for receiving input data; the rightmost layer is called the output layer, from which the Neural Network output data can be obtained. The layers between the input layer and the output layer are called hidden layers, and each neuron in the Nth layer is connected to all the neurons in the (N-1)th layer. The output of the (N-1)th layer neurons is the input of the Nth layer neurons. Each connection has a weight. The compressed image feature set produced by the Convolutional Neural Network is used as the input layer of the fully connected Neural Network, and the final output, which is the category indicating the specific type of the vehicle, is obtained step by step through the weights of each layer and the activation functions of the neurons. The connection weights between neurons are learned dynamically through back propagation during model training. As training improves the weights, the fully connected Neural Network model acquires the ability to identify data by its feature values. We also added an excitation layer to the fully connected Neural Network, using the Relu excitation function to activate the output of the fully connected layer, which clarifies and simplifies the output classification. The SoftMax function is added after the Relu activation layer. The SoftMax function produces the probability distribution over classes for each data sample. The output of the Relu activation layer is used as the input of the SoftMax function, which outputs the predicted label and the predicted classification probability of each sample, refining the data classification results.
The real-time identification of deep learning network
After model training ends, the present invention inputs the testing picture data into the trained network for identification and classification. The accuracy of each batch of test results is printed to the console, the average accuracy and standard deviation over all test batches are output, and the confusion matrix of the test results is plotted. The confusion matrix shows, for each of the four vehicle types, the number of samples identified correctly and the number identified as other types, from which the accuracy of each type is computed.
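The evaluation described above might be computed along the following lines. This is only a sketch: the label arrays are made-up toy values rather than results of the invention, and the helper names and the batch size of 4 are assumptions.

```python
import numpy as np

CLASSES = ["car", "bus", "truck", "moto"]  # labels 0, 1, 2, 3

def confusion_matrix(true_labels, predicted_labels, num_classes=4):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

def per_class_accuracy(cm):
    """Correctly identified samples of each class divided by its total samples."""
    return cm.diagonal() / cm.sum(axis=1)

# Toy labels for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 2, 1, 1, 2, 0, 3, 3])

cm = confusion_matrix(y_true, y_pred)
class_acc = per_class_accuracy(cm)

# Accuracy of each test batch (batch size 4 here), then mean and standard deviation.
batch_acc = np.array([(y_true[i:i + 4] == y_pred[i:i + 4]).mean()
                      for i in range(0, len(y_true), 4)])
mean_acc, std_acc = batch_acc.mean(), batch_acc.std()
```

Off-diagonal entries of `cm` show which types are confused with each other, which is the information used to judge the data preparation.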
The above recognition results reflect the level of model training well. By comparing the accuracy of each type and the probability of each type being identified as another type, these results can effectively help us understand the shortcomings in data preparation and data optimization.
The final accuracy of this experiment is 91%, and the specific results analysis will be shown in detail below.
DESCRIPTION OF THE DRAWINGS
Figure 1 Flowchart of the program
Figure 2 Neural Network structure
Figure 3 Full Connected Network structure
Figure 4 Flowchart of SoftMax
Figure 5 Flowchart of dropout operation
Figure 6 Sketch map of depth model
Figure 7 The structure diagram of training and testing
Figures 8-19 show the results of the invention
Figure 8 The running results of the best parameters
Figure 9 Select the best number of iterations
Figure 10 Select the best number of iterations- line chart
Figure 11 Average accuracy under different iterations
Figure 12 Optimum decay rate parameter
Figure 13 Optimum decay rate parameter-line chart-1350 times
Figure 14 Optimum decay rate parameter-line chart-1400 times
Figure 15 Optimum decay rate parameter-line chart-1600 times
Figure 16 Optimum decay rate
Figure 17 Optimum decay rate-line chart-1350 times
Figure 18 Optimum decay rate-line chart-1400 times
Figure 19 Optimum decay rate-line chart-1600 times
DESCRIPTION OF PREFERRED EMBODIMENT
STEP1: Use the CrawPic.py crawler program, written in Python, to crawl images of four vehicle types (car, bus, truck, moto) from http://image.baidu.com, with 5000 images per type to ensure a sufficient data volume.
STEP2: Use the PreprocessingPic.py program to preprocess the images obtained from the network: select the images suitable for the experiment and delete those that are ambiguous, unrelated, or show only a small part of the vehicle, leaving only clear pictures that are useful for the invention.
Because the sizes and resolutions of the downloaded images differ, we need to pre-process the images to unify their size. The Python OpenCV library is used for this: cv2.resize() converts each image to the final required size, which is uniform and suitable as the input of the Neural Network structure.
During the compression process, images whose original sizes differ greatly from the target may be seriously distorted, so special attention should be paid during downloading and filtering. When the number of images is not enough, cv2.getRotationMatrix2D() can be used to rotate the images to the left or right by 5 to 15 degrees to obtain a sufficient amount of data.
STEP3: Use the tomat.py program to label the image data according to its class and save it to the train.mat and test.mat files in a fixed ratio, so that later programs can read the data easily.
The four data types car, bus, truck, and moto are numbered sequentially as 0, 1, 2, and 3. All the image data is stored in the X array and the label data in the y array.
sklearn.model_selection is used in the Python program to divide all image data and corresponding labels into four groups (training and test images, training and test labels) at a training-to-test ratio of 4:1, which are stored in train.mat and test.mat.
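A minimal sketch of the labelling and 4:1 split is shown below. To keep the example dependency-free it uses a plain NumPy shuffle in place of sklearn.model_selection, and it omits writing the .mat files; the placeholder data, the function name `split_4_to_1`, and the random seed are assumptions.

```python
import numpy as np

LABELS = {"car": 0, "bus": 1, "truck": 2, "moto": 3}

def split_4_to_1(X, y, seed=0):
    """Shuffle the samples and split images and labels into train and test at 4:1."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = len(X) * 4 // 5
    train_idx, test_idx = order[:cut], order[cut:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder data: 20 blank 32x32x3 images per class.
X = np.zeros((80, 32, 32, 3), dtype=np.uint8)
y = np.repeat([LABELS[name] for name in ("car", "bus", "truck", "moto")], 20)

X_train, X_test, y_train, y_test = split_4_to_1(X, y)
```

The four returned arrays correspond to the four groups that are saved to train.mat and test.mat.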
STEP4: Use the load.py program to read the generated mat files and perform a series of operations to convert the data into the format we need. In the reformat method, np.transpose().astype() is used to change the original shape of the data to the format (number of pictures, picture height, picture width, number of channels), and the labels are then converted to one-hot encoding. (The label corresponding to 0 is [1, 0, 0, 0].)
In the normalize method, the data is converted to grayscale, changing it from three color channels to a single channel to save memory and speed up training.
Finally, each image data is used as an input to the deep learning network in the form of 32x32x1.
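The reformat and normalize methods might be sketched as follows. The stored layout (height, width, channels, number of pictures), the channel averaging used for grayscale, and the scaling to the range 0 to 1 are all assumptions, since the text does not specify them.

```python
import numpy as np

def reformat(samples, labels, num_classes=4):
    """Reorder (H, W, C, N) data to (N, H, W, C) and one-hot encode the labels."""
    samples = np.transpose(samples, (3, 0, 1, 2)).astype(np.float32)
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]  # label 0 -> [1, 0, 0, 0]
    return samples, one_hot

def normalize(samples):
    """Average the three color channels into one and scale to [0, 1] (assumed)."""
    gray = samples.mean(axis=3, keepdims=True)
    return gray / 255.0

# Random stand-in for the data read from the mat file (assumed layout H, W, C, N).
raw = np.random.randint(0, 256, size=(32, 32, 3, 10), dtype=np.uint8)
labels = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1])

samples, one_hot = reformat(raw, labels)
inputs = normalize(samples)  # shape (10, 32, 32, 1)
```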
STEP5: Define the optimization model, using Adam as the optimizer.
Set the train batch size and test batch size to 64 and 500 respectively. That is, each batch contains 64 samples during training and 500 samples during testing.
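Feeding the data in batches of these sizes can be sketched with a simple generator. The function name `iterate_batches` and the placeholder arrays are assumptions for illustration.

```python
import numpy as np

TRAIN_BATCH_SIZE = 64
TEST_BATCH_SIZE = 500

def iterate_batches(samples, labels, batch_size):
    """Yield consecutive (samples, labels) batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        end = start + batch_size
        yield samples[start:end], labels[start:end]

# Placeholder data: 200 grayscale 32x32 images with one-hot labels.
samples = np.zeros((200, 32, 32, 1), dtype=np.float32)
labels = np.zeros((200, 4), dtype=np.float32)

batch_sizes = [len(x) for x, _ in iterate_batches(samples, labels, TRAIN_BATCH_SIZE)]
```

Only one batch is held by the training step at a time, which is the benefit of batching noted earlier in the description.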
STEP6: Extract image features using a Convolutional Neural Network, in preparation for the fully connected Neural Network. In this invention, 32x32x1 is used as the input, that is, a 32x32 matrix is input, and the final output is 8x8x32.
The feature extraction layer includes the [Convolution -> Relu]*M step, where M=2 is set.
Set parameters:
Filters=32: Each Convolution operation extracts 32 features, generating an output with a depth of 32 layers.
Patch size=3: Set the size of the sliding window to 3x3.
Padding=1: Add one layer of zeros around the outside of the matrix for each Convolution operation, so that the size of the result remains the same.
Activation='Relu': Set the activation function to Relu, which sets every value less than 0 in each result matrix to 0 to eliminate unnecessary influencing factors.
In the pooling layer, the MaxPooling operation is included to reduce the complexity of the feature.
Parameters:
Patch size=2: The window of MaxPooling is 2x2, and each time the pooling reduces the width and height of the matrix by half.
Set N=2, that is, perform a total of MxN=4 Convolution operations, and perform a MaxPooling operation after every two Convolution operations.
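The shape arithmetic behind the 32x32x1 input and the 8x8x32 output can be checked with a short sketch. It assumes a stride of 1 for the Convolution and non-overlapping 2x2 pooling windows; the function names are hypothetical.

```python
def conv_output_size(size, patch=3, padding=1, stride=1):
    """Spatial size after a convolution with the given patch, padding, and stride."""
    return (size + 2 * padding - patch) // stride + 1

def pool_output_size(size, patch=2):
    """Spatial size after non-overlapping max pooling."""
    return size // patch

def feature_extractor_shape(size=32, filters=32, m=2, n=2):
    """Output shape of [[Convolution -> Relu]*M -> MaxPooling]*N."""
    for _ in range(n):
        for _ in range(m):
            size = conv_output_size(size)  # 3x3 patch with padding 1 keeps the size
        size = pool_output_size(size)      # 2x2 pooling halves width and height
    return (size, size, filters)

shape = feature_extractor_shape()            # (8, 8, 32)
flattened = shape[0] * shape[1] * shape[2]   # 2048 inputs to the fully connected part
```

The width goes 32 to 16 to 8 across the two pooling steps, while the depth is fixed at 32 by the number of filters.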
STEP7: Use the fully connected layer, with Relu as the activation function, to integrate the information from the Convolutional layer or the pooling layer. The output value of the last fully connected layer is passed to the output and classified using SoftMax.
Figure 3 shows the structure of our Full Connected Network.
In the present invention, we use 3 layers: one input layer, one hidden layer and one output layer. The number of nodes input is 8x8x32=2048. The final number of output nodes is 4, corresponding to the four categories of images. The value of P is set to 1, which means one hidden layer is contained, and the number of hidden layer nodes is set to 128. Therefore, the input 2048 nodes are reduced to 128 nodes in the hidden layer by the weight value, the bias and activation functions. Use these 128 nodes as the input of the output layer. In the output layer, 128 nodes are reduced to 4 nodes, then the remaining 4 nodes are calculated by SoftMax, and the output of the fully connected Neural Network is obtained.
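The reduction from 2048 nodes to 128 and then to 4 can be sketched as a forward pass. The weights below are random placeholders rather than trained values, and the function names, the initialization scale, and the batch of 5 samples are assumptions.

```python
import numpy as np

def softmax(z):
    """Row-wise SoftMax; subtracting the row maximum avoids overflow."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fully_connected_forward(features, w1, b1, w2, b2):
    """2048 inputs -> 128 hidden nodes (Relu) -> 4 output nodes -> SoftMax."""
    flat = features.reshape(len(features), -1)  # (batch, 8*8*32) = (batch, 2048)
    hidden = np.maximum(0, flat @ w1 + b1)      # (batch, 128)
    logits = hidden @ w2 + b2                   # (batch, 4)
    return softmax(logits)

rng = np.random.default_rng(0)
w1 = rng.normal(0, 0.01, size=(2048, 128))  # placeholder weights, not trained
b1 = np.zeros(128)
w2 = rng.normal(0, 0.01, size=(128, 4))
b2 = np.zeros(4)

features = rng.random((5, 8, 8, 32))         # stand-in for the extracted features
probs = fully_connected_forward(features, w1, b1, w2, b2)
predicted = probs.argmax(axis=1)             # index of the most probable class
```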
So far, the design of the Neural Network has been completed. Figure shows in detail the parameters, inputs and outputs required for each layer in the Neural Network structure. Figure 7 shows the process of image data from input to output in the Neural Network, in which the data is displayed in the form of cuboids: the face area of a cuboid represents the product of the length and width of the node matrix in the data, and the thickness represents the number of nodes in a set of data.
STEP8: After the training of the Neural Network model is completed and the results are obtained through testing, the parameters in the network structure need to be changed according to the feedback of the results. The parameters that need to be changed are:
Base learning rate, dropout rate, decay rate, and iteration steps.
Finally, we can get the best results by adjusting the above parameters.
Explanation
Relu
Relu is a piecewise function: if the input is smaller than or equal to zero, the output is zero, and if the input is greater than zero, the output equals the input. Formula: f(x)=max(0, x). The formula forces some of the data to zero. This improves the sparsity of the trained network, which is more consistent with the principle of neuron excitation. It not only reduces the interdependence of parameters but also greatly alleviates the over-fitting problem, enhancing the functionality of the Neural Network.
MaxPooling
The pooling operation is a basic procedure used in the Convolutional Neural Network. A pooling operation usually follows the Convolution layer, and MaxPooling is now the mainstream choice. The pooling layer reduces the dimension of the features from the filter layer to form the final features used for classification. There are two kinds of pooling: MaxPooling and Average Pooling. MaxPooling takes the feature values within a filter window, keeps the greatest one as the retained value, and discards the rest. Average Pooling averages these values. Normally, MaxPooling is used in the Convolutional Neural Network because its feature selection yields a better classification and identification rate and provides nonlinearity.
Full Connected Neural Network
A Full Connected Neural Network is a plurality of neurons connected according to certain rules. Each neuron contains an activation function, and the activation function is often chosen as a sigmoid or tanh function.
In a Full Connected Neural Network, neurons are laid out in layers. The leftmost layer is called the input layer and is responsible for receiving input data; the rightmost layer is called the output layer and is used to obtain the Neural Network output data. The layers between the input layer and the output layer are called hidden layers, because they are not visible from outside. In a Full Connected Neural Network, there is no connection between neurons in the same layer. Each neuron in the Nth layer is connected to all neurons in the (N-1)th layer. The output of the (N-1)th layer neurons is the input of the Nth layer neurons. The connections between neurons in different layers have different weights. The Neural Network is a function of the form f(x)=ax+b, where a represents the weight and b represents the offset. To calculate the output of the Neural Network from the input, the value of each element x of the input vector X is first assigned to the corresponding neuron of the input layer, and then the value of each neuron in each subsequent layer is calculated in turn according to the activation function, until the values of all neurons in the output layer have been calculated. Finally, the values of the neurons in the output layer are concatenated to obtain the target output vector.
SoftMax
SoftMax is a normalized exponential function, an extension of the logistic function. SoftMax is widely applied in machine learning and deep learning, especially for multi-class classification problems. The final output unit of the classifier needs the SoftMax function to process its values. The SoftMax function is in effect a normalization that yields a discrete probability distribution over a finite number of terms. The SoftMax function is therefore popular in probability-based multi-class classification methods such as multinomial logistic regression, multinomial linear discriminant analysis, the naive Bayes classifier, and artificial Neural Networks. In Figure 4, the left side of the equality sign shows how the Full Connected layer works. W is the parameter of the Full Connected layer, which we call the weight. X is the input of the Full Connected layer, which we call the feature. The feature is a vector X of size N*1. This feature results from the processing by the multiple Convolutional layers and pooling layers in front of the Full Connected layer. Suppose there is a Convolutional layer in front of the Full Connected layer whose output has 32 channels, meaning 32 features, each of size 8*8. W is a matrix of size T*N, where N corresponds to the N of X and T indicates the number of classes. For example, if the number of classes is 5, then T is 5. Training a network therefore means finding the most suitable W matrix for the Full Connected layer. The Full Connected layer thus computes W*X to obtain a T*1 vector. On the definition of the SoftMax function: the input of SoftMax is a T*1 vector, and the output is also a T*1 vector. Each value of the output vector represents the probability of the sample belonging to the corresponding class, and each value ranges from 0 to 1.
The input of SoftMax is W*X. Assume that the input sample of the model is I and consider a 4-class classification problem (Category 1, 2, 3). If the sample I belongs to category 3, the sample I yields W*X after passing through the network layers and before reaching the SoftMax layer. That is to say, W*X is a 4*1 vector, and a_j in the formula denotes the j-th value in this 4*1 vector. Because a_k in the denominator ranges over the 4 values in the 4*1 vector, there is a summation sign (the summation runs over k from 1 to T, where T is the same T as in the figure above, and j also ranges from 1 to T). Because e^x is always greater than 0, both the numerator and the denominator are positive, so the result is positive and lies in the range (0, 1). When the model is being tested rather than trained, a sample passing through the SoftMax layer yields a T*1 vector, and the index of the largest value in this vector is taken as the predicted label of the sample.
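The formula referred to in this passage appears only in Figure 4 and is not reproduced in the text. Assuming it is the standard SoftMax definition, it can be written as follows, where a = W*X is the T*1 input vector and the symbol S_j for the j-th output is introduced here for illustration:

```latex
S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}, \qquad j = 1, \dots, T, \qquad a = W X
```

Since every term e^{a_k} is positive and the numerator is one of the terms in the denominator, each S_j lies in (0, 1) and the T outputs sum to 1.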
Back Propagation
Iterative parameter training is performed for each batch of training data. To adjust the weights during training, the Back Propagation algorithm is used. The back propagation algorithm first calculates the error term δ of each node, based on the output value y of each node and the target value Y. After the error term is obtained, the connection weights of the node are updated: the new weight is calculated from the error term of the node, the learning rate, and the current weight. We iterate this calculation for the weights of each node from the Nth layer back to the (N-1)th layer.
$$\delta_j = y_j (1 - y_j) \sum_{k \in \text{outputs}} w_{kj}\, \delta_k \tag{2}$$
$$w_{ji} \leftarrow w_{ji} + \eta\, \delta_j\, x_{ji} \tag{3}$$
$w_{ji}$: The weight of the edge
$x_{ji}$: Input passed to j by node i
$\delta$: The error between the actual value of the node and the expected value
$y$: The output value of the node
$\eta$: Learning rate
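Equations (2) and (3) can be sketched for a single hidden node in plain Python. The factor y(1 - y) is the derivative of a sigmoid activation, so the sketch applies to sigmoid nodes; the function names and all numeric values are hypothetical.

```python
def hidden_error_term(y_j, downstream):
    """Equation (2): delta_j = y_j * (1 - y_j) * sum_k(w_kj * delta_k).

    `downstream` is a list of (w_kj, delta_k) pairs, one per node k fed by j.
    """
    return y_j * (1.0 - y_j) * sum(w_kj * delta_k for w_kj, delta_k in downstream)

def updated_weight(w_ji, eta, delta_j, x_ji):
    """Equation (3): w_ji <- w_ji + eta * delta_j * x_ji."""
    return w_ji + eta * delta_j * x_ji

# Hypothetical numbers for one hidden node j with two downstream nodes.
delta_j = hidden_error_term(0.5, [(0.4, 0.1), (-0.2, 0.3)])
new_w = updated_weight(w_ji=0.25, eta=0.001, delta_j=delta_j, x_ji=2.0)
```

With these inputs the weighted sum of downstream error terms is -0.02, so the error term of node j is -0.005 and the weight moves slightly below 0.25.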
Regularization:
Over-fitting is common. Its main cause is that the amount of data is not sufficient to support a model of high complexity. The most straightforward remedy is therefore to increase the amount of training data. In our experiment, the DNN is regularized by dropout.
Dropout rate:
Figure 5 represents the flowchart of dropout operation
In each training batch, over-fitting is reduced by randomly omitting a certain proportion (dropout_rate) of the hidden layer nodes, which emphasizes the contribution of the remaining hidden layer nodes and so improves accuracy. Dropout can be regarded as a form of model averaging, in which the estimates from different models are averaged with certain weights. In each training batch, hidden nodes are ignored at random, so the network trained in each batch is different and can be regarded as a new model. Because hidden nodes appear at random with a certain probability, there is no guarantee that any two hidden nodes appear together; the weight updates therefore no longer depend on the joint action of hidden nodes with fixed relationships.
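A minimal sketch of the dropout operation follows. It assumes the rate is the probability of dropping a node and that surviving activations are rescaled ("inverted dropout"); the source states neither, and the function name and sample values are hypothetical.

```python
import random

def dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob (training time only).

    Survivors are scaled by 1 / (1 - drop_prob) so the expected value of each
    activation is unchanged ("inverted dropout", an assumption of this sketch).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [0.2, 1.5, 0.7, 0.9]
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
unchanged = dropout(hidden, drop_prob=0.0, rng=rng)
```

Each call draws a fresh random mask, which is why every training batch effectively trains a different sub-network.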
Base learning rate:
The learning rate (step size) controls how fast the model learns. During training, the learning rate is generally changed dynamically according to the number of training rounds. At the beginning of training, a learning rate between 0.01 and 0.001 is preferable. After a certain number of rounds, the rate is gradually reduced. Near the end of training, the learning rate should have decayed by a factor of more than 100. The initial learning rate chosen for this experiment is 0.001.
Decay rate:
Given the characteristics of the different stages of optimization, a general approach is to use a large learning rate in the early stage to accelerate convergence and a small learning rate later to ensure stability. In this experiment, the step size and the attenuation coefficient are empirical values. Learning rate decay mechanism: in this experiment the learning rate is decayed according to the number of rounds, being reduced to 0.7 times its value every 500 rounds.
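The decay schedule can be sketched as below. The sketch assumes a staircase schedule, in which the rate drops in discrete jumps every 500 rounds; the source does not say whether the decay is stepwise or continuous, and the function name is hypothetical.

```python
def decayed_learning_rate(base_rate, decay_rate, decay_steps, step):
    """Staircase exponential decay: multiply by decay_rate every decay_steps."""
    return base_rate * decay_rate ** (step // decay_steps)

# Values taken from the text: base 0.001, factor 0.7, every 500 rounds.
schedule = {step: decayed_learning_rate(0.001, 0.7, 500, step)
            for step in (0, 499, 500, 1000, 1400)}
```

Under this assumption the rate stays at 0.001 for rounds 0 to 499, is 0.0007 for rounds 500 to 999, and is 0.00049 from round 1000 onward until the next drop.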
Iteration steps:
If the number of iterations is too small, the model does not learn enough. If the number of iterations is too large, the model over-learns, and the learning result is again unsatisfactory.
RESULT ANALYSIS
The initial parameters of this experiment are preset to a learning rate of 0.0001, a decay rate of 0.9, a dropout rate of 0.9, and 2000 iterations.
The best parameters found in this experiment are a learning rate of 0.001, a decay rate of 0.7, a dropout rate of 0.9999, and 1350 or 1400 iterations.
In Fig. 9 and Fig. 10, we use the best parameters of this experiment to determine the optimal number of iterations. The table and line graph show that the accuracy is better at 1700 and 1400 iterations. Because the amount of data is small, the optimal number of iterations cannot be determined accurately.
To study the optimal number of iterations more carefully, we carry out another experiment. Since the test set already reaches 100% accuracy at 1000 iterations, and in order to avoid undesirable phenomena such as over-fitting while still ensuring good recognition accuracy, we select 1300, 1350, 1400, 1450, and so on up to 1800 iterations as the test points of this experiment, in steps of 50 iterations. Since the accuracy of machine learning varies from run to run, we measure 15 sets of data at each test point and calculate their average accuracy. As shown in Fig. 11, the accuracy is quite high at 1350, 1400, and 1600 iterations. In the next few experiments we therefore use these three iteration counts as test points.
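The averaging protocol described above can be sketched as follows. `run_trial` is a hypothetical placeholder for one training-and-test run; it returns synthetic numbers only so that the sketch executes, and they are not the measurements of this experiment.

```python
import random
from statistics import mean

def run_trial(iterations, rng):
    """Hypothetical stand-in for one train-and-test run.

    Returns a synthetic accuracy in [0.9, 1.0); it does NOT reproduce the
    experiment's measurements and does not depend on `iterations`.
    """
    return 0.9 + 0.1 * rng.random()

def average_accuracy(iterations, trials, rng):
    """Average the accuracy of `trials` independent runs at one setting."""
    return mean(run_trial(iterations, rng) for _ in range(trials))

rng = random.Random(0)
candidates = list(range(1300, 1801, 50))   # 1300, 1350, ..., 1800
averages = {n: average_accuracy(n, trials=15, rng=rng) for n in candidates}
best = max(averages, key=averages.get)
```

Replacing `run_trial` with a real training run would give one averaged accuracy per candidate iteration count, which is the quantity the comparison relies on.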
As shown in Fig. 12, 13, 14, and 15, in the fourth experiment we first vary the learning rate, because the learning rate in the previous experiment was too low. As can be seen from the table above and the three line charts, the accuracy is highest when the learning rate is 0.001, so we take 0.001 as the best learning rate.
As shown in Fig. 16, 17, 18, and 19, for all three iteration counts the model achieves the best accuracy when the decay rate is 0.7.
Finally, we need to determine the optimal dropout rate. The higher the dropout rate, the more nodes are removed and the more random our data becomes. Starting from the initial dropout rate of 0.9, we choose 0.9999 as the best dropout rate. With these parameters, the final results are also closer to ideal.
In conclusion, from the multiple sets of comparative experiments above, we determine the best parameters to be a learning rate of 0.001, a decay rate of 0.7, a dropout rate of 0.9999, and 1350 or 1400 iterations.
EDITORIAL NOTE
There is one page in the claims only.
Claims (2)
1. A method of recognition of vehicle type based on deep learning, which uses the structure
Input -> [[CONV -> ReLU] * M -> MaxPooling] * N -> [FC -> ReLU] * P -> FC -> SoftMax as the Convolutional Neural Network construction mode and Fully Connected Neural Network construction mode, where M, N and P depend on the situation.
2. The method of recognition of vehicle type based on deep learning as claimed in claim 1, in which the model parameters are selected as follows: in the convolution, M=4, filters=32, size/strides=2x2; drop rate=0.99; base learning rate=0.001; decay rate=0.7; iteration steps=1600; in the Fully Connected Neural Network, the number of hidden layer nodes is 128. These parameters make the training results more accurate so that a good model can be obtained.
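As an illustration only, the layer pattern of claim 1 can be expanded into an explicit layer list for given M, N and P. The function name is hypothetical, and the values of N and P used below are assumptions, since the claims fix only M=4.

```python
def build_layer_sequence(m, n, p):
    """Expand Input, [[CONV, ReLU]*m, MaxPooling]*n, [FC, ReLU]*p, FC, SoftMax."""
    layers = ["Input"]
    for _ in range(n):
        layers += ["CONV", "ReLU"] * m   # m convolution + activation pairs
        layers.append("MaxPooling")      # one pooling layer closes each group
    layers += ["FC", "ReLU"] * p         # p hidden fully connected layers
    layers += ["FC", "SoftMax"]          # output layer and classifier
    return layers

# M=4 comes from claim 2; n=1 and p=1 are illustrative assumptions only.
sequence = build_layer_sequence(m=4, n=1, p=1)
```

For these values the list contains four CONV and ReLU pairs, one MaxPooling layer, one hidden FC layer with its ReLU, and the final FC and SoftMax layers.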
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2018102037A AU2018102037A4 (en) | 2018-12-09 | 2018-12-09 | A method of recognition of vehicle type based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2018102037A AU2018102037A4 (en) | 2018-12-09 | 2018-12-09 | A method of recognition of vehicle type based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2018102037A4 (en) | 2019-01-17 |
Family
ID=65009427
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2018102037A Ceased AU2018102037A4 (en) | 2018-12-09 | 2018-12-09 | A method of recognition of vehicle type based on deep learning |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2018102037A4 (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978847A (en) * | 2019-03-19 | 2019-07-05 | 东南大学 | Drag-line housing disease automatic identifying method based on transfer learning Yu drag-line robot |
CN110008360A (en) * | 2019-04-09 | 2019-07-12 | 河北工业大学 | Vehicle target image data base method for building up comprising specific background image |
CN110852358A (en) * | 2019-10-29 | 2020-02-28 | 中国科学院上海微系统与信息技术研究所 | Vehicle type distinguishing method based on deep learning |
CN111027445A (en) * | 2019-12-04 | 2020-04-17 | 安徽工程大学 | Target identification method for marine ship |
CN111160100A (en) * | 2019-11-29 | 2020-05-15 | 南京航空航天大学 | Lightweight depth model aerial photography vehicle detection method based on sample generation |
CN111582213A (en) * | 2020-05-15 | 2020-08-25 | 北京铁科时代科技有限公司 | Automobile identification method based on Centernet |
CN111767860A (en) * | 2020-06-30 | 2020-10-13 | 阳光学院 | Method and terminal for realizing image recognition through convolutional neural network |
CN111895931A (en) * | 2020-07-17 | 2020-11-06 | 嘉兴泊令科技有限公司 | Coal mine operation area calibration method based on computer vision |
CN112101117A (en) * | 2020-08-18 | 2020-12-18 | 长安大学 | Expressway congestion identification model construction method and device and identification method |
CN112329569A (en) * | 2020-10-27 | 2021-02-05 | 武汉理工大学 | Freight vehicle state real-time identification method based on image deep learning system |
CN112508036A (en) * | 2020-11-16 | 2021-03-16 | 杭州电子科技大学 | Handwritten digit recognition method based on convolutional neural network and codes |
CN113065653A (en) * | 2021-04-27 | 2021-07-02 | 北京工业大学 | Design method of lightweight convolutional neural network for mobile terminal image classification |
CN113128578A (en) * | 2021-04-08 | 2021-07-16 | 青岛农业大学 | Peanut excellent seed screening system and screening method thereof |
CN113205107A (en) * | 2020-11-02 | 2021-08-03 | 哈尔滨理工大学 | Vehicle type recognition method based on improved high-efficiency network |
CN113297936A (en) * | 2021-05-17 | 2021-08-24 | 北京工业大学 | Volleyball group behavior identification method based on local graph convolution network |
CN114627342A (en) * | 2022-03-03 | 2022-06-14 | 北京百度网讯科技有限公司 | Training method, device and equipment of image recognition model based on sparsity |
CN115050092A (en) * | 2022-05-20 | 2022-09-13 | 宁波明家智能科技有限公司 | Lip reading algorithm and system for intelligent driving |
CN115731436A (en) * | 2022-09-21 | 2023-03-03 | 东南大学 | Highway vehicle image retrieval method based on deep learning fusion model |
CN115953486A (en) * | 2022-12-30 | 2023-04-11 | 国网电力空间技术有限公司 | Automatic coding method for direct-current T-shaped tangent tower component inspection image |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978847B (en) * | 2019-03-19 | 2023-07-04 | 东南大学 | Automatic inhaul cable sleeve disease identification method based on transfer learning and inhaul cable robot |
CN109978847A (en) * | 2019-03-19 | 2019-07-05 | 东南大学 | Drag-line housing disease automatic identifying method based on transfer learning Yu drag-line robot |
CN110008360A (en) * | 2019-04-09 | 2019-07-12 | 河北工业大学 | Vehicle target image data base method for building up comprising specific background image |
CN110852358A (en) * | 2019-10-29 | 2020-02-28 | 中国科学院上海微系统与信息技术研究所 | Vehicle type distinguishing method based on deep learning |
CN111160100A (en) * | 2019-11-29 | 2020-05-15 | 南京航空航天大学 | Lightweight depth model aerial photography vehicle detection method based on sample generation |
CN111027445A (en) * | 2019-12-04 | 2020-04-17 | 安徽工程大学 | Target identification method for marine ship |
CN111582213A (en) * | 2020-05-15 | 2020-08-25 | 北京铁科时代科技有限公司 | Automobile identification method based on Centernet |
CN111767860A (en) * | 2020-06-30 | 2020-10-13 | 阳光学院 | Method and terminal for realizing image recognition through convolutional neural network |
CN111895931A (en) * | 2020-07-17 | 2020-11-06 | 嘉兴泊令科技有限公司 | Coal mine operation area calibration method based on computer vision |
CN111895931B (en) * | 2020-07-17 | 2021-11-26 | 嘉兴泊令科技有限公司 | Coal mine operation area calibration method based on computer vision |
CN112101117A (en) * | 2020-08-18 | 2020-12-18 | 长安大学 | Expressway congestion identification model construction method and device and identification method |
CN112329569A (en) * | 2020-10-27 | 2021-02-05 | 武汉理工大学 | Freight vehicle state real-time identification method based on image deep learning system |
CN112329569B (en) * | 2020-10-27 | 2024-02-09 | 武汉理工大学 | Freight vehicle state real-time identification method based on image deep learning system |
CN113205107A (en) * | 2020-11-02 | 2021-08-03 | 哈尔滨理工大学 | Vehicle type recognition method based on improved high-efficiency network |
CN112508036A (en) * | 2020-11-16 | 2021-03-16 | 杭州电子科技大学 | Handwritten digit recognition method based on convolutional neural network and codes |
CN113128578B (en) * | 2021-04-08 | 2022-07-19 | 青岛农业大学 | Screening method for good peanut seeds |
CN113128578A (en) * | 2021-04-08 | 2021-07-16 | 青岛农业大学 | Peanut excellent seed screening system and screening method thereof |
CN113065653A (en) * | 2021-04-27 | 2021-07-02 | 北京工业大学 | Design method of lightweight convolutional neural network for mobile terminal image classification |
CN113065653B (en) * | 2021-04-27 | 2024-05-28 | 北京工业大学 | Design method of lightweight convolutional neural network for mobile terminal image classification |
CN113297936A (en) * | 2021-05-17 | 2021-08-24 | 北京工业大学 | Volleyball group behavior identification method based on local graph convolution network |
CN113297936B (en) * | 2021-05-17 | 2024-05-28 | 北京工业大学 | Volleyball group behavior identification method based on local graph convolution network |
CN114627342A (en) * | 2022-03-03 | 2022-06-14 | 北京百度网讯科技有限公司 | Training method, device and equipment of image recognition model based on sparsity |
CN114627342B (en) * | 2022-03-03 | 2024-09-06 | 北京百度网讯科技有限公司 | Sparsity-based image recognition model training method, device and equipment |
CN115050092A (en) * | 2022-05-20 | 2022-09-13 | 宁波明家智能科技有限公司 | Lip reading algorithm and system for intelligent driving |
CN115731436A (en) * | 2022-09-21 | 2023-03-03 | 东南大学 | Highway vehicle image retrieval method based on deep learning fusion model |
CN115731436B (en) * | 2022-09-21 | 2023-09-26 | 东南大学 | Highway vehicle image retrieval method based on deep learning fusion model |
CN115953486A (en) * | 2022-12-30 | 2023-04-11 | 国网电力空间技术有限公司 | Automatic coding method for direct-current T-shaped tangent tower component inspection image |
CN115953486B (en) * | 2022-12-30 | 2024-04-12 | 国网电力空间技术有限公司 | Automatic encoding method for inspection image of direct-current T-shaped tangent tower part |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2018102037A4 (en) | A method of recognition of vehicle type based on deep learning | |
CN110135267B (en) | Large-scene SAR image fine target detection method | |
CN108052911B (en) | Deep learning-based multi-mode remote sensing image high-level feature fusion classification method | |
CN110414377B (en) | Remote sensing image scene classification method based on scale attention network | |
CN106228185B (en) | A kind of general image classifying and identifying system neural network based and method | |
CN112270347A (en) | Medical waste classification detection method based on improved SSD | |
CN110309856A (en) | Image classification method, the training method of neural network and device | |
CN110097145A (en) | One kind being based on CNN and the pyramidal traffic contraband recognition methods of feature | |
CN107480261A (en) | One kind is based on deep learning fine granularity facial image method for quickly retrieving | |
CN108427921A (en) | A kind of face identification method based on convolutional neural networks | |
EP4163831A1 (en) | Neural network distillation method and device | |
CN108416318A (en) | Diameter radar image target depth method of model identification based on data enhancing | |
CN109344891A (en) | A kind of high-spectrum remote sensing data classification method based on deep neural network | |
CN112529146B (en) | Neural network model training method and device | |
CN112084890B (en) | Method for identifying traffic signal sign in multiple scales based on GMM and CQFL | |
CN111950583B (en) | Multi-scale traffic signal sign recognition method based on GMM (Gaussian mixture model) clustering | |
CN111524140B (en) | Medical image semantic segmentation method based on CNN and random forest method | |
CN113298032A (en) | Unmanned aerial vehicle visual angle image vehicle target detection method based on deep learning | |
CN113033321A (en) | Training method of target pedestrian attribute identification model and pedestrian attribute identification method | |
CN116468740A (en) | Image semantic segmentation model and segmentation method | |
CN114266757A (en) | Diabetic retinopathy classification method based on multi-scale fusion attention mechanism | |
CN114648667A (en) | Bird image fine-granularity identification method based on lightweight bilinear CNN model | |
CN114170519A (en) | High-resolution remote sensing road extraction method based on deep learning and multidimensional attention | |
CN114065831A (en) | Hyperspectral image classification method based on multi-scale random depth residual error network | |
CN112633169A (en) | Pedestrian recognition algorithm based on improved LeNet-5 network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |