CN109086794B - Driving behavior pattern recognition method based on T-LDA topic model - Google Patents


Info

Publication number
CN109086794B
Authority
CN
China
Prior art keywords
driving
driving behavior
word
model
data
Prior art date
Legal status
Expired - Fee Related
Application number
CN201810676019.6A
Other languages
Chinese (zh)
Other versions
CN109086794A (en)
Inventor
石英
罗佳齐
李振威
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT
Priority to CN201810676019.6A
Publication of CN109086794A
Application granted
Publication of CN109086794B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a driving behavior pattern recognition method based on a T-LDA topic model, comprising the following steps. S1: establish a driving behavior dictionary and extract driving behavior histogram features. The dictionary is built from the clustering result of the driving behavior data, and the co-occurrence matrix of driving data and driving behavior words is constructed; this matrix constitutes the driving behavior histogram features. S2: train the improved T-LDA model with the driving behavior histogram features. The model captures the relationships among driving data, driving modes and driving behavior words, and introduces time information as a label of each driving behavior word. It is trained on the time-labeled histogram features, its parameters are solved by Gibbs sampling, and the driving behavior recognition result is output. The invention can be used effectively for driving behavior pattern recognition.

Description

Driving behavior pattern recognition method based on T-LDA topic model
Technical Field
The invention relates to the technical field of driving behavior pattern recognition, in particular to a driving behavior pattern recognition method based on a T-LDA topic model.
Background
Traffic safety problems caused by drivers' irregular behavior are becoming increasingly prominent. Analyses of traffic accident causes show that bad driving behaviors are major contributing factors. Examples include rapid acceleration, rapid deceleration and sharp turning. To improve driving safety, researchers acquire driving data in a timely manner, extract driving behavior features from it, and recognize and correct those behaviors; this has become a research hotspot. The rapid spread of smart mobile terminals makes vehicle driving data easier to acquire, which facilitates the analysis of drivers' driving behaviors and driving modes.
Some researchers have used the acceleration sensor built into a smartphone to acquire the vehicle's acceleration along the horizontal and vertical axes. With it they recognized driving behaviors such as acceleration, deceleration and turning, with good results. Others have used an acceleration sensor to collect vehicle acceleration information and divided it into low, medium and high levels. They related these levels to driving mode categories and finally divided driving modes into four types:
- a cautious mode below the normal level;
- a normal mode whose driving behavior poses no threat;
- an aggressive mode that poses some threat;
- a very aggressive mode that poses a great threat.
At present, mainstream research works directly on the low-level features of driving data. It judges the duration and intensity of behaviors such as acceleration or turning, and then identifies the driving mode. In-depth research on driving patterns has shown a limitation of this approach. Identifying a driving pattern from single driving behaviors alone ignores the specific sequential combinations of different driving behaviors in the data. Such methods can therefore adapt poorly to different road conditions and time periods. Research has consequently shifted toward determining driving patterns from an understanding of driving behavior sequences.
Some researchers have applied statistical models to combinations of driving behaviors such as acceleration and deceleration during driving, and have uncovered differences in driving modes between drivers. Researchers have accordingly turned their attention to topic models, a well-established class of statistical model algorithms in text analysis. A topic model classifies and organizes documents by extracting the topic information hidden in them. The hidden variables are interpreted as topics, each an abstraction of a group of related words in the text. By learning from training samples, the model obtains a parameterization that can generate different texts. Topic models have been used in text analysis and image scene recognition, and the same idea applies to driving data. A segment of driving data can be regarded as a document composed of different driving patterns (topics). Each driving pattern (topic) is in turn composed of a series of single driving behaviors (words) that represent it.
pLSA is one of the most representative topic models. It computes the statistical probability distribution of each word in a document by analyzing the word-document co-occurrence matrix, thereby determining the document's topics. However, its number of training parameters grows linearly with the size of the driving data set, making computation increasingly expensive. Moreover, the model is built only for the training driving data set and recognizes new driving data poorly. To address these shortcomings, Latent Dirichlet Allocation (LDA) was proposed on the basis of pLSA. LDA represents the data with a moderate number of parameters and thus avoids overfitting.
The invention extracts the cluster center of each driving behavior class and uses each cluster center as a word in a driving behavior dictionary. It then counts the occurrences of the different driving behavior words in the driving data to obtain weighted word histogram features of driving behavior. To address the shortcomings of the current mainstream topic models pLSA and LDA, the invention builds on LDA and proposes an improved LDA model with time labels, the T-LDA model, to identify driving modes. Experimental results show that the improved model can effectively mine the characteristics of sequences of consecutive driving behaviors in driving data and improves the accuracy of driving mode recognition.
Disclosure of Invention
The technical problem addressed by the invention is the set of deficiencies in the prior art described above. The invention solves it by providing a driving behavior pattern recognition method based on a T-LDA topic model.
The technical solution adopted by the invention to solve this problem is as follows.
The invention provides a driving behavior pattern recognition method based on a T-LDA topic model, comprising the following steps:
S1, establish a driving behavior dictionary and extract driving behavior histogram features. Input the driving behavior data, cluster it, and build a driving behavior dictionary from the clustering result. Extract the cluster centers of the different driving behavior classes as the words of the dictionary. Then count the occurrences of the different driving behavior words in the driving data to obtain a driving data-driving behavior word co-occurrence matrix, i.e., the driving behavior histogram features;
S2, train the improved T-LDA model with the driving behavior histogram features. The T-LDA model has two parts:
- the driving mode types contained in each segment of driving data, with their probability distribution;
- the driving behavior word types contained in each driving mode, with their probability distribution.

Together these establish the relationships among driving data, driving modes and driving behavior words. Time information is introduced as a label of each driving behavior word so that several adjacent driving behaviors can be combined. The model is trained on the time-labeled driving behavior histogram features, its parameters are solved by Gibbs sampling, and the driving behavior recognition result is output.
Further, the specific method of step S1 of the present invention is:
S11, establish the driving behavior dictionary. Extract features from the raw driving behavior data and select features for clustering. Classify the different driving behaviors obtained by clustering, and take each cluster center as a word of a bag-of-words model. The set of all distinct driving behaviors forms the driving behavior dictionary. Assign different weight parameters to words of different frequencies: the more frequently a word occurs, the smaller its weight;
S12, extract driving behavior histogram features. Based on the local features of the driving behavior data to be processed, map the driving behavior feature vectors to words using the TF-IDF method. Look up the corresponding behavior words in the constructed driving behavior dictionary, and compute the word frequency histogram that represents the driving behavior sequence.
Further, the specific process of the TF-IDF method in step S12 of the present invention is:
(1) Assume that M driving behavior feature vectors $F = \{f_1, f_2, f_3, \dots, f_M\}$ are extracted from the driving behavior data d to be processed. The generated driving behavior dictionary is $W = \{w_1, w_2, w_3, \dots, w_V\}$, where V is its size;
(2) map each driving behavior feature vector $f_i$ to a driving behavior word $w_{c_i}$ in the dictionary, i.e., find its position $c_i$ in the dictionary:
$c_i = \arg\min_j \|f_i - w_j\|_2, \quad c_i \in \{1, 2, \dots, V\}$
(3) for the driving behavior word $w_{c_i}$ to which each feature vector $f_i$ is mapped, compute its weight with a Gaussian function:
[equation image not reproduced in the source text]
here the variance is [equation image not reproduced in the source text], associated with the word $w_{c_i}$, and F is the number of word frequencies centered on the word frequency;
(4) for the driving behavior word $w_{c_i}$, compute its weight:
[equation image not reproduced in the source text]
here n is the total number of driving behavior words in the driving data.
Further, the specific method of step S2 of the present invention is:
S21, design the improved LDA topic model based on time labels. Introduce the time information of driving behavior words into the LDA model as an observed variable that serves as the label of each driving behavior word. Solve the parameters of the improved T-LDA model, and finally identify driving modes with the T-LDA model;
S22, solve the model parameters by Gibbs sampling.
Further, the T-LDA model in step S21 generates driving behavior data as follows:
(1) for each of the K driving modes, sample the multinomial parameter of the driving behavior word semantics of that mode, [equation image not reproduced in the source text], from a Dirichlet distribution with parameter β;
(2) for each driving mode, sample the multinomial parameter $\phi_z$ of the driving behavior word time labels of that mode from a Dirichlet distribution with parameter γ;
(3) for each segment of driving data j, sample the multinomial parameter $\theta_j$ of the driving mode-driving data distribution from a Dirichlet distribution with parameter α;
(4) generate each driving behavior word in driving data j as follows:
(a) sample a driving mode $z_{ji}$ from the multinomial distribution with parameter $\theta_j$;
(b) sample a driving behavior word $w_{ji}$ from the multinomial distribution with parameter $\phi_{z_{ji}}$;
(c) sample the driving behavior word time $t_{ji}$ from the multinomial distribution with parameter [equation image not reproduced in the source text].
Further, in step S22 the model parameters are solved by Gibbs sampling as follows:
(1) select a word from the document set, either randomly or in a fixed order;
(2) given all other words and topics, compute the conditional probability that the selected word is assigned to each topic, $p(z_i = j \mid z_{-i}, w, t, \alpha, \beta, \gamma)$, where $z_{-i} = \{z_1, z_2, \dots, z_{i-1}, z_{i+1}, \dots, z_K\}$;
(3) draw a topic $z_i$ at random from this distribution and use it to replace the topic of the current word;
(4) repeat the above process until α, β and γ converge to a fixed point.
Further, the values of the parameters α, β and γ in step S22 are solved as follows.
For driving data j, condition on the following quantities:
- a driving behavior word w and its time label t;
- the driving modes $z_{-ji}$ of all words other than the current driving mode $z_{ji}$;
- the hyperparameters α, β and γ.

Then compute the conditional distribution
[equation image not reproduced in the source text]
The count symbols in this expression are defined as follows:
- [equation image not reproduced in the source text] is the number of times the driving behavior word w is assigned to driving mode j, excluding the current assignment i;
- [equation image not reproduced in the source text] is the number of times the time label t of the driving behavior word w is assigned to driving mode j, excluding the current assignment i;
- [equation image not reproduced in the source text] is the number of driving behavior words in driving data d assigned to driving mode j, excluding the current assignment i.

This gives the formula for φ,
[equation image not reproduced in the source text]
and the formulas for θ:
[equation images not reproduced in the source text]
From θ, φ and [equation image not reproduced in the source text], the values of the T-LDA model parameters α, β and γ are obtained.
Beneficial effects of the invention: the method introduces the time information of driving behavior words into the LDA model as an observed variable, used as the label of each driving behavior word. It then estimates the parameters of the improved model and uses that model for driving pattern recognition. Traditional algorithms ignore the sequential structure among driving behavior words and therefore recognize driving modes inaccurately for consecutive behavior words within a certain time span; the invention solves this problem. In addition, training the model with Gibbs sampling solves the problem that the model parameters are difficult to learn.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a basic flow diagram of a driving behavior pattern recognition method;
FIG. 2 shows the experimental road section;
FIG. 3 is a word frequency statistical histogram of unweighted driving behavior words;
FIG. 4 is a word frequency statistical histogram of weighted driving behavior words;
FIG. 5 is the graphical model of the improved model T-LDA;
FIG. 6 compares the perplexity of the three models;
FIG. 7 shows the driving pattern probability distribution training results based on the pLSA model;
FIG. 8 shows the driving pattern probability distribution training results based on the LDA model;
FIG. 9 shows the driving pattern probability distribution training results based on the T-LDA model;
FIG. 10 is the driving behavior word probability distribution for driving mode 1;
FIG. 11 is the driving behavior word probability distribution for driving mode 2;
FIG. 12 is the driving behavior word probability distribution for driving mode 3;
FIG. 13 is the driving behavior word probability distribution for driving mode 4;
FIG. 14 shows the correlation coefficients between the data reconstructed by the three models and the original data.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The basic flow of the method is shown in FIG. 1. First, a driving behavior dictionary is built with a bag-of-words model and the driving behavior histogram features are extracted. Next, a T-LDA model that introduces structural information is proposed to address the loss of such information in traditional topic models, and its parameters are solved by Gibbs sampling. Finally, the learned T-LDA model is used to identify the pattern of the driving behavior under test. The specific steps are as follows.
Step S1 driving behavior dictionary establishment and driving behavior histogram feature extraction
To verify the performance of the algorithm, data were first collected. This study of driving behavior and driving modes is aimed mainly at ordinary drivers. Driving data from ordinary drivers were therefore collected in the experiment, 10 drivers in total.
To reduce the influence of the road surface on the driving data, the experimental road section must meet two requirements: a fairly level road surface and small changes in altitude. A road section in Hongshan District, Wuhan, was selected as the test section for the system test. It has small elevation changes and a flat surface. It also combines a Xiongchu Avenue portion with many straight stretches and a Software Park portion with many turns, which suits the experiment. The test section is shown in FIG. 2 (blue marked line).
The driving data comprise 20 segments collected from the 10 drivers on the experimental road section. From these, 450 driving behavior segments were extracted with an endpoint detection algorithm. For driving behavior recognition, 330 driving behavior segments serve as the training set of the clustering algorithm and 120 serve as the test set to verify the algorithm. For driving mode recognition, the number of driving data samples is limited, so the topic model algorithms are validated by 10-fold cross validation. In each topic model training run, 2 segments of driving data form the test set, and the remaining segments form the training set used to learn the topic model parameters. This is repeated 10 times so that every segment serves as test data exactly once. The driving mode recognition accuracy is the average over the 10 runs.
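The fold scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The source does not say how segments are assigned to folds, so the seeded random shuffle is an assumption. `evaluate` is a hypothetical callback that would train a topic model on `train_ids` and return its recognition accuracy on `test_ids`.

```python
import random

def ten_fold_splits(num_segments=20, num_folds=10, seed=0):
    """Yield (train_ids, test_ids); every segment appears in exactly one test fold."""
    ids = list(range(num_segments))
    random.Random(seed).shuffle(ids)        # assumption: segments assigned to folds at random
    fold_size = num_segments // num_folds   # 20 segments / 10 folds = 2 test segments per run
    for k in range(num_folds):
        test_ids = ids[k * fold_size:(k + 1) * fold_size]
        train_ids = [i for i in ids if i not in test_ids]
        yield train_ids, test_ids

def mean_accuracy(evaluate, num_segments=20, num_folds=10):
    """Average the accuracy returned by the hypothetical evaluate(train_ids, test_ids)."""
    scores = [evaluate(tr, te) for tr, te in ten_fold_splits(num_segments, num_folds)]
    return sum(scores) / len(scores)
```

Each run trains on 18 segments and tests on 2. The averaged score corresponds to the reported recognition accuracy.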
Step S11 construction of driving behavior dictionary
For the collected data, the invention uses a bag-of-words model to build the driving behavior dictionary and obtain the driving behavior histogram features. These features form the driving data-single driving behavior word co-occurrence matrix of the topic model.
The bag-of-words model involves only one parameter, the number of words in the dictionary, and is intuitive and effective. First, all words appearing in the texts are placed in a dictionary. Then the frequency with which each dictionary word appears is expressed by a simple and effective text representation, the histogram. Finally, the similarity between different texts is measured.
Taking text as an example, the dictionary is built by counting all words that appear in the texts. Consider the two texts "I like playing basketball, how about you?" and "I love playing football."; their dictionary is:
dictionary={1:"I",2:"like",3:"playing",4:"basketball",5:"how",6:"about",7:"you",8:"love",9:"football"}
Using the words in the dictionary, the histogram features of the two texts are the vectors [1,1,1,1,1,1,1,0,0] and [1,0,1,0,0,0,0,1,1], respectively.
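The two histogram vectors can be reproduced with a few lines of Python. The tokenizer, a simple regular expression that keeps only letters, and the function names are illustrative assumptions. Dictionary indices here start at 0 rather than 1.

```python
import re

def build_dictionary(texts):
    """Assign each distinct word an index in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in re.findall(r"[A-Za-z]+", text):
            vocab.setdefault(word, len(vocab))
    return vocab

def histogram(text, vocab):
    """Count how often each dictionary word occurs in the text."""
    counts = [0] * len(vocab)
    for word in re.findall(r"[A-Za-z]+", text):
        if word in vocab:
            counts[vocab[word]] += 1
    return counts

texts = ["I like playing basketball, how about you?", "I love playing football."]
vocab = build_dictionary(texts)
vectors = [histogram(t, vocab) for t in texts]
# vectors → [[1, 1, 1, 1, 1, 1, 1, 0, 0], [1, 0, 1, 0, 0, 0, 0, 1, 1]]
```

Driving data are handled the same way, with driving behavior words in place of English words.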
For the data collected in this embodiment, clustering the driving behaviors yields 12 classes. The cluster centers of these 12 classes are extracted to build a dictionary containing all basic driving behavior words.
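The patent does not name the clustering algorithm used to obtain the 12 classes in this section. The sketch below therefore assumes plain k-means (Lloyd's algorithm) written with NumPy. The feature representation of each driving behavior segment is also left open, and `kmeans_dictionary` is a hypothetical name. The returned cluster centers play the role of the dictionary words $w_1, \dots, w_V$.

```python
import numpy as np

def kmeans_dictionary(features, num_words=12, iters=100, seed=0):
    """Lloyd's k-means; returns num_words cluster centers used as dictionary words."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    # start from num_words randomly chosen samples (distinct indices)
    centers = features[rng.choice(len(features), size=num_words, replace=False)].copy()
    for _ in range(iters):
        # squared Euclidean distance from every sample to every center, shape (N, num_words)
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(num_words):
            members = features[labels == k]
            if len(members) > 0:                   # an empty cluster keeps its old center
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):     # centers have stopped moving
            break
        centers = new_centers
    return centers
```

Because plain k-means depends on its initialization, a practical implementation would usually run it from several seeds and keep the lowest-distortion result.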
Step S12 driving behavior histogram feature extraction
When driving data are represented with the conventional bag-of-words model, every driving behavior word is treated as equally important. The learned driving pattern distributions are therefore biased toward high-frequency behavior words. To balance the influence of driving behavior words of different frequencies, the invention assigns different weight parameters to words of different frequencies: the more frequently a word occurs, the smaller its weight.
TF-IDF (Term Frequency-Inverse Document Frequency) is a widely used statistical method for assessing how important a word is to a document. The invention uses TF-IDF to map the driving behavior feature vectors to words, computes the word weights, and applies the weighting. The steps are as follows:
(1) Assume that M driving behavior feature vectors $F = \{f_1, f_2, f_3, \dots, f_M\}$ are extracted from the driving data d to be processed. The generated driving behavior dictionary is $W = \{w_1, w_2, w_3, \dots, w_V\}$, where V is its size.
(2) Map each driving behavior feature vector $f_i$ to a driving behavior word $w_{c_i}$ in the dictionary, i.e., find its position $c_i$ in the dictionary:
$c_i = \arg\min_j \|f_i - w_j\|_2, \quad c_i \in \{1, 2, \dots, V\} \qquad (4\text{-}1)$
(3) For the driving behavior word $w_{c_i}$ to which each feature vector $f_i$ is mapped, compute its weight with a Gaussian function:
[equation image not reproduced in the source text]
Here the variance is [equation image not reproduced in the source text], associated with the word $w_{c_i}$, and F is the number of word frequencies centered on the word frequency.
(4) For the driving behavior word $w_{c_i}$, compute its weight:
[equation image not reproduced in the source text]
Here n is the total number of driving behavior words in the driving data.
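Steps (1) and (2), followed by histogram counting and a weighting step, might look as follows in NumPy. The Gaussian weight and the weight formula in steps (3) and (4) are available only as images. The `tfidf` function below therefore substitutes the standard TF-IDF weighting (term frequency times log inverse document frequency) and should not be read as the patent's exact scheme. All function names are hypothetical.

```python
import numpy as np

def map_to_words(features, dictionary):
    """Step (2): c_i = argmin_j ||f_i - w_j||_2 (0-based word indices)."""
    features = np.asarray(features, dtype=float)
    dictionary = np.asarray(dictionary, dtype=float)
    dists = np.linalg.norm(features[:, None, :] - dictionary[None, :, :], axis=2)
    return dists.argmin(axis=1)

def word_histogram(word_ids, vocab_size):
    """Raw occurrence count of each driving behavior word in one driving data segment."""
    return np.bincount(word_ids, minlength=vocab_size).astype(float)

def tfidf(counts):
    """Standard TF-IDF on a (segments x words) count matrix.
    Stand-in for the patent's Gaussian-weighted scheme, whose formulas are not reproduced."""
    counts = np.asarray(counts, dtype=float)
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)  # term frequency per segment
    df = (counts > 0).sum(axis=0)                                   # segments containing each word
    idf = np.log(counts.shape[0] / np.maximum(df, 1))               # rarer words get larger weights
    return tf * idf
```

With standard IDF, a word that occurs in every segment receives zero weight. This has the same direction of effect described above (frequent words weighted down), though it is stronger.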
After the weighted driving behavior word histogram is constructed with TF-IDF, each segment of driving data can be represented by a set of weighted words.
Next, the driving behavior word histogram features of each segment of driving data are obtained. A segment of driving data is picked at random from the training set; its unweighted driving behavior word frequency histogram is shown in FIG. 3.
After the driving behavior words of the same segment are processed with TF-IDF, the weighted driving behavior word frequency histogram shown in FIG. 4 is obtained.
As FIG. 4 shows, TF-IDF weighting flattens the histogram. The weights of high-frequency words decrease and those of medium-frequency words increase, so the distribution becomes more balanced.
Step S2 training the improved T-LDA model by using driving behavior histogram feature
The histogram features are modeled with the improved T-LDA and the model parameters are solved by Gibbs sampling, as follows.
Step S21 time-stamp-based improved LDA topic model design
The original pLSA and LDA models rest on the bag-of-words assumption. When applied to driving pattern recognition, they attend only to the semantics of driving behavior words and ignore structural information. Current research therefore focuses mainly on compensating for this loss of structural information. Improvements to topic models fall into two categories: improving the model's internal structure and improving its hyperparameters. The former mainly adds observed or hidden variables to the model. The latter mainly reparameterizes the hyperparameters for dynamic modeling.
Current research shows improved models becoming increasingly complex, chiefly through more levels, more hidden variables and more hyperparameters. Unlike models whose hierarchical structure keeps growing, the invention introduces only time information as an observed variable into the LDA model. The resulting improved model is T-LDA (Time-LDA). Whenever a driving behavior word is sampled from a driving mode topic, its corresponding time label is sampled as well. The graphical model of T-LDA is shown in FIG. 5.
According to the graphical model of T-LDA, a segment of driving data is generated as follows:
(1) for each of the K driving modes, sample the multinomial parameter of the driving behavior word semantics of that mode, [equation image not reproduced in the source text], from a Dirichlet distribution with parameter β;
(2) for each driving mode, sample the multinomial parameter $\phi_z$ of the driving behavior word time labels of that mode from a Dirichlet distribution with parameter γ;
(3) for each segment of driving data j, sample the multinomial parameter $\theta_j$ of the driving mode-driving data distribution from a Dirichlet distribution with parameter α;
(4) generate each driving behavior word in driving data j as follows:
(a) sample a driving mode $z_{ji}$ from the multinomial distribution with parameter $\theta_j$;
(b) sample a driving behavior word $w_{ji}$ from the multinomial distribution with parameter [equation image not reproduced in the source text];
(c) sample the driving behavior word time $t_{ji}$ from the multinomial distribution with parameter [equation image not reproduced in the source text].
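The generative process can be simulated with NumPy to produce synthetic, time-labeled driving data. The source leaves the indexing of the time-label distribution partly unclear. The Gibbs step counts how often time label t of word w is assigned to mode j. Based on that, this sketch assumes that each (mode, word) pair has its own time-label distribution `psi[z, w]`. It is named `psi` here to keep it distinct from the word distribution `phi`. The function name and the number of time labels `T` are illustrative.

```python
import numpy as np

def generate_tlda(num_docs, doc_len, K, V, T, alpha, beta, gamma, seed=0):
    """Sample synthetic time-labeled driving data from a T-LDA generative process (sketch)."""
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet([beta] * V, size=K)         # step (1): (K, V) word distribution per mode
    psi = rng.dirichlet([gamma] * T, size=(K, V))   # step (2): (K, V, T) time labels, assumed per (mode, word)
    docs = []
    for _ in range(num_docs):
        theta = rng.dirichlet([alpha] * K)          # step (3): mode mixture of this segment
        modes = rng.choice(K, size=doc_len, p=theta)                                   # (a)
        words = np.array([rng.choice(V, p=phi[z]) for z in modes])                     # (b)
        times = np.array([rng.choice(T, p=psi[z, w]) for z, w in zip(modes, words)])   # (c)
        docs.append((words, times, modes))
    return docs, phi, psi
```

Synthetic data of this kind is mainly useful for checking that a sampler recovers known `phi` and `theta` before it is applied to real driving data.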
Compared with other models, each word in T-LDA can be regarded as an entry with two parts: the semantics of the driving behavior word and its time. T-LDA therefore compensates for the loss of time information caused by the bag-of-words assumption.
Step S22 solving model parameters using Gibbs sampling method
Gibbs sampling is used to approximately infer the parameters of the T-LDA model. For driving data j, condition on the following quantities:
- a driving behavior word w and its time label t;
- the driving modes $z_{-ji}$ of all words other than the current driving mode $z_{ji}$;
- the hyperparameters α, β and γ.

Then compute the conditional distribution
[equation image not reproduced in the source text]
The count symbols in this expression are defined as follows:
- [equation image not reproduced in the source text] is the number of times the driving behavior word w is assigned to driving mode j, excluding the current assignment i;
- [equation image not reproduced in the source text] is the number of times the time label t of the driving behavior word w is assigned to driving mode j, excluding the current assignment i;
- [equation image not reproduced in the source text] is the number of driving behavior words in driving data d assigned to driving mode j, excluding the current assignment i.

This yields φ,
[equation image not reproduced in the source text]
and the formulas for θ:
[equation images not reproduced in the source text]
From θ, φ and [equation image not reproduced in the source text], the values of the T-LDA model parameters α, β and γ are obtained.
The Gibbs sampling procedure for the T-LDA model is:
1) select a word from the document set, either randomly or in a fixed order;
2) given all other words and topics, compute the conditional probability that the selected word is assigned to each topic, $p(z_i = j \mid z_{-i}, w, t, \alpha, \beta, \gamma)$, where $z_{-i} = \{z_1, z_2, \dots, z_{i-1}, z_{i+1}, \dots, z_K\}$;
3) draw a topic $z_i$ at random from this distribution and use it to replace the topic of the current word;
4) repeat the above process until α, β and γ converge to a fixed point.
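A compact collapsed Gibbs sampler for this model is sketched below. The conditional distribution in the source is available only as an image, so the sampler uses the standard Dirichlet-multinomial form. It assumes that each (mode, word) pair has its own time-label distribution, as suggested by the count of time label t of word w under mode j:
$p(z_i = j \mid \cdot) \propto \dfrac{n_{j,-i}^{(w)} + \beta}{n_{j,-i}^{(\cdot)} + V\beta} \cdot \dfrac{n_{j,-i}^{(w,t)} + \gamma}{n_{j,-i}^{(w)} + T\gamma} \cdot \left(n_{d,-i}^{(j)} + \alpha\right)$
As in standard collapsed Gibbs sampling, α, β and γ are held fixed here, and the chain runs for a fixed number of sweeps rather than until convergence. The source's estimation of the hyperparameters themselves is not reproduced. `gibbs_tlda`, `psi` and `T` (number of time labels) are hypothetical names.

```python
import numpy as np

def gibbs_tlda(docs, K, V, T, alpha, beta, gamma, iters=200, seed=0):
    """Collapsed Gibbs sampling for T-LDA (sketch).

    docs: list of (word_ids, time_ids) integer arrays, one pair per driving data segment.
    Returns theta (D, K), phi (K, V), psi (K, V, T) and the final mode assignments.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))   # words in segment d assigned to mode k
    n_kw = np.zeros((K, V))           # times word w is assigned to mode k
    n_kwt = np.zeros((K, V, T))       # times word w with time label t is assigned to mode k
    n_k = np.zeros(K)                 # total words assigned to mode k

    def update(d, w, t, k, delta):
        n_dk[d, k] += delta
        n_kw[k, w] += delta
        n_kwt[k, w, t] += delta
        n_k[k] += delta

    # random initial assignment of a mode to every word
    z = [rng.integers(K, size=len(ws)) for ws, _ in docs]
    for d, (ws, ts) in enumerate(docs):
        for w, t, k in zip(ws, ts, z[d]):
            update(d, w, t, k, 1)

    for _ in range(iters):
        for d, (ws, ts) in enumerate(docs):
            for i, (w, t) in enumerate(zip(ws, ts)):
                update(d, w, t, z[d][i], -1)  # remove word i so the counts exclude it
                p = ((n_kw[:, w] + beta) / (n_k + V * beta)
                     * (n_kwt[:, w, t] + gamma) / (n_kw[:, w] + T * gamma)
                     * (n_dk[d] + alpha))
                z[d][i] = rng.choice(K, p=p / p.sum())
                update(d, w, t, z[d][i], 1)   # add it back under the sampled mode

    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + V * beta)
    psi = (n_kwt + gamma) / (n_kw[:, :, None] + T * gamma)
    return theta, phi, psi, z
```

Each row of `theta` is the estimated driving mode distribution of one training segment.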
Driving pattern recognition with the improved topic model is similar to that with LDA. First, the driving behavior word histogram features of each training segment are obtained with the bag-of-words model. Then the parameters α, β and γ are solved on the training driving data by Gibbs sampling. This gives the driving mode distribution within each segment of driving data.
For new driving data $d_{test}$, $p(z_k \mid d_{test})$ must be calculated. At this point the training set has already given the probability distributions of the driving behavior words and their time labels over all driving patterns. These are the first two parts of equation (4-28), $P(w_{di} \mid z_{-i}, z_i=j, \alpha, \beta)$ and $P(t_{di} \mid z_{-i}, w_{di}, z_i=j, \alpha, \beta, \gamma)$, which are known as
[equation image]
and
[equation image]
respectively. Only the last part of equation (4-29), $P(z_i=j, z_{-i} \mid \alpha, \beta, \gamma)$, therefore needs to be solved. The sampling formula obtained by derivation is:
[equation image]
As in training, equation (4-33) is evaluated by Gibbs sampling. The driving pattern type k contained in the test driving data $d_{test}$ is determined by:
$k = \arg\max_k p(z_k \mid d_{test})$  (4-34)
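The test step can be sketched as "fold-in" Gibbs sampling. The word and time-label distributions learned in training stay fixed, only the pattern assignments of $d_{test}$ are resampled, and k is the argmax of the resulting pattern proportions. This minimal sketch assumes the same array layout as a training sampler would produce: `phi[k, w]` and `psi[k, w, t]`. The per-word factor `phi * psi * (count + alpha)` is an assumed stand-in for equation (4-33), which is only available as an image. `infer_pattern` is a hypothetical name.

```python
import numpy as np


def infer_pattern(doc, phi, psi, alpha=0.1, n_iter=100, seed=0):
    """Fold-in Gibbs sampling for one new driving data segment (hypothetical API).

    doc: list of (word_id, time_label) pairs; phi[k, w] and psi[k, w, t] come from
    training and are kept fixed, so only the pattern counts of d_test are updated.
    Returns (k_hat, theta_test) with k_hat = argmax_k p(z_k | d_test).
    """
    rng = np.random.default_rng(seed)
    K = phi.shape[0]
    z = rng.integers(K, size=len(doc))                 # random initial patterns
    n_k = np.bincount(z, minlength=K).astype(float)    # pattern counts in d_test
    for _ in range(n_iter):
        for i, (w, t) in enumerate(doc):
            n_k[z[i]] -= 1                             # exclude the current word
            # ASSUMED form: fixed word and time-label factors times the pattern factor.
            p = phi[:, w] * psi[:, w, t] * (n_k + alpha)
            z[i] = rng.choice(K, p=p / p.sum())
            n_k[z[i]] += 1
    theta_test = (n_k + alpha) / (len(doc) + K * alpha)
    return int(np.argmax(theta_test)), theta_test
```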
The collected data are processed according to the above principle, and the pLSA, LDA and T-LDA topic models are evaluated from both a theoretical and a practical perspective. Theoretically, each model is assessed by two measures: its perplexity, and the similarity between the data it reconstructs and the actual data. Practically, each model is assessed by its recognition accuracy on new driving data. For each topic-model training run, 2 segments of driving data are used as the test set and the remaining segments as the training set for learning the model parameters. This is repeated 10 times so that each segment serves as the test set exactly once, and the driving pattern recognition accuracy is the average of the 10 runs.
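The evaluation scheme above can be sketched as follows. The sketch assumes 20 driving data segments in total (18 for training plus 2 for testing in each run, consistent with the 18 training driving data mentioned below). It also assumes that segments are assigned to test folds in order, since the text does not say how the pairs are chosen. Both function names are hypothetical.

```python
def cross_validation_folds(n_segments=20, test_size=2):
    """Return (train_ids, test_ids) pairs in which each segment is tested exactly once."""
    if n_segments % test_size != 0:
        raise ValueError("n_segments must be a multiple of test_size")
    folds = []
    for start in range(0, n_segments, test_size):
        test_ids = list(range(start, start + test_size))     # assumed: consecutive segments
        train_ids = [i for i in range(n_segments) if i not in test_ids]
        folds.append((train_ids, test_ids))
    return folds


def mean_recognition_rate(rates):
    """Average the per-run recognition rates (one value per fold)."""
    return sum(rates) / len(rates)
```

With the defaults this yields 10 folds of 18 training and 2 test segments. The reported accuracy is the mean of the 10 per-fold rates.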
(1) Selecting the optimal number of driving patterns by perplexity
When a topic model is applied to driving patterns, a suitable number of topics must be specified before the model is trained. A metric is therefore needed to compare the modeling capability of the topic model under different numbers of topics. Blei et al. proposed using perplexity to evaluate the quality of topic models, with good results, and the present invention likewise uses perplexity to determine the optimal number of topics.
Consider a set D of M driving data, where $N_d$ is the number of driving behavior words $w_d$ in the d-th driving data and $p(w_d)$ is the probability of the d-th driving data. The perplexity of D is then
$\mathrm{perplexity}(D) = \exp\left\{-\frac{\sum_{d=1}^{M} \log p(w_d)}{\sum_{d=1}^{M} N_d}\right\}$
In general, a lower perplexity means that the topics extracted by the model differ less from the actual topics, i.e., that the topic model fits the data better.
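The perplexity formula can be sketched directly. This is a minimal sketch assuming a plain bag-of-words likelihood, in which $\log p(w_d)$ is the sum over the words of d of $\log \sum_k \theta_{dk}\phi_{kw}$. Time labels are ignored here, although a T-LDA evaluation might include them. The second helper picks the number of patterns with the lowest perplexity, as done when choosing among 2 to 6 patterns. Both function names are hypothetical.

```python
import numpy as np


def perplexity(docs, theta, phi):
    """perplexity(D) = exp(-sum_d log p(w_d) / sum_d N_d)  (bag-of-words sketch).

    docs: list of driving data, each a list of driving behavior word ids.
    theta[d, k] = p(pattern k | data d); phi[k, w] = p(word w | pattern k).
    """
    log_lik, n_words = 0.0, 0
    for d, doc in enumerate(docs):
        word_probs = theta[d] @ phi[:, doc]    # p(w | d) for every word occurrence in d
        log_lik += float(np.log(word_probs).sum())
        n_words += len(doc)                    # N_d
    return float(np.exp(-log_lik / n_words))


def best_number_of_patterns(perplexity_by_k):
    """Return the number of driving patterns whose (average) perplexity is lowest."""
    return min(perplexity_by_k, key=perplexity_by_k.get)
```

For a sanity check, a uniform model over 4 words gives a perplexity of exactly 4, the number of equally likely words.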
The pLSA, LDA and improved LDA (T-LDA) models were trained on the driving behavior word histogram features, extracting 2, 3, 4, 5 and 6 driving patterns in turn from the 18 training driving data. Fig. 6 compares the average perplexity of the three models over 10 training runs.
As fig. 6 shows, for the driving data set collected in the present invention, all three topic models reach their minimum perplexity when the number of driving patterns is set to 4. The subsequent analysis therefore uses 4 topics throughout. Among the three topic models, LDA models the data better than pLSA, and T-LDA models it better than LDA.
(2) Correlation analysis between the data reconstructed by each topic model and the original data
The three models are first described by two distributions: the probability distribution of the driving patterns in each driving data, and the probability distribution of the driving behavior words in each driving pattern. Data are then reconstructed from each topic model and correlated with the original data, to measure how well the different models extract driving patterns. The probability distributions of the driving patterns in the first 5 training driving data under the pLSA, LDA and T-LDA models are shown in figs. 7, 8 and 9, respectively.
As figs. 7, 8 and 9 show, the pLSA, LDA and T-LDA models give substantially consistent distributions of the 4 driving patterns in the first 5 driving data.
The probability distributions of the driving behavior words in the 4 driving patterns under the pLSA, LDA and T-LDA models are shown in figs. 10, 11, 12 and 13.
Based on the probability distribution of driving behavior words in driving pattern 1 (fig. 10), this pattern can be defined as a cautious driving pattern.
Based on the probability distribution of driving behavior words in driving pattern 2 (fig. 11), this pattern can be defined as a general driving pattern.
Based on the probability distribution of driving behavior words in driving pattern 3 (fig. 12), this pattern can be defined as an aggressive driving pattern.
Based on the distribution of driving behavior words in driving pattern 4 (fig. 13), this pattern can be defined as a very aggressive driving pattern. As figs. 10, 11, 12 and 13 show, the probability distributions of driving behavior words in the 4 driving patterns are also substantially consistent across the pLSA, LDA and T-LDA models.
The driving data are then reconstructed from the training results of the three models, i.e., from the distribution of driving patterns in each driving data and the distribution of driving behavior words in each driving pattern. Correlating each reconstructed driving data with the corresponding collected original driving data gives the correlation coefficients shown in fig. 14.
The correlation coefficient reflects how consistent the reconstructed data are with the original data. On this measure the T-LDA model outperforms the LDA and pLSA models, which shows that it describes the driving patterns implicit in the driving data better than either of them.
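The text does not give the exact reconstruction or correlation formula. One plausible reading is sketched below under two assumptions. First, the reconstructed histogram of driving data d is $N_d \sum_k \theta_{dk}\phi_{kw}$, i.e., the expected word counts under the model. Second, the correlation is Pearson's coefficient computed per driving data. `reconstruction_correlation` is a hypothetical name.

```python
import numpy as np


def reconstruction_correlation(counts, theta, phi):
    """Pearson correlation between each observed word histogram and its reconstruction.

    counts[d, w]: observed driving behavior word histogram of driving data d.
    theta[d, k] = p(pattern k | d); phi[k, w] = p(word w | pattern k).
    Assumed reconstruction: N_d * sum_k theta[d, k] * phi[k, w].
    """
    recon = counts.sum(axis=1, keepdims=True) * (theta @ phi)
    return np.array([np.corrcoef(c, r)[0, 1] for c, r in zip(counts, recon)])
```

A coefficient near 1 for a driving data means the model's pattern mixture reproduces that segment's word histogram closely.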
(3) Driving pattern recognition rate
Table 1 below gives the recognition accuracy of the pLSA, LDA and T-LDA models for the 4 driving patterns on the test driving data. The recognition rate for each driving pattern is the average over the 10 cross-validation runs.
Table 1. Recognition rates of the pLSA, LDA and T-LDA models
[Table 1 image not reproduced]
As Table 1 shows, the T-LDA model proposed by the invention achieves higher recognition rates on the 4 driving patterns than the pLSA and LDA models. This indicates that the driving patterns it extracts are closer than theirs to the actual driving patterns implicit in the driving data.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (7)

1. A driving behavior pattern recognition method based on a T-LDA topic model is characterized by comprising the following steps:
S1, establishing a driving behavior dictionary and extracting driving behavior histogram features: inputting driving behavior data, clustering the driving behavior data, and establishing a driving behavior dictionary from the clustering result; extracting the cluster centers of the different driving behavior classes and using them as the words of the driving behavior dictionary; and counting the occurrences of the different driving behavior words in the driving data to obtain a driving data-driving behavior word co-occurrence matrix, i.e., the driving behavior histogram feature;
S2, training the improved T-LDA model with the driving behavior histogram features: the T-LDA model comprises two parts, one being the driving pattern types contained in each section of driving data and their probability density distribution, the other being the driving behavior word types contained in each driving pattern and their probability density distribution, thereby establishing the relationship among driving data, driving patterns and driving behavior words; time information is introduced as a label of the driving behavior words so as to combine several adjacent driving behaviors; the model is trained with the time-labelled driving behavior histogram features, the model parameters are solved by the Gibbs sampling method, and the driving behavior recognition result is output.
2. The driving behavior pattern recognition method based on the T-LDA topic model as claimed in claim 1, wherein the specific method of step S1 is:
S11, establishing a driving behavior dictionary: extracting features from the original driving behavior data and selecting features for clustering; classifying the different driving behaviors obtained by clustering, taking the cluster centers as the words of a bag-of-words model, the set of all the different driving behaviors forming the driving behavior dictionary; and assigning weight parameters of different sizes to words of different frequencies, where the more frequently a word occurs, the smaller its weight parameter;
S12, driving behavior histogram feature extraction: according to the local features of the driving behavior data to be processed, mapping the driving behavior feature vectors to words by a TF-IDF method, looking up the corresponding behavior words in the constructed driving behavior dictionary, and calculating the word-frequency histogram to represent the driving behavior sequence.
3. The driving behavior pattern recognition method based on the T-LDA topic model of claim 2, wherein the TF-IDF method in step S12 comprises the following steps:
(1) assume that M driving behavior feature vectors $F=\{f_1, f_2, f_3, \ldots, f_M\}$ are extracted from the driving behavior data d to be processed, and that the generated driving behavior dictionary is $W=\{w_1, w_2, w_3, \ldots, w_V\}$, where V is the size of the driving behavior dictionary;
(2) map each driving behavior feature vector $f_i$ to a driving behavior word $w_{c_i}$ in the driving behavior dictionary, i.e., find its position $c_i$ in the dictionary:
$c_i = \arg\min_j \lVert f_i - w_j \rVert^2$, with $c_i \in \{1, 2, \ldots, V\}$
(3) for the driving behavior word $w_{c_i}$ to which each driving behavior feature vector $f_i$ is mapped, calculate its weight using a Gaussian function:
[equation image]
[equation image]
where the variance
[equation image]
is the word $w_{c_i}$, and f is the number of word frequencies centered on the word frequency;
(4) for the driving behavior word $w_{c_i}$, calculate its weight:
[equation image]
[equation image]
where n is the total number of driving behavior words in the driving data.
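Steps (1) and (2), together with the word counting of step S1, can be sketched as follows. This is a minimal sketch under stated assumptions. k-means (Lloyd's algorithm) is an assumed choice, since the claims only say "clustering". The Gaussian and TF-IDF weights of steps (3) and (4) are omitted because their formulas survive only as images, so the histogram holds plain counts. All function names are hypothetical.

```python
import numpy as np


def build_dictionary(features, V, n_iter=50, seed=0):
    """ASSUMED k-means clustering; the V cluster centers serve as driving behavior words."""
    rng = np.random.default_rng(seed)
    centres = features[rng.choice(len(features), size=V, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = quantize(features, centres)
        for j in range(V):
            members = features[labels == j]
            if len(members):                  # keep the old center if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres


def quantize(features, centres):
    """Step (2): c_i = argmin_j ||f_i - w_j||^2 for every feature vector f_i."""
    dists = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def word_histogram(features, centres):
    """Unweighted occurrence count of each dictionary word in one driving data segment."""
    return np.bincount(quantize(features, centres), minlength=len(centres))
```

Stacking one `word_histogram` row per driving data segment gives the driving data-driving behavior word co-occurrence matrix described in step S1.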
4. The driving behavior pattern recognition method based on the T-LDA topic model as claimed in claim 1, wherein the specific method of step S2 is:
S21, designing an improved LDA topic model based on time labels: introducing the time information of the driving behavior words into the LDA model as observed variables that serve as labels of the driving behavior words, solving the parameters of the improved T-LDA model, and finally recognizing driving patterns with the T-LDA model;
S22, solving the model parameters by the Gibbs sampling method.
5. The driving behavior pattern recognition method based on the T-LDA topic model as claimed in claim 4, wherein the T-LDA model designed in step S21 generates driving behavior data as follows:
(1) for each of the K driving patterns, sample the driving pattern-driving behavior word multinomial distribution parameter of that driving pattern from a Dirichlet distribution with parameter β:
[equation image]
(2) for each of the K driving patterns, sample the driving behavior word time label multinomial distribution parameter $\varphi_z$ of that driving pattern from a Dirichlet distribution with parameter γ;
(3) for each section of driving data, sample the driving pattern-driving data multinomial distribution parameter $\theta_j$ from a Dirichlet distribution with parameter α;
(4) generate each driving behavior word in driving data j as follows:
(a) sample a driving pattern $z_{ji}$ from the multinomial distribution with parameter $\theta_j$;
(b) sample a driving behavior word $w_{ji}$ from the multinomial distribution with parameter
[equation image]
(c) sample a driving behavior word time $t_{ji}$ from the multinomial distribution with parameter
[equation image]
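The generative steps (1) to (4) can be simulated to produce synthetic driving data. The sketch makes two assumptions. First, the time-label distribution is indexed by both pattern and word, following the $P(t_{di} \mid z, w_{di})$ factor in the description, whereas claim step (2) indexes it by pattern only. Second, words and time labels are integer ids. `generate_tlda` and its default hyperparameters are hypothetical.

```python
import numpy as np


def generate_tlda(n_docs, n_words, K, V, T, alpha=0.5, beta=0.5, gamma=0.5, seed=0):
    """Sample synthetic time-labelled driving data following generative steps (1)-(4)."""
    rng = np.random.default_rng(seed)
    word_dist = rng.dirichlet(np.full(V, beta), size=K)          # (1) p(w | pattern)
    time_dist = rng.dirichlet(np.full(T, gamma), size=(K, V))    # (2) p(t | pattern, word), ASSUMED
    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet(np.full(K, alpha))                 # (3) p(pattern | data j)
        doc = []
        for _ in range(n_words):
            z = rng.choice(K, p=theta)                           # (a) pattern z_ji
            w = rng.choice(V, p=word_dist[z])                    # (b) word w_ji
            t = rng.choice(T, p=time_dist[z, w])                 # (c) time label t_ji
            doc.append((int(w), int(t)))
        docs.append(doc)
    return docs, word_dist, time_dist
```

Synthetic data of this kind, with known `word_dist` and `time_dist`, can be used to check whether a parameter-estimation routine recovers the true distributions.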
6. The driving behavior pattern recognition method based on the T-LDA topic model as claimed in claim 5, wherein the method for solving the model parameters by using the Gibbs sampling method in step S22 is as follows:
(1) selecting a word from the document set, either at random or in a fixed order;
(2) given all other words and their topics, calculating the conditional probability $p(z_i=j \mid z_{-i}, w, t, \alpha, \beta, \gamma)$ that the selected word is assigned to topic j, where $z_{-i}=\{z_1, z_2, \ldots, z_{i-1}, z_{i+1}, \ldots, z_K\}$;
(3) sampling a new topic $z_i$ from this distribution to replace the topic of the current word;
(4) repeating the above steps until α, β and γ converge to a fixed point.
7. The driving behavior pattern recognition method based on the T-LDA topic model as claimed in claim 6, wherein the method for solving the values of the parameters α, β and γ in step S22 is as follows:
for driving data j, given a driving behavior word w, its time label t, all driving pattern assignments $z_{-ji}$ other than $z_{ji}$, and the hyperparameters α, β and γ, calculating the conditional distribution
[equation image]
where
[equation image]
is the number of times the driving behavior word w is assigned to driving pattern j, excluding the current assignment i;
[equation image]
is the number of times the time label t of the driving behavior word w is assigned to driving pattern j, excluding the current assignment i; and
[equation image]
is the number of driving behavior words in driving data d that are assigned to driving pattern j, excluding the current assignment i; φ is then obtained as
[equation image]
and θ as:
[equation images]
From θ, φ and
[equation image]
the values of the parameters α, β and γ of the T-LDA model are thereby obtained.
CN201810676019.6A 2018-06-27 2018-06-27 Driving behavior pattern recognition method based on T-LDA topic model Expired - Fee Related CN109086794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810676019.6A CN109086794B (en) 2018-06-27 2018-06-27 Driving behavior pattern recognition method based on T-LDA topic model

Publications (2)

Publication Number Publication Date
CN109086794A (en) 2018-12-25
CN109086794B (en) 2022-03-01

Family

ID=64839853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810676019.6A Expired - Fee Related CN109086794B (en) 2018-06-27 2018-06-27 Driving behavior pattern recognition method based on T-LDA topic model

Country Status (1)

Country Link
CN (1) CN109086794B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378397B (en) * 2019-06-27 2021-08-24 深圳大学 Driving style recognition method and device
CN111126438B (en) * 2019-11-22 2023-11-14 北京理工大学 Driving behavior recognition method and system
CN113159105B (en) * 2021-02-26 2023-08-08 北京科技大学 Driving behavior unsupervised mode identification method and data acquisition monitoring system
CN113239964B (en) * 2021-04-13 2024-03-01 联合汽车电子有限公司 Method, device, equipment and storage medium for processing vehicle data
CN114780670B (en) * 2022-03-07 2024-07-12 合肥工业大学 Method and system for mining travel mode of driver based on travel purpose

Citations (3)

Publication number Priority date Publication date Assignee Title
DE102014205127A1 (en) * 2013-04-17 2014-10-23 Ford Global Technologies, Llc Control of the driving dynamics of a vehicle with ruts compensation
CN105894815A (en) * 2016-05-27 2016-08-24 苏州市职业大学 Semantic region segmentation-based traffic congestion early warning method
CN106408032A (en) * 2016-09-30 2017-02-15 防城港市港口区高创信息技术有限公司 Fatigue driving detection method based on corner of steering wheel

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20100209885A1 (en) * 2009-02-18 2010-08-19 Gm Global Technology Operations, Inc. Vehicle stability enhancement control adaptation to driving skill based on lane change maneuver

Also Published As

Publication number Publication date
CN109086794A (en) 2018-12-25

Similar Documents

Publication Publication Date Title
CN109086794B (en) Driving behavior pattern recognition method based on T-LDA topic model
Dong et al. Towards interpretable deep neural networks by leveraging adversarial examples
CN109614979B (en) Data augmentation method and image classification method based on selection and generation
CN112308158A (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN109919106B (en) Progressive target fine recognition and description method
CN102156871B (en) Image classification method based on category correlated codebook and classifier voting strategy
CN109919252B (en) Method for generating classifier by using few labeled images
CN108776774A (en) A kind of human facial expression recognition method based on complexity categorization of perception algorithm
CN112966691A (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN106897669A (en) A kind of pedestrian based on consistent iteration various visual angles transfer learning discrimination method again
CN112949408B (en) Real-time identification method and system for target fish passing through fish channel
CN105609116A (en) Speech emotional dimensions region automatic recognition method
Jolly et al. How do convolutional neural networks learn design?
CN112784921A (en) Task attention guided small sample image complementary learning classification algorithm
CN101216886B (en) A shot clustering method based on spectral segmentation theory
CN115270752A (en) Template sentence evaluation method based on multilevel comparison learning
CN103336830B (en) Image search method based on structure semantic histogram
CN113298184B (en) Sample extraction and expansion method and storage medium for small sample image recognition
CN117313709B (en) Method for detecting generated text based on statistical information and pre-training language model
CN104537392B (en) A kind of method for checking object based on the semantic part study of identification
CN112861881A (en) Honeycomb lung recognition method based on improved MobileNet model
CN106548195A (en) A kind of object detection method based on modified model HOG ULBP feature operators
CN108229565A (en) A kind of image understanding method based on cognition
CN114925198B (en) Knowledge-driven text classification method integrating character information
CN113792574B (en) Cross-dataset expression recognition method based on metric learning and teacher student model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220301
