CN109086794B - Driving behavior pattern recognition method based on T-LDA topic model

Info

Publication number: CN109086794B
Application number: CN201810676019.6A
Authority: CN
Inventors: 石英; 罗佳齐; 李振威
Original assignee: Wuhan University of Technology WUT
Current assignee: Wuhan University of Technology WUT
Priority date: 2018-06-27
Filing date: 2018-06-27
Publication date: 2022-03-01
Anticipated expiration: 2038-06-27
Also published as: CN109086794A

Abstract

The invention discloses a driving behavior pattern recognition method based on a T-LDA topic model, comprising the following steps: S1, establishing a driving behavior dictionary and extracting driving behavior histogram features: the driving behavior dictionary is established according to the clustering result of the driving behavior data, and a co-occurrence matrix of driving data and driving behavior words, namely the driving behavior histogram features, is constructed; S2, training the improved T-LDA model with the driving behavior histogram features: the relationship among driving data, driving modes and driving behavior words is constructed, and time information is introduced as a label of the driving behavior words; the model is trained with the time-labeled driving behavior histogram features, the model parameters are solved by Gibbs sampling, and the driving behavior recognition result is output. The invention can be effectively used for driving behavior pattern recognition.

Description

Driving behavior pattern recognition method based on T-LDA topic model

Technical Field

The invention relates to the technical field of driving behavior pattern recognition, in particular to a driving behavior pattern recognition method based on a T-LDA topic model.

Background

Traffic safety problems caused by irregular driver behavior are increasingly prominent. Analysis of traffic accident causes shows that bad driving behaviors such as rapid acceleration, rapid deceleration and sharp turning are the main factors in traffic accidents. To improve driving safety, acquiring driving data in time, extracting driving behavior features from the data, and recognizing and improving driving behavior have become a research hotspot. The rapid popularization of intelligent mobile terminals makes vehicle driving data easier to acquire and facilitates the analysis of drivers' driving behaviors and driving modes.

Some researchers have used the acceleration sensor built into a smartphone to acquire the vehicle's acceleration along the horizontal and vertical axes and to recognize driving behaviors such as acceleration, deceleration and turning, with good results. Others have used the acceleration sensor to collect vehicle acceleration information, divided it into three levels (low, medium and high), established the relationship between acceleration level and driving mode category, and finally divided driving modes into four types: a cautious driving mode below the normal level, a normal driving mode whose driving behavior poses no threat, an aggressive driving mode that poses a certain threat, and a very aggressive driving mode that poses a great threat.

At present, mainstream research works directly on the low-level features of driving data: the duration and intensity of behaviors such as acceleration or turning are judged, and the driving mode is then identified. Further research on driving modes has shown that identifying a driving mode from single driving behaviors alone, without considering the specific sequential combinations of different driving behaviors in the driving data, may adapt poorly to different road conditions and time periods. Research has therefore focused on determining driving modes from an understanding of driving behavior sequences.

Some researchers have used statistical models to study the driving modes formed by combinations of driving behaviors such as acceleration and deceleration, and have mined the differences in driving mode among drivers. Researchers have therefore turned their attention to topic models, a well-established class of statistical models from the field of text analysis. A topic model classifies and manages documents by extracting the topic information hidden in them; a topic is a latent variable that abstracts a group of related words in the text, and a parameterized model for generating different texts can be constructed by learning from training samples. Borrowing the idea of topic models from text analysis and image scene recognition, driving data can be regarded as a document: the driving data is composed of different driving patterns (topics), and each driving pattern (topic) is composed of a series of single driving behaviors (words) that characterize the pattern.

pLSA is one of the most representative topic models; it calculates the probability distribution of each word in a document by analyzing the word-document co-occurrence matrix, thereby determining the topics of the document. However, the number of parameters to be trained grows linearly with the size of the driving data set, which makes the computation more complex; moreover, the model can only be generated for the training driving data set, and its recognition effect on new driving data is poor. To address these disadvantages, the Latent Dirichlet Allocation (LDA) model was proposed on the basis of pLSA; it represents the data with only a moderate number of parameters and thus avoids the over-fitting problem.

The invention extracts the cluster center of each driving behavior class, takes the cluster centers as words in a driving behavior dictionary, and counts the occurrences of the different driving behavior words in the driving data to obtain weighted driving behavior word histogram features. Addressing the shortcomings of the current mainstream topic models pLSA and LDA, the invention proposes, on the basis of the LDA model, an improved LDA model that introduces a time label, namely the T-LDA model, to identify driving modes. Experimental results show that the improved model can effectively mine the characteristics of a series of continuous driving behaviors in driving data and improve the accuracy of driving mode recognition.

Disclosure of Invention

The invention aims to solve the technical problem of providing a driving behavior pattern recognition method based on a T-LDA topic model aiming at the defects in the prior art.

The technical scheme adopted by the invention for solving the technical problems is as follows:

the invention provides a driving behavior pattern recognition method based on a T-LDA topic model, which comprises the following steps:

S1, establishing a driving behavior dictionary and extracting driving behavior histogram features: inputting driving behavior data, clustering the data, and establishing a driving behavior dictionary according to the clustering result; extracting the cluster centers of the different driving behavior classes, using the cluster centers as words in the driving behavior dictionary, and counting the occurrences of the different driving behavior words in the driving data to obtain a driving data-driving behavior word co-occurrence matrix, namely the driving behavior histogram features;

S2, training the improved T-LDA model with the driving behavior histogram features: the T-LDA model comprises two parts, one being the driving mode types contained in each section of driving data and their probability distribution, the other being the driving behavior word types contained in each driving mode and their probability distribution, so that the relationship among driving data, driving modes and driving behavior words is established; time information is introduced as a label of the driving behavior words so as to combine several adjacent driving behaviors; the model is trained with the time-labeled driving behavior histogram features, the model parameters are solved by Gibbs sampling, and the driving behavior recognition result is output.

Further, the specific method of step S1 of the present invention is:

S11, establishing a driving behavior dictionary: extracting features from the original driving behavior data and selecting features for clustering; classifying the driving behaviors obtained by clustering, taking the cluster center of each class as a word in the bag-of-words model, and forming the driving behavior dictionary from the set of all the different driving behavior words; assigning weight parameters of different sizes to words of different frequencies, wherein the higher the occurrence frequency of a word, the smaller its weight parameter;

S12, driving behavior histogram feature extraction: according to the local features of the driving behavior data to be processed, mapping the driving behavior feature vectors to words by the TF-IDF method, looking up the corresponding behavior words in the constructed driving behavior dictionary, and calculating the word frequency histogram to represent the driving behavior sequence.

Further, the specific process of the TF-IDF method in step S12 of the present invention is:

(1) Assume that M driving behavior feature vectors $F = \{f_1, f_2, f_3, \ldots, f_M\}$ are extracted from the driving behavior data d to be processed, and that the generated driving behavior dictionary is $W = \{w_1, w_2, w_3, \ldots, w_V\}$, where V is the size of the driving behavior dictionary;

(2) Map each driving behavior feature vector $f_i$ to a driving behavior word $w_{c_i}$ in the driving behavior dictionary, i.e. find its position $c_i$ in the dictionary:

$c_i = \arg\min_j \|f_i - w_j\|^2$, with $c_i \in \{1, 2, \ldots, V\}$

(3) For the driving behavior word $w_{c_i}$ to which each driving behavior feature vector $f_i$ is mapped, calculate its weight using a Gaussian function, where the variance is defined for the word $w_{c_i}$ and F is the number of word frequencies centered on the word frequency;

(4) For the driving behavior word $w_{c_i}$, calculate its weight, where n is the total number of driving behavior words in the driving data.

Further, the specific method of step S2 of the present invention is:

S21, designing an improved LDA topic model based on time labels: introducing the time information of driving behavior words into the LDA model as an observed variable that serves as the label of the driving behavior words, solving the parameters of the improved T-LDA model, and finally identifying driving modes with the T-LDA model;

S22, solving the model parameters by the Gibbs sampling method.

Further, the method for generating the driving behavior data of the T-LDA model in step S21 of the present invention is:

(1) For each driving mode, sample the K driving behavior word semantics-driving mode multinomial distribution parameters from a Dirichlet distribution with parameter β;

(2) For each driving mode, sample the K driving behavior word time labels from a Dirichlet distribution with parameter γ to obtain the multinomial distribution parameter $\phi_z$ of the driving mode;

(3) For each section of driving data, sample the driving mode-driving data multinomial distribution parameter $\theta_j$ from a Dirichlet distribution with parameter α;

(4) Each driving behavior word in driving data j is generated as follows:

(a) Sample a driving pattern $z_{ji}$ from the multinomial distribution with parameter $\theta_j$;

(b) Sample a driving behavior word $w_{ji}$ from the multinomial distribution with parameter $\phi_{z_{ji}}$;

(c) Sample a driving behavior word time $t_{ji}$ from the multinomial distribution parameterized by the time-label distribution of driving pattern $z_{ji}$.

Further, in step S22 of the present invention, the method for solving the model parameters using the Gibbs sampling method is:

(1) Extract a word from the document set, randomly or in a certain order;

(2) Calculate the conditional probability $p(z_i = j \mid z_{-i}, w, t, \alpha, \beta, \gamma)$ that the selected word is assigned to a topic, given all other words and topics, where $z_{-i} = \{z_1, z_2, \ldots, z_{i-1}, z_{i+1}, \ldots, z_K\}$;

(3) Randomly draw a topic $z_i$ according to this probability and replace the topic of the current word with it;

(4) Repeat the above steps until α, β and γ finally converge to a fixed point.

Further, the method for solving the values of the parameters α, β, and γ in step S22 of the present invention is as follows:

For driving data j, given a driving behavior word w and its time label t, all driving pattern assignments $z_{-ji}$ other than $z_{ji}$, and the hyperparameters α, β and γ, the conditional distribution is calculated. It is expressed in terms of three counts: the number of times the driving behavior word w is assigned to driving pattern j, excluding the current assignment i; the number of times the time label t of the driving behavior word w is assigned to driving pattern j, excluding the current assignment i; and the number of driving behavior words in driving data d assigned to driving pattern j, excluding the current assignment i. From these counts the formulas for φ, the time-label distribution parameter and θ are obtained, and from θ, φ and the time-label distribution parameter the values of the parameters α, β and γ of the T-LDA model are obtained.

The invention has the following beneficial effects: the driving behavior pattern recognition method based on the T-LDA topic model introduces the time information of driving behavior words into the LDA model as an observed variable that serves as the label of the driving behavior words, solves the parameters of the improved model, and finally performs driving pattern recognition with the improved model. This solves the problem that driving modes formed by consecutive driving behavior words within a certain period are identified inaccurately because traditional algorithms ignore the sequential structure among driving behavior words. Moreover, training the model with Gibbs sampling solves the problem that the model parameters are difficult to learn.

Drawings

The invention will be further described with reference to the accompanying drawings and examples, in which:

FIG. 1 is a basic flow diagram of the driving behavior pattern recognition method;

FIG. 2 shows the experimental road section;

FIG. 3 is the word frequency histogram of unweighted driving behavior words;

FIG. 4 is the word frequency histogram of weighted driving behavior words;

FIG. 5 is the graphical model of the improved model T-LDA;

FIG. 6 is a comparison of the perplexity of the three models;

FIG. 7 shows the driving pattern probability distribution training results of the pLSA model;

FIG. 8 shows the driving pattern probability distribution training results of the LDA model;

FIG. 9 shows the driving pattern probability distribution training results of the T-LDA model;

FIG. 10 is the driving behavior word probability distribution for driving mode 1;

FIG. 11 is the driving behavior word probability distribution for driving mode 2;

FIG. 12 is the driving behavior word probability distribution for driving mode 3;

FIG. 13 is the driving behavior word probability distribution for driving mode 4;

FIG. 14 shows the correlation coefficients between the data reconstructed by the three models and the original data.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The basic flow of the method is shown in FIG. 1. First, a driving behavior dictionary is built based on the bag-of-words model and driving behavior histogram features are extracted. Then, to address the loss of structural information in traditional topic models, a T-LDA model that introduces structural information is proposed, and its parameters are solved by Gibbs sampling. The pattern of the driving behavior to be tested can then be identified with the learned T-LDA model. The specific steps are as follows.

Step S1 driving behavior dictionary establishment and driving behavior histogram feature extraction

To verify the performance of the algorithm, data are collected first. This study of driving behaviors and driving modes is aimed mainly at ordinary drivers, so the driving data of ordinary drivers were collected in the experiment, with 10 drivers selected in total.

To reduce the influence of the road surface on the driving data, the experimental road section must meet two requirements: the road surface is fairly level and the altitude variation is small. A section of road in Hongshan District, Wuhan, was selected as the test section. It has small elevation change and a flat road surface, and it includes both the Xiongchu Avenue section, which is mostly straight, and the Software Park section, which has more turns, which is favorable for the experiment. The test section is shown in FIG. 2 (blue marked line).

The method acquires 20 sections of driving data from 10 drivers on the experimental road section and extracts 450 driving behavior segments from them with an endpoint detection algorithm. For driving behavior recognition, 330 driving behavior segments are used as the training set of the clustering algorithm and 120 as the test set to verify the effectiveness of the algorithm. For driving mode recognition, given the small number of driving data samples, 10-fold cross-validation is used to verify the topic model algorithms: in each round of topic model parameter training, 2 sections of driving data serve as the test set and the remaining sections as the training set; the process runs 10 times so that each section of driving data serves as test data exactly once, and the average of the 10 results is taken as the driving mode recognition accuracy.
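The 10-fold scheme described above (20 sections, 2 held out per round) can be sketched as follows. This is only an illustration: the function name `ten_fold_splits`, the shuffling and the seed are assumptions, since the patent does not say how sections are assigned to folds.

```python
import random

def ten_fold_splits(n_segments=20, n_folds=10, seed=0):
    """Yield (train_ids, test_ids) so that every segment is tested exactly once."""
    ids = list(range(n_segments))
    random.Random(seed).shuffle(ids)      # assumed: random fold assignment
    fold_size = n_segments // n_folds     # 2 segments per fold for 20 segments
    for k in range(n_folds):
        test_ids = ids[k * fold_size:(k + 1) * fold_size]
        train_ids = [i for i in ids if i not in test_ids]
        yield train_ids, test_ids

splits = list(ten_fold_splits())
```

Each round then trains a topic model on the 18 training sections and scores it on the 2 held-out sections; the reported accuracy is the mean over the 10 rounds.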

Step S11 construction of driving behavior dictionary

For the collected data, the invention uses a bag-of-words model to establish the driving behavior dictionary and obtain the driving behavior histogram features, namely the driving data-single driving behavior word co-occurrence matrix of the topic model.

The bag-of-words method involves only one parameter, the number of words in the dictionary, and is intuitive and effective. First, all words appearing in the text are put into a dictionary; then the frequency with which each dictionary word appears is expressed by a simple and effective text representation, the histogram; finally, the similarity of different texts is measured.

Taking text as an example, the dictionary is built by collecting all words appearing in the texts. For the two texts "I like playing basketball, how about you?" and "I love playing football.", the dictionary is:

dictionary＝{1:"I",2:"like",3:"playing",4:"basketball",5:"how",6:"about",7:"you",8:"love",9:"football"}

Using the words in the dictionary, the two texts can be represented by the histogram feature vectors [1,1,1,1,1,1,1,0,0] and [1,0,1,0,0,0,0,1,1], respectively.
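The text example can be reproduced with a few lines of Python. This is a minimal sketch; the helper names `build_dictionary` and `histogram` and the simple letter-only tokenizer are illustrative choices, not part of the patent.

```python
import re

def tokenize(text):
    """Split a text into words, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", text)

def build_dictionary(texts):
    """Assign 1-based ids to words in order of first appearance."""
    dictionary = {}
    for text in texts:
        for word in tokenize(text):
            if word not in dictionary:
                dictionary[word] = len(dictionary) + 1
    return dictionary

def histogram(text, dictionary):
    """Count how often each dictionary word occurs in the text."""
    counts = [0] * len(dictionary)
    for word in tokenize(text):
        counts[dictionary[word] - 1] += 1
    return counts

texts = ["I like playing basketball, how about you?", "I love playing football."]
dictionary = build_dictionary(texts)
vectors = [histogram(text, dictionary) for text in texts]
```

Each vector has one entry per dictionary word (9 here), so texts of different lengths become fixed-length features that can be compared directly.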

For the data collected in this embodiment, clustering yields 12 classes of driving behaviors, and the cluster centers of the 12 classes are extracted to establish a dictionary containing all basic driving behavior words.

Step S12 driving behavior histogram feature extraction

When driving data are represented with the conventional bag-of-words model, every driving behavior word is given equal weight, so the driving pattern distribution is biased toward high-frequency behavior words. To balance the influence of driving behavior words of different frequencies, the invention assigns weight parameters of different sizes to words of different frequencies: the higher the occurrence frequency of a word, the smaller its weight parameter.

TF-IDF (Term Frequency-Inverse Document Frequency) is a widely used statistical method for evaluating the importance of a word to a document. The invention uses the TF-IDF method to map the driving behavior feature vectors to words, then calculates the word weights and applies the weighting. The construction steps are as follows:

(1) Assume that M driving behavior feature vectors $F = \{f_1, f_2, f_3, \ldots, f_M\}$ are extracted from the driving data d to be processed, and that the generated driving behavior dictionary is $W = \{w_1, w_2, w_3, \ldots, w_V\}$, where V is the size of the driving behavior dictionary.

(2) Map each driving behavior feature vector $f_i$ to a driving behavior word $w_{c_i}$ in the driving behavior dictionary, i.e. find its position $c_i$ in the dictionary:

$c_i = \arg\min_j \|f_i - w_j\|^2$, with $c_i \in \{1, 2, \ldots, V\}$ (4-1)

(3) For the driving behavior word $w_{c_i}$ to which each driving behavior feature vector $f_i$ is mapped, calculate its weight using a Gaussian function, where the variance is defined for the word $w_{c_i}$ and F is the number of word frequencies centered on the word frequency.

(4) For the driving behavior word $w_{c_i}$, calculate its weight, where n is the total number of driving behavior words in the driving data.

After the weighted driving behavior word histogram is constructed by the TF-IDF method, each section of driving data can be represented by a group of words with weights.
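A minimal NumPy sketch of this pipeline is given below. The nearest-word mapping follows equation (4-1); the weighting, however, is a textbook smoothed TF-IDF used as a stand-in, because the patent's own Gaussian and frequency weight formulas are not reproduced in this text. All function names and the toy data are hypothetical.

```python
import numpy as np

def map_to_words(features, dictionary):
    """Return c_i = argmin_j ||f_i - w_j||^2 for each feature vector (0-based)."""
    # Pairwise squared distances between M features and V words, shape (M, V).
    d2 = ((features[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def word_histograms(docs, dictionary):
    """Count word occurrences per driving data section, shape (n_docs, V)."""
    V = dictionary.shape[0]
    return np.stack([np.bincount(map_to_words(f, dictionary), minlength=V)
                     for f in docs])

def tfidf_weight(counts):
    """Stand-in weighting (assumption): smoothed TF-IDF, not the patent's formula."""
    n_docs = counts.shape[0]
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    df = (counts > 0).sum(axis=0)                  # sections containing each word
    idf = np.log((1 + n_docs) / (1 + df)) + 1.0    # frequent words get smaller idf
    return tf * idf

# Toy data: 3 dictionary words (cluster centers) in a 2-D feature space.
dictionary = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
docs = [np.array([[0.1, 0.0], [0.9, 0.1], [0.05, 0.02]]),
        np.array([[0.0, 0.1], [0.1, 0.9]])]
counts = word_histograms(docs, dictionary)
weighted = tfidf_weight(counts)
```

In this toy example the first word occurs in both sections, so its weight per occurrence is lower than that of the words that occur in only one section, which is the balancing effect the text describes.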

Next, the driving dictionary histogram features corresponding to each section of driving data are obtained. For one driving data section picked at random from the training set, the unweighted driving behavior word frequency histogram is shown in FIG. 3.

After the driving behavior words in the same driving data are subjected to TF-IDF processing, a driving behavior word weighting frequency histogram is obtained, as shown in FIG. 4.

As can be seen from fig. 4, after TF-IDF weighting, the driving behavior word weighting frequency histogram becomes flat, the weight of the high frequency words is reduced, and the weight of the medium frequency words is increased, so that the distribution is equalized.

Step S2 training the improved T-LDA model by using driving behavior histogram feature

The histogram features are modeled with the improved T-LDA, and the model parameters are solved by Gibbs sampling. The specific steps are as follows.

Step S21 time-stamp-based improved LDA topic model design

Because the original pLSA and LDA models are based on the bag-of-words assumption, when applied to driving pattern recognition they attend only to the semantics of the driving behavior words and ignore structural information. Current research therefore focuses on compensating for this loss of structural information when the two models are applied. Improvements to topic models fall into two categories: improving the internal structure of the model and improving the hyperparameters. The former mainly adds observed or latent variables to the model; the latter mainly re-parameterizes the hyperparameters for dynamic modeling.

In current research, improved models are becoming more and more complex, with more levels and more latent variables and hyperparameters. In contrast to models that keep adding hierarchical structure, the invention introduces only time information as an observed variable in the LDA model and proposes the improved model T-LDA (Time-LDA). When a driving behavior word is sampled from a driving pattern topic, the time label corresponding to that word is sampled as well. The graphical model of the improved T-LDA is shown in FIG. 5.

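According to the graphical model of T-LDA, a piece of driving data is generated as follows. For each driving pattern, the driving behavior word multinomial parameter is sampled from a Dirichlet distribution with parameter β and the time-label multinomial parameter is sampled from a Dirichlet distribution with parameter γ; for each section of driving data j, the driving pattern multinomial parameter $\theta_j$ is sampled from a Dirichlet distribution with parameter α. Each driving behavior word in driving data j is then generated by the following steps:

(a) Sample a driving pattern $z_{ji}$ from the multinomial distribution with parameter $\theta_j$;

(b) Sample a driving behavior word $w_{ji}$ from the multinomial distribution with parameter $\phi_{z_{ji}}$;

(c) Sample a driving behavior word time $t_{ji}$ from the multinomial distribution parameterized by the time-label distribution of driving pattern $z_{ji}$.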

Compared with other models, a word in T-LDA can be regarded as a term consisting of two components, namely the semantics of the driving behavior word and its time, so T-LDA compensates for the loss of driving behavior word time information caused by the bag-of-words assumption.
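The generative process can be simulated with NumPy as a sanity check. The sketch below rests on several assumptions that the text does not fix: symmetric Dirichlet priors, time discretized into T labels, and the name `psi` for the per-pattern time-label distribution (the patent's own symbol is not recoverable here). `generate_tlda` is a hypothetical helper.

```python
import numpy as np

def generate_tlda(n_docs, n_words_per_doc, K, V, T, alpha, beta, gamma, seed=0):
    """Sample driving data from a T-LDA-style generative process."""
    rng = np.random.default_rng(seed)
    # Per-pattern distributions over words (phi) and time labels (psi, assumed name).
    phi = rng.dirichlet(np.full(V, beta), size=K)    # shape (K, V)
    psi = rng.dirichlet(np.full(T, gamma), size=K)   # shape (K, T)
    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet(np.full(K, alpha))     # pattern mix of this section
        z = rng.choice(K, size=n_words_per_doc, p=theta)       # step (a)
        w = np.array([rng.choice(V, p=phi[k]) for k in z])     # step (b)
        t = np.array([rng.choice(T, p=psi[k]) for k in z])     # step (c)
        docs.append((z, w, t))
    return phi, psi, docs

phi, psi, docs = generate_tlda(n_docs=3, n_words_per_doc=5, K=2, V=4, T=3,
                               alpha=1.0, beta=0.5, gamma=0.5)
```

Every token therefore carries a pair (word, time label) that shares one pattern assignment, which is how the time information enters the model.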

Step S22 solving model parameters using Gibbs sampling method

The method uses the Gibbs sampling algorithm to approximately infer the parameters of the T-LDA model. For driving data j, given a driving behavior word w and its time label t, all driving pattern assignments $z_{-ji}$ other than $z_{ji}$, and the hyperparameters α, β and γ, the conditional distribution is calculated. It is expressed in terms of three counts: the number of times the driving behavior word w is assigned to driving pattern j, excluding the current assignment i; the number of times the time label t of the driving behavior word w is assigned to driving pattern j, excluding the current assignment i; and the number of driving behavior words in driving data d assigned to driving pattern j, excluding the current assignment i. Finally, the formulas for φ, the time-label distribution parameter and θ are obtained, and from θ, φ and the time-label distribution parameter the values of the T-LDA model parameters α, β and γ are obtained.
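For orientation, the collapsed Gibbs conditional for an LDA-style model in which each token carries a word and a time label usually takes the form below. This is a hedged reconstruction rather than the patent's own equation: it assumes symmetric Dirichlet priors and a time label that depends only on the driving pattern, and the symbols $\psi$, $m$ and $T$ (number of time labels) are introduced here for illustration.

```latex
% n^{(w)}_{-i,j}: times word w is assigned to pattern j, excluding token i
% m^{(t)}_{-i,j}: times time label t is assigned to pattern j, excluding token i
% n^{(d)}_{-i,j}: words of driving data d assigned to pattern j, excluding token i
p(z_i = j \mid z_{-i}, w, t, \alpha, \beta, \gamma) \;\propto\;
  \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + V\beta}
  \cdot
  \frac{m^{(t_i)}_{-i,j} + \gamma}{m^{(\cdot)}_{-i,j} + T\gamma}
  \cdot
  \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + K\alpha}

% Point estimates read off the final counts:
\phi_{j,w} = \frac{n^{(w)}_{j} + \beta}{n^{(\cdot)}_{j} + V\beta}, \qquad
\psi_{j,t} = \frac{m^{(t)}_{j} + \gamma}{m^{(\cdot)}_{j} + T\gamma}, \qquad
\theta_{d,j} = \frac{n^{(d)}_{j} + \alpha}{n^{(d)}_{\cdot} + K\alpha}
```

The three factors correspond to the three counts described in the text: word-pattern, time label-pattern and pattern-driving data.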

The sampling process of the Gibbs sampling algorithm for the T-LDA model is as follows:

1) Extract a word from the document set, randomly or in a certain order;

2) Calculate the conditional probability $p(z_i = j \mid z_{-i}, w, t, \alpha, \beta, \gamma)$ that the selected word is assigned to a topic, given all other words and topics, where $z_{-i} = \{z_1, z_2, \ldots, z_{i-1}, z_{i+1}, \ldots, z_K\}$;

3) Randomly draw a topic $z_i$ according to this probability and replace the topic of the current word with it;

4) Repeat the above steps until α, β and γ finally converge to a fixed point.
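The sampling loop can be sketched in NumPy as below. It is an illustration under stated assumptions, not the patent's implementation: priors are symmetric, the time label depends only on the pattern, a fixed number of sweeps replaces the convergence test, and the names `tlda_gibbs`, `psi` and all count arrays are hypothetical.

```python
import numpy as np

def tlda_gibbs(docs, K, V, T, alpha=0.5, beta=0.1, gamma=0.1, n_iter=100, seed=0):
    """Collapsed Gibbs sampling for a T-LDA-style model.

    docs: list of (w, t) pairs of equal-length integer arrays; w holds word ids
    in [0, V) and t holds time-label ids in [0, T).
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))  # words of section d assigned to pattern k
    n_kw = np.zeros((K, V))          # times word w is assigned to pattern k
    n_kt = np.zeros((K, T))          # times time label t is assigned to pattern k
    n_k = np.zeros(K)                # total words assigned to pattern k

    def update(d, wi, ti, k, delta):
        n_dk[d, k] += delta
        n_kw[k, wi] += delta
        n_kt[k, ti] += delta
        n_k[k] += delta

    # Random initial assignment of every word to a pattern.
    z = [rng.integers(K, size=len(w)) for w, _ in docs]
    for d, (w, t) in enumerate(docs):
        for wi, ti, k in zip(w, t, z[d]):
            update(d, wi, ti, k, +1)

    for _ in range(n_iter):                      # fixed sweeps (assumption)
        for d, (w, t) in enumerate(docs):
            for i, (wi, ti) in enumerate(zip(w, t)):
                update(d, wi, ti, z[d][i], -1)   # exclude the current word
                # Unnormalized conditional; the per-section denominator is
                # constant in k and therefore omitted.
                p = ((n_kw[:, wi] + beta) / (n_k + V * beta)
                     * (n_kt[:, ti] + gamma) / (n_k + T * gamma)
                     * (n_dk[d] + alpha))
                z[d][i] = rng.choice(K, p=p / p.sum())
                update(d, wi, ti, z[d][i], +1)

    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + V * beta)
    psi = (n_kt + gamma) / (n_k[:, None] + T * gamma)
    return theta, phi, psi

toy_docs = [(np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1])),
            (np.array([2, 2, 3, 3]), np.array([1, 1, 0, 0]))]
theta, phi, psi = tlda_gibbs(toy_docs, K=2, V=4, T=2, n_iter=20)
```

Here `theta[d]` is the estimated driving pattern distribution of section d, and `phi[k]` and `psi[k]` are the word and time-label distributions of pattern k.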

Driving pattern recognition with the improved topic model is similar to that with the LDA model: first, the driving behavior word histogram features of each training driving data section are obtained with the bag-of-words model; then the model parameters α, β and γ are solved on the training driving data by the Gibbs sampling algorithm, giving the driving pattern distribution in each driving data section.

For new driving data $d_{test}$, $p(z_k \mid d_{test})$ is calculated. At this point the probability distributions of the driving behavior words and their time labels in all driving patterns have already been obtained from the training set, i.e. the first two parts of equation (4-28), $P(w_{di} \mid z_{-i}, z_i = j, \alpha, \beta)$ and $P(t_{di} \mid z_{-i}, w_{di}, z_i = j, \alpha, \beta, \gamma)$, are known, and only the last part of equation (4-29), $P(z_i = j, z_{-i}, \alpha, \beta, \gamma)$, needs to be solved. The sampling formula (4-33) obtained through derivation is evaluated by Gibbs sampling, as in training. The driving pattern type k contained in the test driving data $d_{test}$ is determined by the following equation:

$k = \arg\max_k p(z_k \mid d_{test})$ (4-34)

The collected data are processed according to the above principles, and the three topic models pLSA, LDA and T-LDA are evaluated in terms of both theory and practice. The theoretical evaluation has two parts, the perplexity and the similarity between the model and the actual data; the practical evaluation is the recognition accuracy of the trained model on new driving data. For each round of topic model parameter training, 2 sections of driving data are used as the test set and the other sections as the training set; the process runs 10 times in total so that each section of driving data serves as test data once, and the average of the 10 results is taken as the driving pattern recognition accuracy.

(1) Selecting the optimal number of driving patterns by perplexity

When a topic model is applied to driving patterns, a reasonable number of topics must first be specified to train the model, so an index is needed to measure the modeling capacity of the topic model under different numbers of topics. Blei et al. proposed using perplexity to evaluate the quality of a topic model and achieved good results; the present invention also uses perplexity to determine the optimal number of topics.

For a set D of M driving data sections, let $N_d$ be the number of driving behavior words in the d-th driving data section $w_d$ and $p(w_d)$ the probability of that driving data; then the perplexity is

$\mathrm{perplexity}(D) = \exp\left\{-\dfrac{\sum_{d=1}^{M} \log p(w_d)}{\sum_{d=1}^{M} N_d}\right\}$

Generally, the smaller the perplexity, the smaller the difference between the topics extracted by the topic model and the actual topics, i.e. the better the modeling effect of the topic model.
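Perplexity can be computed from word histograms and fitted distributions as in the sketch below. It assumes the usual definition, the exponential of the negative average log-likelihood per word, with $p(w_d)$ approximated from point estimates as the product over words of $\sum_k \theta_{dk}\phi_{kw}$; time labels are ignored, and the function name is hypothetical.

```python
import numpy as np

def perplexity(counts, theta, phi):
    """counts: (M, V) word histograms; theta: (M, K); phi: (K, V)."""
    word_probs = theta @ phi                        # p(w | d), shape (M, V)
    # Clip to avoid log(0) for words a model assigns zero probability.
    log_lik = (counts * np.log(np.maximum(word_probs, 1e-12))).sum()
    return float(np.exp(-log_lik / counts.sum()))

# Sanity check: a model that is uniform over V = 4 words has perplexity 4.
counts = np.array([[1, 2, 0, 1], [0, 3, 1, 0]])
theta = np.array([[0.5, 0.5], [0.2, 0.8]])
phi = np.full((2, 4), 0.25)
value = perplexity(counts, theta, phi)
```

Training the model for several candidate numbers of patterns and keeping the one with the lowest held-out perplexity mirrors the selection procedure described in the text.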

The pLSA, LDA and improved LDA models were trained with the driving behavior word histogram features, and 2, 3, 4, 5 and 6 driving patterns were extracted in turn from the 18 training driving data sections. FIG. 6 compares the average perplexity of the three models over 10 training runs.

As FIG. 6 shows, for the driving data set collected by the invention, the perplexity of all three topic models is smallest when the number of driving patterns is set to 4. The subsequent analyses therefore all use 4 topics. Among the three topic models, LDA models the data better than pLSA, and T-LDA models it better than LDA.

(2) Correlation analysis of topic model reconstructed data and original data

The three models are described next using the probability distribution of driving patterns in the driving data and the probability distribution of driving behavior words in the driving patterns. Finally, data are reconstructed with each topic model and correlated with the original data to measure the ability of the different models to extract driving patterns. The probability distributions of the driving patterns in the first 5 training driving data sections for the pLSA, LDA and T-LDA models are shown in FIGS. 7, 8 and 9.

As FIGS. 7, 8 and 9 show, the pLSA, LDA and T-LDA models give substantially consistent distributions of the 4 driving patterns in the first 5 driving data sections.

The distributed probabilities of driving behavior words in 4 driving patterns in the pLSA, LDA and T-LDA models are shown in fig. 10, 11, 12 and 13.

It can be defined as a cautious driving pattern according to the driving behavior word probability distribution of the driving pattern 1 in fig. 10.

According to the driving behavior word probability distribution of the driving pattern 2 in fig. 11, it can be defined as a general type driving pattern.

It can be defined as an aggressive driving pattern according to the driving behavior word probability distribution of the driving pattern 3 in fig. 12.

According to the driving behavior word probability distribution of driving pattern 4 in fig. 13, it can be defined as a very aggressive driving pattern. As fig. 10, 11, 12 and 13 show, the probability distributions of the driving behavior words in the 4 driving patterns are also substantially consistent across the pLSA, LDA and T-LDA models.

The driving data are then reconstructed from the training results of the three models, that is, from the distribution of the driving patterns in the driving data and the distribution of the driving behavior words in the driving patterns. A correlation analysis between the reconstructed driving data and the collected original driving data yields, for each driving data, the correlation coefficient between its reconstructed data and its original data, as shown in fig. 14.
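The reconstruction and correlation analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions: the reconstruction is taken to be the product of the two trained distributions scaled by each record's word count, and the correlation coefficient is taken to be the Pearson coefficient; neither choice is spelled out in the text, and the function names are hypothetical.

```python
import numpy as np

def reconstruct(theta, phi, word_totals):
    """Rebuild each record's word histogram from the trained distributions.

    theta:       (D, K) driving pattern distribution per driving data record
    phi:         (K, V) driving behavior word distribution per pattern
    word_totals: (D,)   number of driving behavior words in each record
    """
    return (theta @ phi) * word_totals[:, None]

def record_correlations(original, reconstructed):
    """Pearson correlation between original and rebuilt histogram, per record."""
    return np.array([np.corrcoef(o, r)[0, 1]
                     for o, r in zip(original, reconstructed)])
```

A coefficient closer to 1 for a given record indicates that the model's patterns account for more of that record's word histogram.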

The correlation coefficient represents the consistency between the reconstructed data and the original data. On this measure the T-LDA model outperforms the LDA and pLSA models, which shows that the T-LDA model describes the driving patterns implicit in the driving data better than the LDA and pLSA models do.

(3) Driving pattern recognition rate

The recognition accuracy of the pLSA, LDA and T-LDA models for the 4 driving patterns on the test driving data is shown in table 1 below. The recognition rate for each driving pattern is the average over 10 cross-validation training runs.

TABLE 1 pLSA, LDA and T-LDA model identification rates

As can be seen from Table 1, the recognition rate of the T-LDA model provided by the invention on the 4 driving patterns is higher than that of the pLSA and LDA models, which shows that the driving patterns extracted by the T-LDA model are closer to the actual driving patterns implied in the driving data than those extracted by the LDA and pLSA models.

It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims

1. A driving behavior pattern recognition method based on a T-LDA topic model is characterized by comprising the following steps:

2. The driving behavior pattern recognition method based on the T-LDA topic model as claimed in claim 1, wherein the specific method of step S1 is:

3. The driving behavior pattern recognition method based on the T-LDA topic model of claim 2, wherein the TF-IDF method in step S12 comprises the following steps:

(1) assume that M driving behavior feature vectors F = {f₁, f₂, f₃, …, f_M} are extracted from the driving behavior data d to be processed, and that the generated driving behavior dictionary is W = {w₁, w₂, w₃, …, w_V}, where V is the size of the driving behavior dictionary;

c_i = argmin_j ||f_i − w_j||², and c_i ∈ {1, 2, …, V}

Wherein the variance

(4) for the driving behavior word w_{c_i}, calculate its weight

Where n is the total number of driving behavior words in the driving data.
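The nearest-word assignment c_i = argmin_j ||f_i − w_j||² used in this claim can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, indices are 0-based (the claim numbers words from 1 to V), and the weight formula of step (4) is not reproduced in this text, so only the assignment and the raw word counts are shown.

```python
import numpy as np

def assign_words(features, dictionary):
    """Map each feature vector f_i to the index c_i of its nearest word w_j.

    features:   (M, P) driving behavior feature vectors
    dictionary: (V, P) driving behavior words (for example, cluster centres)
    Returns 0-based indices; the claim numbers words 1..V.
    """
    diffs = features[:, None, :] - dictionary[None, :, :]   # (M, V, P)
    sq_dist = np.sum(diffs ** 2, axis=2)                    # (M, V)
    return np.argmin(sq_dist, axis=1)

def word_histogram(word_ids, vocab_size):
    """Count how often each driving behavior word occurs in one record."""
    return np.bincount(word_ids, minlength=vocab_size)
```

The resulting counts are the unweighted histogram; any weighting would be applied on top of them.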

4. The driving behavior pattern recognition method based on the T-LDA topic model as claimed in claim 1, wherein the specific method of step S2 is:

and S22, solving the model parameters by using a Gibbs sampling method.

5. The driving behavior pattern recognition method based on the T-LDA topic model as claimed in claim 4, wherein the driving behavior data of the T-LDA model generated in step S21 is generated by:

(2) For each driving mode, sample from a Dirichlet distribution with parameter γ to obtain the K driving mode-driving behavior word time label multinomial distribution parameters φ_z;

(b) From above to below

(c) From above to below

6. The driving behavior pattern recognition method based on the T-LDA topic model as claimed in claim 5, wherein the method for solving the model parameters by using the Gibbs sampling method in step S22 is as follows:

(1) extracting a word from the document set randomly or in a certain order;

(2) calculating, given all other words and topics, the conditional probability p(z_i = j | z_{-i}, w, t, α, β, γ) that the selected word is assigned to a topic, wherein z_{-i} = {z₁, z₂, …, z_{i-1}, z_{i+1}, …, z_K};

(3) randomly drawing a topic z_i to replace the topic of the current word;
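Steps (1) to (3) can be sketched as a collapsed Gibbs sampler. The patent's own expression for the conditional probability is not reproduced in this text, so the sketch assumes the standard collapsed LDA form multiplied by an analogous time-label factor with prior γ. That form, the function name gibbs_sweeps and all count-array names are assumptions made for illustration.

```python
import numpy as np

def gibbs_sweeps(docs, K, V, T, alpha, beta, gamma, n_sweeps, seed=0):
    """docs: list of records, each a list of (word_id, time_label) pairs."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))   # topic counts per record
    n_kw = np.zeros((K, V))           # word counts per topic
    n_kt = np.zeros((K, T))           # time-label counts per topic
    n_k = np.zeros(K)                 # total words per topic

    def update(d, k, w, t, delta):
        n_dk[d, k] += delta
        n_kw[k, w] += delta
        n_kt[k, t] += delta
        n_k[k] += delta

    # Random initial topic for every word.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, (w, t) in enumerate(doc):
            update(d, z[d][i], w, t, +1)

    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, (w, t) in enumerate(doc):      # step (1): visit words in order
                update(d, z[d][i], w, t, -1)      # exclude the current word
                # Step (2): assumed conditional p(z_i = j | z_-i, w, t, ...).
                p = ((n_dk[d] + alpha)
                     * (n_kw[:, w] + beta) / (n_k + V * beta)
                     * (n_kt[:, t] + gamma) / (n_k + T * gamma))
                z[d][i] = rng.choice(K, p=p / p.sum())   # step (3)
                update(d, z[d][i], w, t, +1)

    # Point estimates of the record-topic and topic-word distributions.
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    phi = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + V * beta)
    return z, theta, phi
```

Here every word is visited in a fixed order on each sweep; drawing words at random, as step (1) also permits, would work in the same way.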

7. The driving behavior pattern recognition method based on the T-LDA topic model as claimed in claim 6, wherein the method for solving the values of the parameters α, β and γ in step S22 is as follows:

for driving data j, given the driving behavior word w and its time label t, the driving modes z_{-ji} of all words other than the current driving mode z_ji, and the hyperparameters α, β and γ, calculate the conditional distribution

Wherein

And the formula for θ:

by theta, phi and