CN109086794A - A driving behavior pattern recognition method based on a T-LDA topic model - Google Patents

Publication number
CN109086794A
CN109086794A CN201810676019.6A CN201810676019A CN109086794A CN 109086794 A CN109086794 A CN 109086794A CN 201810676019 A CN201810676019 A CN 201810676019A CN 109086794 A CN109086794 A CN 109086794A
Authority
CN
China
Prior art keywords
driving behavior
driving
word
mode
data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810676019.6A
Other languages
Chinese (zh)
Other versions
CN109086794B (en)
Inventor
石英
罗佳齐
李振威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Application filed by Wuhan University of Technology WUT
Priority to CN201810676019.6A
Publication of CN109086794A
Application granted
Publication of CN109086794B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The invention discloses a driving behavior pattern recognition method based on a T-LDA topic model. The method comprises the following steps. S1, driving behavior dictionary establishment and driving behavior histogram feature extraction: a driving behavior dictionary is established from the clustering result of the driving behavior data, and a driving data-driving behavior word co-occurrence matrix, i.e. the driving behavior histogram feature, is constructed. S2, training the improved T-LDA model with the driving behavior histogram feature: the relationship among driving data, driving modes and driving behavior words is constructed, and time information is introduced as a label of each driving behavior word; the model is trained on the time-labeled driving behavior histogram features, the model parameters are solved by Gibbs sampling, and the driving behavior recognition result is output. The invention can be used effectively for driving behavior pattern recognition.

Description

A driving behavior pattern recognition method based on a T-LDA topic model
Technical field
The present invention relates to driving behavior pattern recognition technology, and in particular to a driving behavior pattern recognition method based on a T-LDA topic model.
Background art
Traffic safety problems caused by abnormal driver behavior are increasingly prominent. Case analyses of traffic accident causes show that bad driving behaviors such as sudden acceleration, sudden deceleration and sharp turns are the principal cause of traffic accidents. To improve driving safety, acquiring driving data promptly, extracting driving behaviors from it, and recognizing and improving those behaviors has become a research hotspot. The rapid spread of smart mobile terminals makes vehicle operation data easier to collect, which facilitates analysis of a driver's driving behaviors and driving mode.
Some researchers use the acceleration sensor built into a smartphone to obtain the vehicle's acceleration data along the horizontal and vertical axes and recognize driving behaviors such as acceleration, deceleration and turning, with good results. Others use an acceleration sensor to collect vehicle acceleration information, divide it into low, medium and high levels, and establish the relationship between acceleration level and driving mode category, finally dividing driving modes into four classes: a cautious driving mode below the ordinary level, an ordinary driving mode whose behavior poses no threat, an aggressive driving mode that poses some threat, and a very aggressive driving mode that poses a great threat.
Mainstream research currently works directly on low-level features of the driving data: the duration and intensity of behaviors such as acceleration or turning are judged, and the driving mode is then recognized. Further study shows that recognizing the driving mode only from single driving behaviors, without considering the specific combination of different driving behaviors in the driving data sequence, gives poor applicability across road conditions and time periods. Research emphasis has therefore shifted to judging the driving mode from an understanding of driving behavior sequences.
Some researchers have applied statistical models to combinations of driving behaviors such as acceleration and deceleration occurring during driving, and mined differences in driving mode between drivers. Researchers have therefore turned to topic models, statistical models that are already mature in text analysis. A topic model classifies documents by extracting the topic information hidden in them: the latent variable is the topic, an abstraction of a group of related words in the text, and by learning from training samples a parameterized model that generates different texts can be constructed. Borrowing the way topic models are applied in text analysis and image scene recognition, driving data can be regarded as a document: the driving data is composed of different driving modes (topics), and each driving mode (topic) is composed of a series of single driving behaviors (words) representative of that mode.
pLSA is one of the most representative topic models. It analyzes the word-document co-occurrence matrix, computes the probability distribution of each word in a document, and thereby determines the document's topics. However, its number of training parameters grows linearly with the size of the driving data set, which makes computation more complex; moreover, the model is generative only for the training driving data set and recognizes new driving data poorly. To address these shortcomings, the Latent Dirichlet Allocation (LDA) model was proposed on the basis of pLSA; it represents the data with a suitable set of parameters and can thereby avoid overfitting.
The present invention extracts the cluster center of each driving behavior class, treats it as a word in the driving behavior dictionary, counts the number of occurrences of each driving behavior word in the driving data, and obtains a weighted driving behavior word histogram feature. To address the shortcomings of the mainstream topic models pLSA and LDA, the invention proposes an improved LDA model that introduces time labels, the T-LDA model, to recognize driving modes. Experimental results show that the improved model effectively mines the characteristics of a series of consecutive driving behaviors in the driving data and improves the accuracy of driving mode recognition.
Summary of the invention
The technical problem to be solved by the present invention is to provide, in view of the defects in the prior art, a driving behavior pattern recognition method based on a T-LDA topic model.
The technical solution adopted by the present invention to solve this technical problem is as follows:
The present invention provides a driving behavior pattern recognition method based on a T-LDA topic model, the method comprising the following steps:
S1, driving behavior dictionary establishment and driving behavior histogram feature extraction: driving behavior data is input and clustered, and a driving behavior dictionary is established from the clustering result; the cluster center of each driving behavior class is extracted and used as a word in the driving behavior dictionary, and the number of occurrences of each driving behavior word in the driving data is counted to obtain the driving data-driving behavior word co-occurrence matrix, i.e. the driving behavior histogram feature;
S2, training the improved T-LDA model with the driving behavior histogram feature: the T-LDA model comprises two parts, the first being the driving mode types contained in each segment of driving data and their probability distribution, the second being the driving behavior word types contained in each driving mode and their probability distribution; the relationship among driving data, driving modes and driving behavior words is thereby constructed, and time information is introduced as a label of each driving behavior word so that neighboring driving behaviors are grouped together; the model is trained on the time-labeled driving behavior histogram features, the model parameters are solved by Gibbs sampling, and the driving behavior recognition result is output.
Further, step S1 of the present invention specifically comprises:
S11, establishment of the driving behavior dictionary: features are extracted from the raw driving behavior data, and selected features are clustered; for the different driving behavior classes obtained by clustering, the cluster center of each class is taken as a word in the bag-of-words model, and the set of all distinct driving behaviors constitutes the driving behavior dictionary; words of different frequency are assigned weighting parameters of different size, and the higher the word frequency, the smaller the weight;
S12, driving behavior histogram feature extraction: according to the local features of the driving behavior data to be processed, the TF-IDF method is used to map driving behavior feature vectors to words; the corresponding behavior word is looked up in the constructed driving behavior dictionary and the word frequency histogram is computed to characterize the driving behavior sequence.
Further, the TF-IDF method in step S12 of the present invention proceeds as follows:
(1) Assume M driving behavior vectors $F=\{f_1,f_2,f_3,\dots,f_M\}$ are extracted from the driving behavior data d to be processed, and the generated driving behavior dictionary is $W=\{w_1,w_2,w_3,\dots,w_V\}$, where V is the size of the driving behavior dictionary;
(2) Map each driving behavior vector $f_i$ to a driving behavior word $w_{c_i}$ in the driving behavior dictionary, i.e. find its position $c_i$ in the dictionary:
$$c_i=\arg\min_j \|f_i-w_j\|^2,\quad c_i\in\{1,2,\dots,V\}$$
(3) For the driving behavior word $w_{c_i}$ to which each driving behavior vector $f_i$ is mapped, compute a weight using a Gaussian function whose parameters are the variance, the frequency count of word $w_{c_i}$, and $f$, the median word frequency count;
(4) For driving behavior word $w_{c_i}$, compute its weight, where n is the total number of driving behavior words in the driving data.
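Step (2) above is a nearest-neighbour lookup against the dictionary. A minimal sketch follows, assuming the behavior vectors and dictionary words are rows of NumPy arrays; the function name `map_to_words` and the toy values are illustrative and not part of the patent. Indices are 0-based here, whereas the text counts positions from 1.

```python
import numpy as np

def map_to_words(features, dictionary):
    """Return c_i = argmin_j ||f_i - w_j||^2 for every row f_i of `features`."""
    # Pairwise squared Euclidean distances, shape (M, V).
    diff = features[:, None, :] - dictionary[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.argmin(sq_dist, axis=1)

# Toy dictionary of V = 3 cluster centres in a 2-D feature space (made-up values).
dictionary = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
# M = 4 behavior vectors extracted from one segment of driving data (made-up values).
features = np.array([[0.1, -0.1], [0.9, 0.2], [0.2, 1.7], [1.1, 0.1]])

word_ids = map_to_words(features, dictionary)           # 0-based dictionary positions
histogram = np.bincount(word_ids, minlength=len(dictionary))  # unweighted word counts
print(word_ids.tolist(), histogram.tolist())            # [0, 1, 2, 1] [1, 2, 1]
```

Counting the mapped indices with `np.bincount` gives the unweighted word frequency histogram to which the weighting of steps (3) and (4) is then applied.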
Further, step S2 of the present invention specifically comprises:
S21, design of the improved LDA topic model based on time labels: the time information of each driving behavior word is introduced into the LDA model as an observed variable serving as the label of that word, the parameters of the improved T-LDA model are solved, and the T-LDA model is finally used to recognize the driving mode;
S22, solving the model parameters by Gibbs sampling.
Further, in step S21 of the present invention, the T-LDA model generates driving behavior data as follows:
(1) For each driving mode, sample from a Dirichlet distribution with parameter β to obtain the K multinomial parameters $\phi_z$ of the driving behavior word (semantic) distribution of each driving mode;
(2) For each driving mode, sample from a Dirichlet distribution with parameter γ to obtain the K multinomial parameters of the driving behavior word time-label distribution of each driving mode (written $\psi_z$ here);
(3) For each segment of driving data, sample from a Dirichlet distribution with parameter α to obtain the multinomial parameter $\theta_j$ of the driving mode distribution of that driving data;
(4) Each driving behavior word in driving data j is generated as follows:
(a) sample a driving mode $z_{ji}$ from the multinomial distribution with parameter $\theta_j$;
(b) sample a driving behavior word $w_{ji}$ from the multinomial distribution with parameter $\phi_{z_{ji}}$;
(c) sample a driving behavior word time $t_{ji}$ from the multinomial distribution with parameter $\psi_{z_{ji}}$.
Further, in step S22 of the present invention, the model parameters are solved by Gibbs sampling as follows:
(1) Extract a word from the document collection, either at random or in some fixed order;
(2) Given all other words and their assigned topics, compute the conditional probability $p(z_{ji}\mid z_{-ji},w,t,\alpha,\beta,\gamma)$ that the selected word is assigned to a topic, where $z_{-i}=\{z_1,z_2,\dots,z_{i-1},z_{i+1},\dots,z_K\}$;
(3) Randomly draw a topic $z_i$ to replace the topic of the current word;
(4) Repeat the above process until α, β and γ converge to a fixed point.
Further, in step S22 of the present invention, the values of the parameters α, β and γ are solved as follows:
For driving data j, given the driving behavior words w and their time labels t, all driving mode assignments $z_{-ji}$ other than $z_{ji}$, and the hyperparameters α, β and γ, compute the conditional distribution $p(z_{ji}\mid z_{-ji},w,t,\alpha,\beta,\gamma)$:
$$p(z_i=j\mid z_{-i},w,t,\alpha,\beta,\gamma)\propto P(w_{di}\mid z_{-i},t_{di},z_i=j,\alpha,\beta,\gamma)\times P(t_{di}\mid z_{-i},w_{di},z_i=j,\alpha,\beta,\gamma)\times P(z_i=j,z_{-i},\alpha,\beta,\gamma)$$
where the three count terms denote, in order: the number of times driving behavior word w is assigned to driving mode j, excluding the current assignment i; the number of times time label t of driving behavior word w is assigned to driving mode j, excluding the current assignment i; and the number of driving behavior words in driving data d assigned to driving mode j, excluding the current assignment i. From these, the calculation formulas for φ, ψ (the time-label distribution) and θ are obtained:
The values of the T-LDA model parameters α, β and γ are obtained from θ, φ and ψ.
The beneficial effects of the present invention are as follows: the driving behavior pattern recognition method based on a T-LDA topic model introduces the time information of each driving behavior word into the LDA model as an observed variable serving as the label of that word, solves the parameters of the improved model, and finally uses it to recognize the driving mode. It overcomes the problem that traditional algorithms ignore the sequential structure among driving behavior words, which makes driving mode recognition inaccurate for driving behavior words that are consecutive in time. Gibbs sampling is used to train the model, which addresses the difficulty of learning its parameters.
Description of the drawings
The present invention is further described below with reference to the accompanying drawings and embodiments, in which:
Fig. 1 is the basic flow chart of the driving behavior pattern recognition method;
Fig. 2 shows the experimental road section;
Fig. 3 is the unweighted driving behavior word frequency histogram;
Fig. 4 is the weighted driving behavior word frequency histogram;
Fig. 5 is the graphical model of the improved model T-LDA;
Fig. 6 compares the perplexity of the three models;
Fig. 7 is the driving mode probability distribution training result based on the pLSA model;
Fig. 8 is the driving mode probability distribution training result based on the LDA model;
Fig. 9 is the driving mode probability distribution training result based on the T-LDA model;
Fig. 10 is the driving behavior word probability distribution of driving mode 1;
Fig. 11 is the driving behavior word probability distribution of driving mode 2;
Fig. 12 is the driving behavior word probability distribution of driving mode 3;
Fig. 13 is the driving behavior word probability distribution of driving mode 4;
Fig. 14 shows the correlation coefficients between the data reconstructed by the three models and the original data.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The basic flow of the invention is shown in Fig. 1. First, a driving behavior dictionary is built on the bag-of-words model and driving behavior histogram features are extracted. Then, because traditional topic models lose structural information, a T-LDA model that introduces structural information is proposed and its parameters are solved by Gibbs sampling. The learned T-LDA model is then used to recognize the mode of the driving behavior under test. The specific steps are as follows.
Step S1: driving behavior dictionary establishment and driving behavior histogram feature extraction
To verify the algorithm's performance, data is collected first. Because research on driving behavior and driving modes mainly targets ordinary drivers, the driving data of ordinary drivers was collected in the experiment; 10 drivers were selected in total.
To reduce the influence of the road surface on the driving data, the experimental road section must meet two requirements: the road surface is relatively flat, and the elevation change is small. A stretch of road in Hongshan District, Wuhan, was chosen as the test section. Its elevation changes little and its surface is flat; it includes both the Xiongchu Avenue segment, which is mostly straight, and the software centre segment, which has many turns, so it is well suited to the experiment. The experimental section is shown in Fig. 2 (blue line).
The invention collected 20 segments of driving data in total from the 10 drivers on the experimental section, and extracted 450 driving behavior segments from them with an endpoint detection algorithm. For driving behavior recognition, 330 driving behavior segments were used as the training set of the clustering algorithm and 120 as the test set to verify the algorithm. For driving mode recognition, because the number of driving data samples is small, the topic model algorithm was verified with 10-fold cross-validation: in each round of topic model parameter training, 2 segments of driving data are used as the test set and the others as the training set to learn the topic model parameters. The process is run 10 times in total so that every segment of driving data serves as test data once, and the average of the 10 results is taken as the driving mode recognition accuracy.
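The 10-fold protocol described above (20 segments, 2 held out per round, each segment tested exactly once) can be sketched as follows. The shuffling seed and the `evaluate` callback are assumptions made for illustration; the patent does not state how segments are assigned to folds.

```python
import random

def ten_fold_splits(n_segments=20, n_folds=10, seed=0):
    """Split segment indices into folds; each fold is the test set exactly once."""
    ids = list(range(n_segments))
    random.Random(seed).shuffle(ids)      # assumed: random assignment to folds
    fold_size = n_segments // n_folds     # 2 test segments per round here
    splits = []
    for k in range(n_folds):
        test = ids[k * fold_size:(k + 1) * fold_size]
        train = [i for i in ids if i not in test]
        splits.append((train, test))
    return splits

def mean_accuracy(splits, evaluate):
    """Average a user-supplied evaluate(train, test) score over all rounds."""
    scores = [evaluate(train, test) for train, test in splits]
    return sum(scores) / len(scores)

splits = ten_fold_splits()
```

Here `evaluate` stands for training a topic model on the 18 training segments and returning the recognition accuracy on the 2 test segments.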
Step S11: establishment of the driving behavior dictionary
For the data collected above, the invention uses the bag-of-words model to build the driving behavior dictionary and obtain the driving behavior histogram feature, i.e. the driving data-driving behavior word co-occurrence matrix used in the topic model.
The bag-of-words method involves only one parameter, the number of words in the dictionary, and is intuitive and effective. All words appearing in the texts are first placed in a dictionary; the frequency with which each dictionary word occurs is then represented with a simple and effective document representation, the histogram; finally, the similarity of different texts is measured.
Taking text as an example, the dictionary is built by counting the frequency of every word appearing in the texts. For example, a dictionary is constructed for the two texts "I like playing basketball, how about you?" and "I love playing football."
Dictionary = {1: "I", 2: "like", 3: "playing", 4: "basketball", 5: "how", 6: "about", 7: "you", 8: "love", 9: "football"}
Using the words in the dictionary, the two texts can be represented by the histogram feature vectors [1,1,1,1,1,1,1,0,0] and [1,0,1,0,0,0,0,1,1] respectively.
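The text example can be reproduced with a few lines of Python. This is a minimal sketch: the tokenizer (runs of letters, case-sensitive) and the function names are illustrative choices, not something the patent specifies.

```python
import re

def tokenize(text):
    """Split a text into alphabetic tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", text)

def build_dictionary(texts):
    """Collect distinct words in order of first appearance."""
    dictionary = []
    for text in texts:
        for word in tokenize(text):
            if word not in dictionary:
                dictionary.append(word)
    return dictionary

def histogram(text, dictionary):
    """Count how often each dictionary word occurs in the text."""
    words = tokenize(text)
    return [words.count(w) for w in dictionary]

texts = ["I like playing basketball, how about you?", "I love playing football."]
dictionary = build_dictionary(texts)
vectors = [histogram(t, dictionary) for t in texts]
print(dictionary)   # ['I', 'like', 'playing', 'basketball', 'how', 'about', 'you', 'love', 'football']
print(vectors[0])   # [1, 1, 1, 1, 1, 1, 1, 0, 0]
print(vectors[1])   # [1, 0, 1, 0, 0, 0, 0, 1, 1]
```

For driving data the same counting applies, with cluster centers in place of words and behavior segments in place of tokens.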
For the data collected in this embodiment, the driving behaviors are clustered into 12 classes, the cluster centers of these 12 classes are extracted, and a dictionary containing all driving behavior words is established.
Step S12: driving behavior histogram feature extraction
When driving data is represented with the traditional bag-of-words method, every driving behavior word is treated as equally weighted, which biases the driving mode distribution toward high-frequency behavior words. To balance the influence of driving behavior words of each frequency, the invention assigns weighting parameters of different size to words of different frequency: the higher the word frequency, the smaller the weight.
TF-IDF (Term Frequency-Inverse Document Frequency) is a widely used statistical method for judging and assessing the importance of a word to a document. The invention uses the TF-IDF method to map driving behavior feature vectors to words, then computes their weights and applies the weighting. The construction steps are as follows:
(1) Assume M driving behavior vectors $F=\{f_1,f_2,f_3,\dots,f_M\}$ are extracted from the driving data d to be processed, and the generated driving behavior dictionary is $W=\{w_1,w_2,w_3,\dots,w_V\}$, where V is the size of the driving behavior dictionary.
(2) Map each driving behavior vector $f_i$ to a driving behavior word $w_{c_i}$ in the driving behavior dictionary, i.e. find its position $c_i$ in the dictionary:
$$c_i=\arg\min_j \|f_i-w_j\|^2,\quad c_i\in\{1,2,\dots,V\}\tag{4-1}$$
(3) For the driving behavior word $w_{c_i}$ to which each driving behavior vector $f_i$ is mapped, compute a weight using a Gaussian function whose parameters are the variance, the frequency count of word $w_{c_i}$, and $f$, the median word frequency count.
(4) For driving behavior word $w_{c_i}$, compute its weight, where n is the total number of driving behavior words in the driving data.
After the weighted driving behavior word histogram has been built with the above TF-IDF method, every segment of driving data can be characterized by a set of weighted words.
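The weighting formulas of steps (3) and (4) are not legible in this text, so the following is only a sketch of one plausible reading, not the patent's own formula. It assumes a Gaussian centred on the median word frequency, exp(-(f_v - f_med)^2 / (2σ²)), with σ² taken as the variance of the counts, multiplied by a term-frequency factor f_v / n. All function names are hypothetical.

```python
import numpy as np

def gaussian_weights(counts):
    """Assumed form: weight peaks at the median frequency and decays away from it."""
    counts = np.asarray(counts, dtype=float)
    f_med = np.median(counts)          # median word frequency count
    var = counts.var()                 # assumed choice of the Gaussian variance
    if var == 0.0:                     # all words equally frequent: no reweighting
        return np.ones_like(counts)
    return np.exp(-(counts - f_med) ** 2 / (2.0 * var))

def weighted_histogram(counts):
    """Assumed combination: Gaussian weight times relative frequency f_v / n."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()                   # total number of behavior words in the segment
    return gaussian_weights(counts) * counts / n

counts = [40, 12, 10, 9, 3]            # made-up word counts for one segment
weights = gaussian_weights(counts)
weighted = weighted_histogram(counts)
```

Under these assumptions the dominant word (count 40) receives the smallest weight and the median-frequency word receives weight 1, which matches the flattening effect the description reports for the weighted histogram.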
Next, the histogram feature of each segment of driving data is obtained from its driving dictionary. A driving data segment is selected at random from the training driving data set; its unweighted driving behavior word frequency histogram is shown in Fig. 3.
After the driving behavior words in the same driving data are processed with TF-IDF, the resulting weighted driving behavior word frequency histogram is shown in Fig. 4.
As Fig. 4 shows, after TF-IDF weighting the weighted driving behavior word frequency histogram becomes flatter: the weight of high-frequency words is reduced and the weight of medium-frequency words is raised, so that the distribution is balanced.
Step S2: training the improved T-LDA model with the driving behavior histogram feature
The histogram features above are modeled with the improved T-LDA, and the model parameters are solved by Gibbs sampling. The specific steps are as follows.
Step S21: design of the improved LDA topic model based on time labels
Because the original pLSA and LDA models rest on the bag-of-words assumption, when applied to driving mode recognition they attend only to the semantics of driving behavior words and ignore structural information. Current research applying these two models therefore concentrates on compensating for the model's loss of structural information. Improvements to topic models fall mainly into two areas: improvements to the model's internal structure and improvements to the hyperparameters. The former mainly adds observed or latent variables to the model; the latter mainly re-parameterizes the hyperparameters so as to model dynamically.
Current research shows that improved models are becoming increasingly complex, with more and more levels, latent variables and hyperparameters. Unlike other models, which keep adding hierarchical structure, the invention only introduces time information into the LDA model as an observed variable, giving the improved model T-LDA (Time-LDA). When a driving behavior word is sampled from a driving mode topic, the time label corresponding to that word is sampled at the same time. The graphical model of the improved model T-LDA is shown in Fig. 5.
According to the graphical model of T-LDA, a segment of driving data is generated as follows:
(1) For each driving mode, sample from a Dirichlet distribution with parameter β to obtain the K multinomial parameters $\phi_z$ of the driving behavior word (semantic) distribution of each driving mode;
(2) For each driving mode, sample from a Dirichlet distribution with parameter γ to obtain the K multinomial parameters of the driving behavior word time-label distribution of each driving mode (written $\psi_z$ here);
(3) For each segment of driving data, sample from a Dirichlet distribution with parameter α to obtain the multinomial parameter $\theta_j$ of the driving mode distribution of that driving data;
(4) Each driving behavior word in driving data j is generated as follows:
(a) sample a driving mode $z_{ji}$ from the multinomial distribution with parameter $\theta_j$;
(b) sample a driving behavior word $w_{ji}$ from the multinomial distribution with parameter $\phi_{z_{ji}}$;
(c) sample a driving behavior word time $t_{ji}$ from the multinomial distribution with parameter $\psi_{z_{ji}}$.
Compared with other models, a word in T-LDA can be regarded as an entry made up of two parts, the semantics of the driving behavior word and the time of the driving behavior word, so T-LDA can compensate for the loss of driving behavior word time information caused by the bag-of-words assumption.
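The generative process above can be simulated directly. The sketch below assumes symmetric Dirichlet priors and discretized time labels drawn from T bins; `phi` denotes the word distribution of each mode and `psi` is a name assumed here for the time-label distribution of each mode. The sizes and hyperparameter values are illustrative only.

```python
import numpy as np

def generate_segment(n_words, alpha, phi, psi, rng):
    """Generate one segment of driving data following steps (3) and (4)."""
    K = phi.shape[0]
    theta = rng.dirichlet(np.full(K, alpha))           # (3) mode mixture of this segment
    z = rng.choice(K, size=n_words, p=theta)           # (a) one driving mode per word
    w = np.array([rng.choice(phi.shape[1], p=phi[k]) for k in z])  # (b) behavior word
    t = np.array([rng.choice(psi.shape[1], p=psi[k]) for k in z])  # (c) time label
    return theta, z, w, t

rng = np.random.default_rng(0)
K, V, T = 4, 12, 5                                     # modes, dictionary size, time bins
phi = rng.dirichlet(np.full(V, 0.5), size=K)           # (1) word distribution per mode
psi = rng.dirichlet(np.full(T, 0.5), size=K)           # (2) time-label distribution per mode
theta, z, w, t = generate_segment(30, alpha=0.5, phi=phi, psi=psi, rng=rng)
```

Each generated token is a (word, time label) pair that shares a single mode assignment, which is how the time information enters the model without adding further hierarchy.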
Step S22: solving the model parameters by Gibbs sampling
The Gibbs sampling algorithm is used to approximate the parameters of the T-LDA model. For driving data j, given the driving behavior words w and their time labels t, all driving mode assignments $z_{-ji}$ other than $z_{ji}$, and the hyperparameters α, β and γ, compute the conditional distribution $p(z_{ji}\mid z_{-ji},w,t,\alpha,\beta,\gamma)$.
The three count terms in this distribution denote, in order: the number of times driving behavior word w is assigned to driving mode j, excluding the current assignment i; the number of times time label t of driving behavior word w is assigned to driving mode j, excluding the current assignment i; and the number of driving behavior words in driving data d assigned to driving mode j, excluding the current assignment i. Finally, the calculation formulas for φ, ψ (the time-label distribution) and θ can be obtained:
The values of the T-LDA model parameters α, β and γ are obtained from θ, φ and ψ.
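For orientation, the collapsed Gibbs conditional of standard LDA, extended with an analogous factor for the time label, has the shape below. This is an assumed form built from the three counts just described, with symmetric priors; V (dictionary size), T (number of distinct time labels), K (number of driving modes) and the symbol ψ for the time-label distribution are notation introduced here, and the patent's own formulas may differ in detail.

```latex
\[
\begin{aligned}
p(z_i = j \mid z_{-i}, w, t, \alpha, \beta, \gamma)
  &\propto
  \frac{n^{(w)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + V\beta}
  \cdot
  \frac{n^{(t)}_{-i,j} + \gamma}{n^{(\cdot)}_{-i,j} + T\gamma}
  \cdot
  \bigl(n^{(d)}_{-i,j} + \alpha\bigr), \\[4pt]
\phi^{(w)}_{j} &= \frac{n^{(w)}_{j} + \beta}{n^{(\cdot)}_{j} + V\beta},
\qquad
\psi^{(t)}_{j} = \frac{n^{(t)}_{j} + \gamma}{n^{(\cdot)}_{j} + T\gamma},
\qquad
\theta^{(d)}_{j} = \frac{n^{(d)}_{j} + \alpha}{n^{(d)}_{\cdot} + K\alpha}.
\end{aligned}
\]
```

Here $n^{(\cdot)}_{j}$ is the total number of words assigned to mode j and $n^{(d)}_{\cdot}$ is the number of words in driving data d. The three factors correspond to the word term, the time-label term and the mode term of the conditional distribution.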
The sampling procedure of the Gibbs sampling algorithm for the T-LDA model is as follows:
1) Extract a word from the document collection, either at random or in some fixed order;
2) Given all other words and their assigned topics, compute the conditional probability $p(z_{ji}\mid z_{-ji},w,t,\alpha,\beta,\gamma)$ that the selected word is assigned to a topic, where $z_{-i}=\{z_1,z_2,\dots,z_{i-1},z_{i+1},\dots,z_K\}$;
3) Randomly draw a topic $z_i$ to replace the topic of the current word.
4) Repeat the above process until α, β and γ converge to a fixed point.
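The four steps above can be sketched as a collapsed Gibbs sampler. This is a minimal illustration rather than the patent's implementation: it assumes symmetric priors, discretized time labels, the standard LDA-style conditional extended with a time-label factor, and a fixed number of sweeps in place of a convergence test. The name `gibbs_tlda` and all default values are hypothetical.

```python
import numpy as np

def gibbs_tlda(docs, K, V, T, alpha=0.5, beta=0.1, gamma=0.1, n_iter=200, seed=0):
    """docs: list of segments, each a list of (word_id, time_id) pairs."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))   # words of segment d assigned to mode k
    n_kw = np.zeros((K, V))           # times word w is assigned to mode k
    n_kt = np.zeros((K, T))           # times time label t is assigned to mode k
    n_k = np.zeros(K)                 # total words assigned to mode k

    def update(d, k, w, t, delta):
        n_dk[d, k] += delta
        n_kw[k, w] += delta
        n_kt[k, t] += delta
        n_k[k] += delta

    # Random initial mode assignment for every word.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for (w, t), k in zip(doc, z[d]):
            update(d, k, w, t, +1)

    for _ in range(n_iter):                        # step 4: repeat the sweep
        for d, doc in enumerate(docs):
            for i, (w, t) in enumerate(doc):       # step 1: visit words in order
                update(d, z[d][i], w, t, -1)       # exclude the current assignment
                p = ((n_kw[:, w] + beta) / (n_k + V * beta)
                     * (n_kt[:, t] + gamma) / (n_k + T * gamma)
                     * (n_dk[d] + alpha))          # step 2: conditional over modes
                k = rng.choice(K, p=p / p.sum())   # step 3: draw a new mode
                z[d][i] = k
                update(d, k, w, t, +1)

    phi = (n_kw + beta) / (n_k[:, None] + V * beta)
    psi = (n_kt + gamma) / (n_k[:, None] + T * gamma)
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    return theta, phi, psi

# Two tiny made-up segments of (word_id, time_id) pairs.
docs = [[(0, 0), (1, 0), (0, 1)], [(2, 1), (3, 1), (2, 0)]]
theta, phi, psi = gibbs_tlda(docs, K=2, V=4, T=2, n_iter=20)
```

The returned `theta` rows give the driving mode distribution of each segment, while `phi` and `psi` give the word and time-label distributions of each mode.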
Using the improved topic model for driving mode recognition is similar to using the LDA model: the driving behavior word histogram feature of each training driving data segment is first obtained with the bag-of-words model; the model parameters α, β and γ are then solved on the training driving data with the Gibbs sampling algorithm, which yields the driving mode distribution in each driving data segment.
For new driving data $d_{test}$, $p(z_k\mid d_{test})$ must be computed. At this point the probability distributions of the driving behavior words and their time labels in all driving modes have been obtained from the training set; that is, the first two factors of formula (4-28), $P(w_{di}\mid z_{-i},z_i=j,\alpha,\beta)$ and $P(t_{di}\mid z_{-i},w_{di},z_i=j,\alpha,\beta,\gamma)$, are known from the learned word and time-label distributions, and only the last factor of formula (4-29), $P(z_i=j,z_{-i},\alpha,\beta,\gamma)$, needs to be solved. The sampling formula obtained by derivation is:
Formula (4-33) is evaluated by Gibbs sampling in the same way as during training. The driving mode type k contained in the test driving data $d_{test}$ is determined by:
$$k=\arg\max_k\, p(z_k\mid d_{test})\tag{4-34}$$
The collected data is processed according to the principles above, and the three topic models pLSA, LDA and T-LDA are evaluated from both a theoretical and a practical perspective. The theoretical evaluation consists of two parts, perplexity and the similarity of the model to the real data; the practical evaluation tests the model's recognition accuracy on new driving data. For each round of topic model parameter training, therefore, 2 segments of driving data are used as the test set and the others as the training set to learn the topic model parameters. The process is run 10 times in total so that every segment of driving data serves as test data once, and the average of the 10 results is taken as the driving mode recognition accuracy.
(1) Choosing the optimal number of driving modes by perplexity
When a topic model is applied to driving mode recognition, a reasonable number of topics must first be specified for training, so an index is needed to measure the modeling ability of the topic model for different numbers of topics. Blei et al. proposed assessing topic models with perplexity and achieved good results; the invention likewise uses perplexity to determine the optimal number of topics.
For a set D of M driving data segments, where $N_d$ is the number of driving behavior words $w_d$ in the d-th driving data segment and $p(w_d)$ is the probability of that driving data, the perplexity is
$$\mathrm{Perplexity}(D)=\exp\left\{-\frac{\sum_{d=1}^{M}\log p(w_d)}{\sum_{d=1}^{M}N_d}\right\}$$
In general, the smaller the perplexity, the smaller the gap between the topics extracted by the topic model and the actual topics, i.e. the better the model's modeling performance.
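Perplexity can be evaluated from point estimates of the model as sketched below. The sketch assumes the common approximation p(w | d) = Σ_k θ_dk φ_kw and scores only the words, not their time labels; `counts` is the segment-by-word histogram matrix, and the function name is illustrative.

```python
import numpy as np

def perplexity(counts, theta, phi):
    """exp(-sum_d log p(w_d) / sum_d N_d) with p(w | d) = theta[d] @ phi."""
    counts = np.asarray(counts, dtype=float)
    word_prob = theta @ phi                        # shape (M, V), rows sum to 1
    # Assumes smoothed estimates, so every entry of word_prob is positive.
    log_lik = np.sum(counts * np.log(word_prob))   # sum over segments and words
    return float(np.exp(-log_lik / counts.sum()))

# Sanity check: a model that is uniform over V = 4 words has perplexity 4.
counts = [[2, 1, 0, 1], [0, 3, 1, 0]]
theta = np.full((2, 2), 0.5)
phi = np.full((2, 4), 0.25)
value = perplexity(counts, theta, phi)
```

Comparing this value across candidate numbers of driving modes, on held-out segments, is how the number of topics would be selected under these assumptions.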
Using driving behavior word noxkata feature, pLSA, LDA and improvement LDA model are trained, from 18 training 2,3,4,5,6 kind of driving mode are extracted in driving data respectively, Fig. 6 show flat after three models are carried out with 10 training Puzzled degree comparison result.
It will be appreciated from fig. 6 that for the driving data collection that the present invention acquires, three kinds of topic models are appointed as 4 in driving mode When, puzzlement degree is minimized.Therefore in correlation analysis later, be designated key number be 4 when result.At these three Between topic model, LDA model modeling wants that, better than pLSA model, TLDA model modeling effect is better than LDA model.
(2) Correlation analysis between the data reconstructed by the topic model and the original data
Next, the three models above are described by the distribution probability of the driving modes in the driving data and the distribution probability of the driving behavior words in the driving modes. Finally, the data are reconstructed with each topic model and a correlation analysis against the original data is carried out, which measures the ability of the different models to extract driving modes. The distribution probabilities of the driving modes in the first 5 training driving data segments under the pLSA, LDA and T-LDA models are shown in Figs. 7, 8 and 9.
As can be seen from Figs. 7, 8 and 9, the pLSA, LDA and T-LDA models give largely consistent distributions of the 4 driving modes in the first 5 driving data segments.
The distribution probabilities of the driving behavior words in the 4 driving modes under the pLSA, LDA and T-LDA models are shown in Figs. 10, 11, 12 and 13.
According to the driving behavior word probability distribution of driving mode 1 in Fig. 10, this mode can be defined as the cautious driving mode.
According to the driving behavior word probability distribution of driving mode 2 in Fig. 11, this mode can be defined as the ordinary driving mode.
According to the driving behavior word probability distribution of driving mode 3 in Fig. 12, this mode can be defined as the aggressive driving mode.
According to the driving behavior word distribution of driving mode 4 in Fig. 13, this mode can be defined as the very aggressive driving mode. As shown in Figs. 10, 11, 12 and 13, the distribution probabilities of the driving behavior words in the 4 driving modes under the pLSA, LDA and T-LDA models are also largely consistent. Using the training results of the three models, i.e. the distribution of driving modes in the driving data and the distribution of driving behavior words in the driving modes, the driving data are reconstructed and a correlation analysis is carried out against the originally collected driving data. This yields the correlation coefficient between the reconstructed data and the original data for each driving data segment, as shown in Fig. 14.
The correlation coefficient characterizes the consistency between the reconstructed data and the original data. The T-LDA model performs better than the LDA and pLSA models, which shows that, compared with LDA and pLSA, the T-LDA model better describes the driving modes implicit in the driving data.
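A minimal sketch of this reconstruction and correlation step is given below. It assumes that a segment is reconstructed as the mixture of the per-mode word distributions weighted by that segment's mode distribution, and that the correlation coefficient is the Pearson coefficient; the source specifies neither in detail, and `reconstruct` and `pearson` are hypothetical names.

```python
import math


def reconstruct(mode_weights, word_dists):
    """Reconstruct a segment's word distribution from the trained model.

    mode_weights[k] is the probability of driving mode k in the segment,
    word_dists[k][w] is the probability of word w in driving mode k.
    """
    num_words = len(word_dists[0])
    return [
        sum(mode_weights[k] * word_dists[k][w] for k in range(len(word_dists)))
        for w in range(num_words)
    ]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In use, `pearson(reconstruct(theta_j, word_dists), original_histogram_j)` would be computed for each segment j, where `original_histogram_j` is that segment's normalized driving behavior word histogram; a value closer to 1 means the model reproduces the segment more faithfully.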
(3) Driving mode recognition rate
For the 4 driving modes, the recognition accuracies of the pLSA, LDA and T-LDA models on the test driving data are shown in Table 1 below. The recognition rate of each driving mode is the average of the 10 cross-validation training results.
Table 1. Recognition rates of the pLSA, LDA and T-LDA models
As shown in Table 1, the recognition rates of the T-LDA model proposed by the present invention on all 4 driving modes are higher than those of the pLSA and LDA models. This shows that the driving modes extracted by the T-LDA model are closer to the true driving modes implicit in the driving data than those extracted by the LDA and pLSA models.
It should be understood that those of ordinary skill in the art may make improvements or modifications in light of the above description, and all such improvements and modifications shall fall within the scope of protection of the appended claims of the present invention.

Claims (7)

1. A driving behavior pattern recognition method based on a T-LDA topic model, characterized in that the method comprises the following steps:
S1, driving behavior dictionary establishment and driving behavior histogram feature extraction: driving behavior data are input and clustered, and a driving behavior dictionary is established from the clustering result of the driving behavior data; the cluster center of each driving behavior class is extracted and used as a word in the driving behavior dictionary, and the number of times each driving behavior word occurs in the driving data is counted to obtain the driving data-driving behavior word co-occurrence matrix, i.e. the driving behavior histogram feature;
S2, training the improved T-LDA model with the driving behavior histogram features: the T-LDA model comprises two parts, the first being the driving mode types contained in each driving data segment and their probability density distribution, and the second being the driving behavior word types contained in each driving mode and their probability density distribution, thereby constructing the relationship among driving data, driving modes and driving behavior words; temporal information is introduced as a label of the driving behavior words so that multiple adjacent driving behaviors are combined; the model is trained with the time-labeled driving behavior histogram features, the model parameters are solved by Gibbs sampling, and the driving behavior recognition result is output.
2. The driving behavior pattern recognition method based on a T-LDA topic model according to claim 1, characterized in that step S1 specifically comprises:
S11, establishing the driving behavior dictionary: features are extracted from the raw driving behavior data and selected features are clustered; for the different driving behavior classes obtained by clustering, the cluster center of each class is taken as a word in the bag of words, and the set of all distinct driving behaviors constitutes the driving behavior dictionary; words of different frequencies are assigned weight parameters of different sizes, such that the higher a word's frequency of occurrence, the smaller its weight parameter;
S12, extracting the driving behavior histogram feature: according to the local features of the driving behavior data to be processed, the driving behavior histogram feature vectors are mapped to words using the TF-IDF method, the corresponding behavior words are looked up in the constructed driving behavior dictionary, and the word-frequency histogram is computed to characterize the driving behavior sequence.
3. The driving behavior pattern recognition method based on a T-LDA topic model according to claim 2, characterized in that the TF-IDF method in step S12 proceeds as follows:
(1) suppose M driving behavior vectors $F=\{f_1,f_2,f_3,\ldots,f_M\}$ are extracted from the driving behavior data d to be processed, and the generated driving behavior dictionary is $W=\{w_1,w_2,w_3,\ldots,w_V\}$, where V is the size of the driving behavior dictionary;
(2) each driving behavior vector $f_i$ is mapped to the driving behavior word $w_{c_i}$ in the driving behavior dictionary, i.e. its position $c_i$ in the dictionary is found:
$c_i=\arg\min_j \|f_i-w_j\|^2$, with $c_i\in\{1,2,\ldots,V\}$
(3) for the driving behavior word $w_{c_i}$ to which each driving behavior vector $f_i$ is mapped, its weight is calculated using a Gaussian function
where the formula involves a variance term and the word frequency count of the word $w_{c_i}$, and $f$ is the median word frequency count;
(4) for the driving behavior word $w_{c_i}$, its weight is calculated
where n is the total number of driving behavior words in the driving data.
4. The driving behavior pattern recognition method based on a T-LDA topic model according to claim 1, characterized in that step S2 specifically comprises:
S21, designing the improved LDA topic model based on time labels: the temporal information of the driving behavior words is introduced into the LDA model as an observed variable serving as the label of the driving behavior words, the parameters of the improved T-LDA model are solved, and finally driving mode recognition is carried out using the T-LDA model;
S22, solving the model parameters by Gibbs sampling.
5. The driving behavior pattern recognition method based on a T-LDA topic model according to claim 4, characterized in that the driving behavior data of the T-LDA model in step S21 are generated as follows:
(1) for each driving mode, the parameters of the K driving behavior word semantics-driving mode multinomial distributions are obtained by sampling from a Dirichlet distribution with parameter β;
(2) for each driving mode, the parameters $\phi_z$ of the K driving behavior word time label-driving mode multinomial distributions are obtained by sampling from a Dirichlet distribution with parameter γ;
(3) for each driving data segment, the parameter $\theta_j$ of the driving mode-driving data multinomial distribution is obtained by sampling from a Dirichlet distribution with parameter α;
(4) each driving behavior word in driving data segment j is generated as follows:
(a) a driving mode $z_{ji}$ is sampled from the multinomial distribution with parameter $\theta_j$;
(b) a driving behavior word $w_{ji}$ is sampled from the word multinomial distribution of the sampled driving mode;
(c) a driving behavior word time $t_{ji}$ is sampled from the time label multinomial distribution of the sampled driving mode.
6. The driving behavior pattern recognition method based on a T-LDA topic model according to claim 5, characterized in that the model parameters are solved by Gibbs sampling in step S22 as follows:
(1) a word is drawn from the document collection, either at random or in some fixed order;
(2) conditioned on the topics assigned to all other words, the conditional probability $p(z_{ji}\mid z_{-ji},w,t,\alpha,\beta,\gamma)$ that the selected word is assigned to a topic is calculated, where $z_{-i}=\{z_1,z_2,\ldots,z_{i-1},z_{i+1},\ldots,z_K\}$;
(3) a topic $z_i$ is drawn at random to replace the topic of the current word;
(4) the above process is repeated until α, β and γ finally converge to a fixed point.
7. The driving behavior pattern recognition method based on a T-LDA topic model according to claim 6, characterized in that the values of the parameters α, β and γ in step S22 are solved as follows:
For driving data j, given the driving behavior words w and their time labels t, all driving modes $z_{-ji}$ other than the driving behavior mode $z_{ji}$, and the hyperparameters α, β and γ, the conditional distribution $p(z_{ji}\mid z_{-ji},w,t,\alpha,\beta,\gamma)$ is calculated:
where the counts used are, in order: the number of times driving behavior word w is assigned to driving mode j, excluding the current driving mode i; the number of times the time label t of driving behavior word w is assigned to driving mode j, excluding the current driving mode i; and the number of driving behavior words in driving data d that are assigned to driving mode j, excluding the current driving mode i. The calculation formulas for φ, θ and the other multinomial distribution parameter are then obtained:
The values of the T-LDA model parameters α, β and γ are then obtained from θ, φ and that parameter.
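The Gibbs sampling procedure described in claims 6 and 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's exact formulas: the conditional probability used here is the usual collapsed form for an LDA-style model in which each token carries both a word and a discrete time label, the hyperparameters α, β and γ are held fixed and symmetric, and a fixed number of sweeps replaces a convergence test. The function name `gibbs_tlda`, its arguments and the returned names `theta`, `word_dist` and `time_dist` are hypothetical.

```python
import random


def gibbs_tlda(segments, K, V, T, alpha=0.5, beta=0.1, gamma=0.1,
               iters=100, seed=0):
    """Collapsed Gibbs sampling for an LDA-style model with time labels.

    segments: list of driving data segments, each a list of
              (word_id, time_id) pairs with 0 <= word_id < V, 0 <= time_id < T.
    Returns (theta, word_dist, time_dist) estimated from the final counts.
    """
    rng = random.Random(seed)
    n_jk = [[0] * K for _ in segments]   # tokens of segment j in mode k
    n_kw = [[0] * V for _ in range(K)]   # times word w is assigned to mode k
    n_kt = [[0] * T for _ in range(K)]   # times time label t is assigned to mode k
    n_k = [0] * K                        # total tokens assigned to mode k

    def update(j, k, w, t, delta):
        n_jk[j][k] += delta
        n_kw[k][w] += delta
        n_kt[k][t] += delta
        n_k[k] += delta

    # Random initial mode assignment for every token.
    z = []
    for j, seg in enumerate(segments):
        z.append([rng.randrange(K) for _ in seg])
        for (w, t), k in zip(seg, z[j]):
            update(j, k, w, t, +1)

    for _ in range(iters):
        for j, seg in enumerate(segments):
            for i, (w, t) in enumerate(seg):
                update(j, z[j][i], w, t, -1)   # exclude the current token
                weights = [
                    (n_jk[j][k] + alpha)
                    * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
                    * (n_kt[k][t] + gamma) / (n_k[k] + T * gamma)
                    for k in range(K)
                ]
                z[j][i] = rng.choices(range(K), weights=weights)[0]
                update(j, z[j][i], w, t, +1)   # add it back with the new mode

    theta = [[(n_jk[j][k] + alpha) / (len(seg) + K * alpha) for k in range(K)]
             for j, seg in enumerate(segments)]
    word_dist = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
                 for k in range(K)]
    time_dist = [[(n_kt[k][t] + gamma) / (n_k[k] + T * gamma) for t in range(T)]
                 for k in range(K)]
    return theta, word_dist, time_dist
```

Each row of the three returned matrices is a probability distribution: `theta[j]` over driving modes for segment j, `word_dist[k]` over driving behavior words for mode k, and `time_dist[k]` over time labels for mode k.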
CN201810676019.6A 2018-06-27 2018-06-27 Driving behavior pattern recognition method based on T-LDA topic model Active CN109086794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810676019.6A CN109086794B (en) 2018-06-27 2018-06-27 Driving behavior pattern recognition method based on T-LDA topic model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810676019.6A CN109086794B (en) 2018-06-27 2018-06-27 Driving behavior pattern recognition method based on T-LDA topic model

Publications (2)

Publication Number Publication Date
CN109086794A true CN109086794A (en) 2018-12-25
CN109086794B CN109086794B (en) 2022-03-01

Family

ID=64839853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810676019.6A Active CN109086794B (en) 2018-06-27 2018-06-27 Driving behavior pattern recognition method based on T-LDA topic model

Country Status (1)

Country Link
CN (1) CN109086794B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378397A (en) * 2019-06-27 2019-10-25 深圳大学 A kind of driving style recognition methods and device
CN111126438A (en) * 2019-11-22 2020-05-08 北京理工大学 Driving behavior recognition method and system
CN113159105A (en) * 2021-02-26 2021-07-23 北京科技大学 Unsupervised driving behavior pattern recognition method and data acquisition monitoring system
CN113239964A (en) * 2021-04-13 2021-08-10 联合汽车电子有限公司 Vehicle data processing method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100209885A1 (en) * 2009-02-18 2010-08-19 Gm Global Technology Operations, Inc. Vehicle stability enhancement control adaptation to driving skill based on lane change maneuver
DE102014205127A1 (en) * 2013-04-17 2014-10-23 Ford Global Technologies, Llc Control of the driving dynamics of a vehicle with ruts compensation
CN105894815A (en) * 2016-05-27 2016-08-24 苏州市职业大学 Semantic region segmentation-based traffic congestion early warning method
CN106408032A (en) * 2016-09-30 2017-02-15 防城港市港口区高创信息技术有限公司 Fatigue driving detection method based on corner of steering wheel


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378397A (en) * 2019-06-27 2019-10-25 深圳大学 A kind of driving style recognition methods and device
CN111126438A (en) * 2019-11-22 2020-05-08 北京理工大学 Driving behavior recognition method and system
CN111126438B (en) * 2019-11-22 2023-11-14 北京理工大学 Driving behavior recognition method and system
CN113159105A (en) * 2021-02-26 2021-07-23 北京科技大学 Unsupervised driving behavior pattern recognition method and data acquisition monitoring system
CN113159105B (en) * 2021-02-26 2023-08-08 北京科技大学 Driving behavior unsupervised mode identification method and data acquisition monitoring system
CN113239964A (en) * 2021-04-13 2021-08-10 联合汽车电子有限公司 Vehicle data processing method, device, equipment and storage medium
CN113239964B (en) * 2021-04-13 2024-03-01 联合汽车电子有限公司 Method, device, equipment and storage medium for processing vehicle data

Also Published As

Publication number Publication date
CN109086794B (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN110297988B (en) Hot topic detection method based on weighted LDA and improved Single-Pass clustering algorithm
CN106650780B (en) Data processing method and device, classifier training method and system
Burns et al. Women also snowboard: Overcoming bias in captioning models
US10936906B2 (en) Training data acquisition method and device, server and storage medium
CN103530540B (en) User identity attribute detection method based on man-machine interaction behavior characteristics
CN110619568A (en) Risk assessment report generation method, device, equipment and storage medium
CN109086794A (en) A kind of driving behavior mode knowledge method based on T-LDA topic model
CN108984530A (en) A kind of detection method and detection system of network sensitive content
CN111460221B (en) Comment information processing method and device and electronic equipment
CN105808524A (en) Patent document abstract-based automatic patent classification method
CN105874753A (en) Systems and methods for behavioral segmentation of users in a social data network
Zhang et al. Multimodal marketing intent analysis for effective targeted advertising
CN111553127A (en) Multi-label text data feature selection method and device
CN115131698B (en) Video attribute determining method, device, equipment and storage medium
CN114692593B (en) Network information safety monitoring and early warning method
Fagni et al. Fine-grained prediction of political leaning on social media with unsupervised deep learning
CN111738341A (en) Distributed large-scale face clustering method and device
CN103336830A (en) Image search method based on structure semantic histogram
CN109145140A (en) One kind being based on the matched image search method of hand-drawn outline figure and system
CN106095987A (en) A kind of content personalization method for pushing based on community network and system
CN105354720B (en) A method of mixed recommendation is carried out to consumption place based on visual cluster
CN111831819A (en) Text updating method and device
CN107423294A (en) A kind of community image search method and system
CN106952251B (en) A kind of image significance detection method based on Adsorption Model
CN107818319A (en) A kind of method of automatic discrimination face beauty degree

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant