CN107239435B - Travel period detection method based on information entropy - Google Patents

Travel period detection method based on information entropy

Info

Publication number
CN107239435B
Authority
CN
China
Prior art keywords
matrix
information entropy
sequence
period
travel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710487737.4A
Other languages
Chinese (zh)
Other versions
CN107239435A (en)
Inventor
何兆成
邓紫坤
余畅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Sun Yat Sen University
Priority to CN201710487737.4A
Publication of CN107239435A
Application granted
Publication of CN107239435B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles

Abstract

The invention relates to a travel period detection method based on information entropy, comprising the following steps. S1: mark whether a trip occurs as 1 or 0; for a travel sequence of given length L, assume a candidate period P and store the sequence in a matrix M_P of size
[formula image: Figure DDA0002403142160000013]
S2: for the matrix M_P, define a probability for each column c:
[formula image: Figure DDA0002403142160000011]
where M(j, c) denotes the element in row j, column c of M_P. S3: compute the information entropy of the current matrix M_P:
[formula image: Figure DDA0002403142160000012]
S4: take all columns of M_P whose probability is greater than a set first threshold and compute the saturation f_P over them. S5: set P = P + 1 and repeat steps S1 to S5 until P is greater than L/2. S6: find the set of candidate periods {P | the information entropy at P is no higher than the information entropy at P + 1 and at P - 1, and the saturation f_P corresponding to P is greater than a second threshold}; the smallest value in this set is the period value.

Description

Travel period detection method based on information entropy
Technical Field
The invention relates to the field of intelligent traffic control, in particular to a travel period detection method based on information entropy.
Background
In the era of big data, there are many ways to acquire information and sensing tools are widespread, which makes it possible to collect data of many kinds. The resulting data are correspondingly rich and include many event sequences.
In daily life, many trips occur periodically. For example, Wang, a company employee, takes the subway to work in the morning on 5 days of every week (7 days), which is periodic behavior in both the time and space dimensions. As another example, a local aunt goes to a supermarket (not necessarily the same one) every weekday to buy daily necessities, which is periodic behavior in the time dimension.
Whether the occurrence of an event is periodic, and what its periodic pattern is, matters for managing that event and guides the improvement of the corresponding system. For example, urban travel demand can be predicted from the commuting behavior of residents in an area, enabling targeted improvements to the urban traffic system.
In traffic systems, travelers are sensed by fixed detection devices such as checkpoint cameras and loop detectors. In public transport systems in particular, origin-destination (OD) information is recorded for each stage of a traveler's trip.
At present, a common method for spatio-temporal analysis of travel trajectories is to number the spatial regions, sample points on a trajectory according to some rule (in practice, the points often exist first and the trajectory is built from them), and assign each point the number of the region it falls in. These steps convert a spatio-temporal travel trajectory into a symbol sequence, and the trajectory is then analyzed through that symbol sequence.
This method has the following disadvantages. First, regions that are too large lose information, while regions that are too small introduce redundancy. In addition, travel trajectories contain noise that is difficult to remove within the framework of this method. Individual trips also differ in purpose and habit, so running global period detection and periodic pattern recognition over all trips without distinguishing them makes period detection much harder.
Second, too much attention is paid to the details of the trajectory. From a large-scale, macroscopic perspective, the number of urban trips is enormous. Apart from the heavy computation this implies, city managers and transport decision makers usually care more about mesoscopic indicators, such as how much commuting occurs in an area and the origins and destinations of that commuting, than about the travel trajectory down to specific places.
Therefore, working at the mesoscopic level, the invention abstracts a traveler's travel trajectory sequence within a given time window into a 0/1 sequence. This abstraction is the basis of the period detection method proposed by the invention.
First, travel stages with the same trip purpose are merged (for example, transfer trips are merged), converting individual trip records into a purpose-based individual trip chain. The trip chain of a traveler is then clustered and partitioned into modes (a mode can be regarded as one type of purposeful trip, such as Wang's commute to work, and depends on the partitioning criterion of the particular clustering). Within the observation time window, a day on which a trip of the mode occurs is marked 1; otherwise it is marked 0. From a mesoscopic viewpoint, this representation meets the needs of city managers and traffic decision makers well.
This yields, for each traveler, a 0/1 travel sequence for each mode.
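As a hedged illustration of this labelling step, the following sketch builds a 0/1 sequence from the dates on which trips of one mode occurred. The function name `to_binary_sequence`, the daily granularity, and the sample dates are assumptions made for illustration; the patent does not prescribe an implementation.

```python
from datetime import date, timedelta

def to_binary_sequence(trip_dates, start, days):
    """Return a list of 0/1 flags, one per day of the observation window.

    trip_dates: iterable of dates on which a trip of the mode occurred
    start:      first day of the observation window
    days:       length of the observation window in days
    """
    trips = set(trip_dates)
    return [1 if start + timedelta(days=i) in trips else 0 for i in range(days)]

# Hypothetical data: trips on the first 5 days of each 7-day block.
start = date(2017, 6, 5)
commute = [start + timedelta(days=d) for d in range(14) if d % 7 < 5]
sequence = to_binary_sequence(commute, start, 14)
print(sequence)
```

Days outside the recorded trips default to 0, which matches the "otherwise marked 0" rule in the text.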
As for period detection, research at home and abroad currently focuses on detecting the periods and periodic patterns of time series, symbol sequences, and transaction sequences; few methods target the period of a 0/1 sequence. Two general period detection methods applicable to 0/1 sequences are introduced below.
Method 1: autocorrelation function and fast Fourier transform.
This method first computes the autocorrelation function of the sequence. Because the autocorrelation function of a periodic signal is itself periodic with the same period as the signal, applying a fast Fourier transform to the autocorrelation function yields the dominant frequency, and the period is the reciprocal of that frequency.
Method 2: in biology, the repeated occurrence of genes on DNA is considered highly significant for genetic traits. In genetics, therefore, the presence or absence of a gene on DNA is marked as 1 or 0, and an information-theoretic method has been proposed to detect the period of the resulting 0/1 sequence. Such sequences are exceptionally sparse, with a ratio of 0s to 1s on the order of 1000.
Method 1 is currently the most commonly used way to detect the period of a 0/1 sequence (signal), but it does not meet the requirement. It treats the sequence as an indivisible stream, so the periodic pattern of the sequence cannot be obtained. Furthermore, for some sequences the dominant and second-dominant frequencies of the autocorrelation function are not clearly separated, so its adaptability needs improvement. Consider the sequence "0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1", which has period 7 and periodic pattern 0,0,1,0,0,0,1. The autocorrelation function and FFT spectrum of this sequence under Method 1 are shown in Fig. 1 and Fig. 2.
Clearly, the period detected by this method is 3.5, which does not match the true period of 7.
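The failure of Method 1 on this sequence can be reproduced with a small sketch. It makes two simplifying assumptions that are not taken from the patent: the autocorrelation is computed circularly, and a direct discrete Fourier transform from the standard library stands in for an FFT routine. The function names are illustrative.

```python
import cmath

def circular_autocorrelation(x):
    """r[lag] = sum over t of x[t] * x[(t + lag) mod n]."""
    n = len(x)
    return [sum(x[t] * x[(t + lag) % n] for t in range(n)) for lag in range(n)]

def dft_magnitudes(x):
    """Magnitude of the discrete Fourier transform, computed directly (O(n^2))."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def dominant_period(seq):
    """Period implied by the strongest non-zero frequency of the autocorrelation."""
    n = len(seq)
    spectrum = dft_magnitudes(circular_autocorrelation(seq))
    k = max(range(1, n // 2 + 1), key=lambda i: spectrum[i])
    return n / k

S = [0, 0, 1, 0, 0, 0, 1] * 4   # the 28-element sequence from the text
print(dominant_period(S))       # 3.5
```

The two trips in each period are spaced 4 and 3 positions apart, so the second harmonic (frequency 2/7) carries more energy than the fundamental (1/7), and the dominant-frequency rule reports 3.5 rather than 7.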
Method 2 is robust for 0/1 sequence detection, but it targets extremely sparse sequences; sparsity is a precondition for its applicability.
Disclosure of Invention
The invention provides a travel period detection method based on information entropy, which aims to overcome the inability of prior-art period detection methods to detect travel periods effectively and accurately.
To achieve this purpose, the technical scheme is as follows:
A travel period detection method based on information entropy comprises the following steps:
S1. Mark whether a trip occurs as 1 or 0; for a travel sequence of given length L, assume a candidate period P and store the sequence in a matrix M_P of size
[formula image: Figure GDA0002403142150000033]
S2. For the matrix M_P, define a probability for each column c:
[formula image: Figure GDA0002403142150000031]
where M(j, c) denotes the element in row j, column c of M_P;
S3. Compute the information entropy of the current matrix M_P:
[formula image: Figure GDA0002403142150000032]
S4. Take all columns of M_P whose probability is greater than a set first threshold and compute the saturation f_P over them;
S5. Set P = P + 1 and repeat steps S1 to S5 until P is greater than L/2;
S6. Find the set of candidate periods {P | the information entropy at P is no higher than the information entropy at P + 1 and at P - 1, and the saturation f_P corresponding to P is greater than a second threshold}; the smallest value in this set is the period value.
Compared with the prior art, the invention has the following beneficial effects:
The invention converts travel information into 0/1 sequences from a mesoscopic perspective. Drawing on information theory, it provides a travel period detection method based on information entropy that can effectively detect both the travel period and the periodic travel pattern, and that adapts well to random noise.
Drawings
Fig. 1 is a graph of the autocorrelation function of the sequence.
Fig. 2 is a graph of the FFT spectrum of the sequence.
Fig. 3 is a flow chart of the method.
Fig. 4 shows the matrix form of the sequence S when P = 7.
Fig. 5 shows the matrix form of the sequence S when P = 8.
Fig. 6 shows the degree of overlap in the column direction for the sequence S when P = 7.
Fig. 7 shows the degree of overlap in the column direction for the sequence S when P = 8.
Fig. 8 is a schematic diagram of the probability of each column of the matrix.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent.
The invention is further illustrated below with reference to the figures and examples.
Example 1
Fig. 3 is a flow chart of the method of the invention. As shown in Fig. 3, given a travel sequence S of length L and a candidate period P of S, the sequence is stored in a matrix M_P of size
[formula image: Figure GDA0002403142150000042]
with the remaining cells padded with 0. For example, the matrix form of the sequence "0,0,1,0,0,0,1,0,0,1,0,0,0,1" is shown in Fig. 4 for P = 7 and in Fig. 5 for P = 8.
If P = 7, which is the period of S, the 0/1 distribution overlaps well in the column (vertical) direction, as shown in Fig. 6. If P = 8, which is not the period of S, the 0/1 distribution overlaps poorly in the column direction, as shown in Fig. 7.
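The folding step can be sketched as follows. Because the matrix size is given only as an image in the source, this sketch assumes ceil(L/P) rows and P columns, filled row by row, which is consistent with the zero padding and the 14-element example described above. The function name `fold` is illustrative.

```python
import math

def fold(seq, p):
    """Fold seq row by row into a ceil(L/p) x p matrix, padding the tail with 0."""
    rows = math.ceil(len(seq) / p)
    padded = list(seq) + [0] * (rows * p - len(seq))
    return [padded[r * p:(r + 1) * p] for r in range(rows)]

S = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
for p in (7, 8):
    print(f"P = {p}")
    for row in fold(S, p):
        print(" ", row)
```

With P = 7 the two rows are identical, so the 1s line up in columns 3 and 7; with P = 8 the second row is shifted and no column contains more than one 1.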
To measure this degree of overlap, the invention introduces an information entropy criterion.
Information entropy quantifies uncertainty and information content. Entropy is the average amount of information produced by the occurrence of an event; mathematically, it is the expectation of the information content.
The information entropy is defined as:
H = -∑ P(x) log P(x)
For example, a forecast of "99% chance of rain, 1% chance of no rain" is nearly certain, so its information content is very low:
H = -(0.99 log 0.99 + 0.01 log 0.01)
= 0.024
By contrast, a forecast of "25% rain, 25% sunny, 50% cloudy" is highly uncertain and carries more information:
H = -(0.5 log 0.5 + 0.25 log 0.25 + 0.25 log 0.25)
= 0.45
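The two worked values can be checked numerically. The text does not state the base of the logarithm; the figures 0.024 and 0.45 are reproduced with base-10 logarithms, so the sketch below uses `math.log10`. The function name `entropy` is illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log10(p)); zero-probability terms contribute 0."""
    return -sum(p * math.log10(p) for p in probs if p > 0)

h_rain = entropy([0.99, 0.01])
h_mixed = entropy([0.5, 0.25, 0.25])
print(round(h_rain, 3))   # 0.024
print(round(h_mixed, 2))  # 0.45
```

With natural or base-2 logarithms the absolute values differ by a constant factor, but the ordering (the near-certain forecast has lower entropy) is unchanged.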
Accordingly, a probability p_c is defined for each column c of the matrix M_P:
[formula image: Figure GDA0002403142150000041]
where c = 1, 2, ..., P, as shown in Fig. 8.
When p_c is close to 1 or close to 0, whether a travel event occurs in that column is nearly certain, and the information content is low. This holds for every column only when P is the period value or an integer multiple of it. The information entropy is therefore used to measure how certain the sequence S under test is at the current candidate period value.
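A minimal sketch of the column probability follows. The formula itself is an image in the source, so the sketch assumes that p_c is the fraction of rows of M_P holding a 1 in column c, with ceil(L/P) rows. It counts directly on the sequence, using the fact that element i falls in column i mod P.

```python
import math

def column_probabilities(seq, p):
    """Assumed p_c: number of 1s in column c divided by the number of rows."""
    rows = math.ceil(len(seq) / p)
    counts = [0] * p
    for i, bit in enumerate(seq):
        counts[i % p] += bit
    return [n / rows for n in counts]

S = [0, 0, 1, 0, 0, 0, 1] * 4
print(column_probabilities(S, 7))  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
print(column_probabilities(S, 8))  # eight values of 0.25
```

At the true period every column is fully determined (probability 0 or 1); at P = 8 the trips spread evenly over all columns and no column is certain.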
The information entropy of the matrix M_P is defined as:
[formula image: Figure GDA0002403142150000051]
Then all columns of M_P whose probability is greater than the set first threshold are taken, and the saturation f_P is computed over them. P is incremented and the procedure repeated until P is greater than L/2. Finally, the set of candidate periods {P | the information entropy at P is no higher than the information entropy at P + 1 and at P - 1, and the saturation f_P corresponding to P is greater than a second threshold} is found; the smallest value in this set is the period value.
The saturation is calculated as follows:
f_P = (number of travel events contained in the extracted columns) / (total number of travel events)
As shown in Fig. 8, when P = 7 the saturation of the matrix is f_P = 0.875, and the periodic pattern is 3, 7.
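The whole procedure can be sketched end to end, with several loudly stated assumptions, since the defining formulas are images in the source: the matrix has ceil(L/P) rows; p_c is the column mean; the matrix entropy is the average binary entropy of the columns (base 10); saturation is the share of all travel events that fall in the extracted columns; candidate periods run from 2 to L/2, with one extra P evaluated so that the P + 1 neighbor exists. The threshold defaults 0.5 and 0.8 are hypothetical, as are all function names.

```python
import math

def fold(seq, p):
    rows = math.ceil(len(seq) / p)
    padded = list(seq) + [0] * (rows * p - len(seq))
    return [padded[r * p:(r + 1) * p] for r in range(rows)]

def column_probs(matrix):
    rows = len(matrix)
    return [sum(row[c] for row in matrix) / rows for c in range(len(matrix[0]))]

def binary_entropy(q):
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log10(q) + (1 - q) * math.log10(1 - q))

def matrix_entropy(probs):
    # Assumed form: average binary entropy over the P columns.
    return sum(binary_entropy(q) for q in probs) / len(probs)

def saturation(matrix, probs, first_threshold):
    # Assumed form: events in the extracted columns / all events.
    cols = [c for c, q in enumerate(probs) if q > first_threshold]
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0, cols
    kept = sum(row[c] for row in matrix for c in cols)
    return kept / total, cols

def detect_period(seq, first_threshold=0.5, second_threshold=0.8):
    """Return (period, 1-based pattern columns), or None if no candidate passes."""
    half = len(seq) // 2
    ent, sat, pattern = {}, {}, {}
    for p in range(1, half + 2):
        m = fold(seq, p)
        probs = column_probs(m)
        ent[p] = matrix_entropy(probs)
        sat[p], cols = saturation(m, probs, first_threshold)
        pattern[p] = [c + 1 for c in cols]
    candidates = [p for p in range(2, half + 1)
                  if ent[p] <= ent[p - 1] and ent[p] <= ent[p + 1]
                  and sat[p] > second_threshold]
    if not candidates:
        return None
    best = min(candidates)
    return best, pattern[best]

S = [0, 0, 1, 0, 0, 0, 1] * 4
print(detect_period(S))  # (7, [3, 7])
```

On the 28-element sequence that defeats the autocorrelation approach, this sketch returns period 7 with pattern columns 3 and 7. Multiples such as 14 also have zero entropy, which is why the smallest candidate is taken.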
Example 2
This example tests the method of Example 1 on artificially generated periodic sequences of length 112, where m is the period value of the original sequence and n is the number of travel events occurring within one period of the original sequence.
Random noise is added in the form of 0/1 swaps, and the noise ratio η is defined as:
[formula image: Figure GDA0002403142150000052]
where l is the length of the sequence and Noise is the number of 0/1 swaps applied to the sequence; that is, the amount of noise added is tied to the number of travel events in the original sequence.
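One way to generate such noisy test sequences is sketched below. The formula for η is an image in the source, so this sketch assumes η = Noise / (number of travel events in the original sequence) and treats a "0/1 swap" as flipping the value at a randomly chosen position. The function name `add_swap_noise` and the fixed seed are illustrative.

```python
import random

def add_swap_noise(seq, eta, seed=0):
    """Flip round(eta * number_of_events) distinct positions of seq (0 <-> 1)."""
    rng = random.Random(seed)
    swaps = round(eta * sum(seq))
    noisy = list(seq)
    for i in rng.sample(range(len(seq)), swaps):
        noisy[i] = 1 - noisy[i]
    return noisy

clean = [0, 0, 1, 0, 0, 0, 1] * 16          # length 112, m = 7, n = 2
noisy = add_swap_noise(clean, eta=0.25)      # 32 events -> 8 flipped positions
print(sum(a != b for a, b in zip(clean, noisy)))
```

Sampling without replacement guarantees that each swap changes a different position, so the number of differences equals the intended Noise count.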
The experimental results are shown below:
[results table image: Figure GDA0002403142150000061]
The results show that the proposed method tolerates noise reasonably well: accuracy remains above 80% at a noise ratio of 25%, and some test samples even maintain 100% accuracy at 25% noise. The advantages are self-evident.
The results also suggest that the method performs better on some sequences with odd periods than on sequences with even periods, which points to a way of improving performance when testing for a specific period.
For example, suppose the goal is to test whether the period of a sequence is 6. Since odd periods are detected more reliably than even ones, a 0 can be inserted after every 6 positions of the sequence; if the sequence itself has period 6, the detected period should then be 7. In experiments on sequences with a noise ratio of 25%, m = 6 and n = 4, this preprocessing raised the detection accuracy from 0.862 to 0.996.
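The zero-insertion preprocessing can be sketched in a few lines. The function name `insert_zeros` and the sample pattern (m = 6, n = 4) are illustrative; the sketch assumes the 0 is placed after each complete block of 6 positions.

```python
def insert_zeros(seq, every):
    """Insert a 0 after each block of `every` elements."""
    out = []
    for i, bit in enumerate(seq, start=1):
        out.append(bit)
        if i % every == 0:
            out.append(0)
    return out

S6 = [1, 1, 0, 1, 1, 0] * 3      # period 6, four events per period
S7 = insert_zeros(S6, 6)
print(S7)
```

A sequence with true period 6 becomes a sequence with period 7, so a detector that reports 7 on the padded sequence confirms the original period of 6.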
It should be understood that the above embodiments are merely examples given to illustrate the invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall fall within the protection scope of the claims of the invention.

Claims (2)

1. A travel period detection method based on information entropy, characterized in that the method comprises the following steps:
S1. Mark whether a trip occurs as 1 or 0; for a travel sequence of given length L, assume a candidate period P and store the sequence in a matrix M_P of size
[formula image: Figure FDA0002403142140000013]
S2. For the matrix M_P, define a probability for each column c:
[formula image: Figure FDA0002403142140000011]
where M(j, c) denotes the element in row j, column c of M_P;
S3. Compute the information entropy of the current matrix M_P:
[formula image: Figure FDA0002403142140000012]
S4. Take all columns of M_P whose probability is greater than a set first threshold and compute the saturation f_P over them;
S5. Set P = P + 1 and repeat steps S1 to S5 until P is greater than L/2;
S6. Find the set of candidate periods {P | the information entropy at P is no higher than the information entropy at P + 1 and at P - 1, and the saturation f_P corresponding to P is greater than a second threshold}; the smallest value in this set is the period value.
2. The information entropy-based travel period detection method according to claim 1, characterized in that the saturation in step S4 is calculated as follows:
f_P = (number of travel events contained in the extracted columns) / (total number of travel events).
CN201710487737.4A | 2017-06-23 | 2017-06-23 | Travel period detection method based on information entropy | Active | CN107239435B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201710487737.4A, CN107239435B (en) | 2017-06-23 | 2017-06-23 | Travel period detection method based on information entropy

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201710487737.4A, CN107239435B (en) | 2017-06-23 | 2017-06-23 | Travel period detection method based on information entropy

Publications (2)

Publication Number | Publication Date
CN107239435A | 2017-10-10
CN107239435B (granted) | 2020-07-14

Family

ID=59987319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710487737.4A Active CN107239435B (en) 2017-06-23 2017-06-23 Travel period detection method based on information entropy

Country Status (1)

Country Link
CN (1) CN107239435B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108681741B (en) * | 2018-04-08 | 2021-11-12 | 东南大学 | Subway commuting crowd information fusion method based on IC card and resident survey data
CN109471887A (en) * | 2018-10-25 | 2019-03-15 | 电子科技大学中山学院 | Relative entropy-based period acquisition method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101866143A (en) * | 2009-04-14 | 2010-10-20 | 北京宏德信智源信息技术有限公司 | Road traffic service level prediction method based on space-time characteristic aggregation
CN103646187A (en) * | 2013-12-27 | 2014-03-19 | 中国科学院自动化研究所 | Method for obtaining vehicle travel path and OD (Origin-Destination) matrix in statistic period
CN103793599A (en) * | 2014-01-17 | 2014-05-14 | 浙江远图智控系统有限公司 | Travel anomaly detection method based on hidden Markov model
US9047767B2 (en) * | 2013-09-09 | 2015-06-02 | International Business Machines Corporation | Traffic impact prediction for multiple event planning
CN104766475A (en) * | 2015-04-09 | 2015-07-08 | 银江股份有限公司 | Urban traffic bottleneck mining method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
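He Zhaocheng (何兆成) et al.; "Analysis of public transit travel characteristics considering travel patterns and periodicity" (考虑出行模式和周期性的公交出行特征分析); Journal of Transportation Systems Engineering and Information Technology (《交通运输系统工程与信息》); 2016-12-30; Vol. 16, No. 6; pp. 135-141 *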

Also Published As

Publication number Publication date
CN107239435A (en) 2017-10-10

Similar Documents

Publication Publication Date Title
CN110245981B (en) Crowd type identification method based on mobile phone signaling data
Yu et al. Prediction of bus travel time using random forests based on near neighbors
CN106874432B (en) A kind of public transport passenger trip space-time trajectory extracting method
KR101638368B1 (en) Prediction System And Method of Urban Traffic Flow Using Multifactor Pattern Recognition Model
CN108415975B (en) BDCH-DBSCAN-based taxi passenger carrying hot spot identification method
CN110738856B (en) Mobile clustering-based urban traffic jam fine identification method
CN105513370B (en) The traffic zone division methods excavated based on sparse license plate identification data
CN108062857B (en) Prediction technique for cab-getter's trip purpose
CN110836675B (en) Decision tree-based automatic driving search decision method
CN108122186B (en) Job and live position estimation method based on checkpoint data
Zhu et al. Inferring taxi status using gps trajectories
CN105374209A (en) Urban region road network running state characteristic information extraction method
CN107239435B (en) Travel period detection method based on information entropy
CN114428828A (en) Method and device for digging new road based on driving track and electronic equipment
CN113763712B (en) Regional traffic jam tracing method based on travel event knowledge graph
CN110929939A (en) Landslide hazard susceptibility spatial prediction method based on clustering-information coupling model
CN103093625A (en) City road traffic condition real-time estimation method based on reliability verification
CN108257385A (en) A kind of discriminating method of the anomalous event based on public transport
Lawson et al. Compression and mining of GPS trace data: new techniques and applications
CN103902848A (en) System and method for identifying drug targets based on drug interaction similarities
CN110716925A (en) Cross-border behavior recognition method based on trajectory analysis
CN108053646B (en) Traffic characteristic obtaining method, traffic characteristic prediction method and traffic characteristic prediction system based on time sensitive characteristics
CN112052405B (en) Passenger searching area recommendation method based on driver experience
CN109740957A (en) A kind of urban traffic network node-classification method
Gambs et al. Towards temporal mobility markov chains

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant