CN107239435B - Travel period detection method based on information entropy - Google Patents
- Publication number
- CN107239435B (application CN201710487737.4A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- information entropy
- sequence
- period
- travel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
Abstract
The invention relates to a travel period detection method based on information entropy, comprising the following steps: S1. marking whether a trip occurs as 1 or 0 respectively and, for a travel sequence of given length L, setting the candidate period of the sequence to P and storing the sequence in a ⌈L/P⌉ × P matrix M_P; S2. defining a probability p_c for each column c of the matrix M_P, wherein M(j, c) denotes the element in row j, column c of M_P; S3. calculating the information entropy of the current matrix M_P; S4. extracting all columns of M_P whose probability is greater than a set first threshold and calculating the saturation f_P over those columns; S5. setting P = P + 1 and repeating steps S1 to S5 until P is greater than L/2; S6. finding the set of candidate periods {P | the information entropy at P is not higher than the information entropy at P + 1 and at P − 1, and the saturation f_P corresponding to P is greater than a second threshold}, the smallest value in this set being the period value.
Description
Technical Field
The invention relates to the field of intelligent traffic control, in particular to a travel period detection method based on information entropy.
Background
In the era of big data, there are many means of acquiring information and sensing tools are widespread, which makes it possible to collect many kinds of data. The resulting data is correspondingly rich and includes sequences of many kinds of events.
In daily life, many trips recur periodically. For example, an office worker, Wang, takes the subway to work in the morning on 5 of the 7 days in every week, which is a periodic behavior in both the time and space dimensions; as another example, an older woman goes to a supermarket (not necessarily the same one) every weekday to buy daily necessities, which is a periodic behavior in the time dimension only.
Knowing whether an event occurs periodically, and what its periodic pattern is, matters for managing that event and guides the improvement of the corresponding system. For example, urban travel demand can be predicted from the commuting behavior of the residents of an area, and targeted improvements to the urban traffic system can be proposed.
In traffic systems, travelers are sensed by fixed detection devices such as checkpoint cameras and induction loops. In public transport systems in particular, the origin-destination (OD) information of each stage of a traveler's trip is recorded.
At present, a common method of space-time analysis for travel trajectories is to number the spatial regions, then pick points on a trajectory according to some rule (in practice, the points often exist first and the trajectory is built from them), and assign each point the number of the region it falls in. These steps convert a space-time travel trajectory into a symbol sequence, and the trajectory is then analyzed through that symbol sequence.
This method has the following disadvantages. First, a spatial partition that is too coarse loses information, and one that is too fine introduces redundancy. In addition, travel trajectories contain a certain amount of noise, which is difficult to remove within the existing framework of the method. Individual trips differ in purpose and habit, and performing global period detection and periodic pattern recognition on all trips without distinguishing them greatly increases the difficulty of period detection.
Second, the method pays too much attention to the details of the trajectory. From a large-scale, macroscopic perspective, the number of urban trips is enormous. Quite apart from the computational cost, city managers and transport decision makers are usually more interested in mesoscopic indicators, such as how much commuting occurs in a given area and the origins and destinations of that commuting, than in the trajectory of a trip to a specific place.
Therefore, working at the mesoscopic level, the invention abstracts the travel trajectory sequence of a traveler into a 0/1 sequence over a given time window. This abstraction is the basis of the period detection method provided by the invention.
First, travel stages with the same trip purpose are merged (for example, transfer trips are merged), converting individual trip records into an individual trip chain based on trip purpose. The trip chain of a traveler is then clustered and partitioned into patterns (a pattern can be regarded as a class of trips with one purpose, such as Wang's commute to work, and depends on the partitioning criterion of the specific clustering). Within the observation time window, a day on which a trip of the pattern occurs is marked as 1, and any other day is marked as 0. From a mesoscopic perspective, this representation meets the needs of city managers and traffic decision makers well.
In this way, a 0/1 travel sequence is obtained for each traveler under each pattern.
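The day-by-day marking described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `to_binary_sequence`, the window start date, and the trip dates are all hypothetical, and the trips are assumed to have already been merged and assigned to a single pattern.

```python
from datetime import date, timedelta


def to_binary_sequence(trip_dates, window_start, window_days):
    """Mark each day of the observation window 1 if a trip occurred, else 0."""
    trip_days = set(trip_dates)
    return [
        1 if window_start + timedelta(days=i) in trip_days else 0
        for i in range(window_days)
    ]


# Hypothetical data: one traveler's commuting trips within a 7-day window.
start = date(2017, 6, 19)  # a Monday
trips = [start + timedelta(days=d) for d in (0, 1, 2, 3, 4)]
sequence = to_binary_sequence(trips, start, 7)
print(sequence)
```

Each pattern of each traveler would yield its own sequence of this kind, which is the input to the period detection discussed later.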
In terms of period detection, research at home and abroad currently focuses on detecting the period and periodic pattern of time series, symbol sequences, and transaction sequences; methods for detecting the period of a 0/1 sequence are comparatively few. Two general period detection methods applicable to 0/1 sequences are introduced below.
Method 1: autocorrelation function and fast Fourier transform.
This method first calculates the autocorrelation function of the sequence. Because the autocorrelation function of a periodic signal is itself periodic with the same period as the signal, a fast Fourier transform of the autocorrelation function yields a dominant frequency, and the period is the reciprocal of that dominant frequency.
Method 2: in biology, the repeated appearance of genes in DNA is considered highly significant for hereditary traits. In the field of genetics, the presence or absence of a gene in DNA is therefore marked as 1 or 0, and a method based on information theory has been proposed to detect the period of the resulting 0/1 sequence. Such sequences are exceptionally sparse, with a ratio of 0s to 1s on the order of 1000.
Method 1 is currently the most commonly used method for detecting the period of a 0/1 sequence (signal), but it does not meet the present requirement. It treats the sequence as an indivisible stream, so the periodic pattern of the sequence cannot be obtained. Furthermore, for some sequences the difference between the dominant and the second dominant frequencies of the autocorrelation function is not pronounced, so its adaptability needs to be improved. Consider the sequence "0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1" with a period of 7 and a periodic pattern of 0,0,1,0,0,0,1. The autocorrelation function and FFT spectrum of this sequence under Method 1 are shown in Fig. 1 and Fig. 2.
The period detected by Method 1 is 3.5, which does not match the true period of 7.
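The mismatch between 3.5 and 7 can be reproduced with a small sketch. It rests on assumptions not stated in the text: the autocorrelation is taken as circular and computed on the mean-removed sequence, and a naive O(N²) discrete Fourier transform stands in for the FFT so that only the standard library is needed. The function names are illustrative.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def circular_autocorrelation(seq):
    """Circular autocorrelation of the mean-removed sequence (an assumption)."""
    n = len(seq)
    mean = sum(seq) / n
    x = [v - mean for v in seq]
    return [sum(x[t] * x[(t + lag) % n] for t in range(n)) for lag in range(n)]


def dominant_period(seq):
    """Period taken as the reciprocal of the dominant non-zero frequency."""
    n = len(seq)
    spectrum = [abs(c) for c in dft(circular_autocorrelation(seq))]
    # Skip k = 0 (the constant term) and search the non-redundant half.
    k_peak = max(range(1, n // 2 + 1), key=lambda k: spectrum[k])
    return n / k_peak


pattern = [0, 0, 1, 0, 0, 0, 1]
seq = pattern * 4  # the 28-element example sequence with true period 7
period = dominant_period(seq)
print(period)
```

The strongest spectral component of this sequence is the second harmonic of the true frequency, which is why taking the reciprocal of the dominant frequency halves the period.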
Method 2 is robust for 0/1 sequence detection, but it targets extremely sparse sequences, and this sparsity is a precondition for its applicability.
Disclosure of Invention
The invention provides a travel period detection method based on information entropy, aiming at solving the defect that the travel period cannot be effectively and accurately detected by the period detection method provided by the prior art.
In order to realize the purpose, the technical scheme is as follows:
A travel period detection method based on information entropy comprises the following steps:
S1. marking whether a trip occurs as 1 or 0 respectively and, for a travel sequence of given length L, setting the candidate period of the sequence to P and storing the sequence in a ⌈L/P⌉ × P matrix M_P;
S2. defining a probability p_c for each column c of the matrix M_P, wherein M(j, c) denotes the element in row j, column c of M_P;
S3. calculating the information entropy of the current matrix M_P;
S4. extracting all columns of M_P whose probability is greater than a set first threshold and calculating the saturation f_P over those columns;
S5. setting P = P + 1 and repeating steps S1 to S5 until P is greater than L/2;
S6. finding the set of candidate periods {P | the information entropy at P is not higher than the information entropy at P + 1 and at P − 1, and the saturation f_P corresponding to P is greater than a second threshold}, the smallest value in this set being the period value.
Compared with the prior art, the invention has the beneficial effects that:
The invention converts travel information into 0/1 sequences from a mesoscopic perspective. Drawing on information theory, it provides a travel period detection method based on information entropy that effectively detects both the travel period and the periodic travel pattern and adapts well to random noise.
Drawings
Fig. 1 is a graph of the autocorrelation function of the sequence.
Fig. 2 is a graph of the FFT spectrum of the sequence.
Fig. 3 is a flow chart of the method.
Fig. 4 is a diagram of the matrix form of the sequence S when P is 7.
Fig. 5 is a diagram of the matrix form of the sequence S when P is 8.
Fig. 6 is a diagram showing the degree of overlap in the vertical direction for the sequence S when P is 7.
Fig. 7 is a diagram showing the degree of overlap in the vertical direction for the sequence S when P is 8.
Fig. 8 is a schematic diagram of the probability of each column of the matrix.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the invention is further illustrated below with reference to the figures and examples.
Example 1
Fig. 3 is a flow chart of the method of the present invention. As shown in Fig. 3, given a travel sequence S of length L whose candidate period is P, the sequence is stored row by row in a ⌈L/P⌉ × P matrix M_P, and any remaining positions are padded with 0. For example, the sequence "0,0,1,0,0,0,1,0,0,1,0,0,0,1" is shown in matrix form for P = 7 in Fig. 4 and for P = 8 in Fig. 5.
If P = 7, which is the period of S, the 0/1 distribution overlaps closely in the vertical (column) direction, as shown in Fig. 6. If P = 8, which is not the period of S, the 0/1 distribution overlaps poorly in the vertical direction, as shown in Fig. 7.
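The matrix storage for P = 7 and P = 8 can be sketched as below. The helper name `to_matrix` is hypothetical, and row-by-row filling with trailing zero padding is assumed from the description of the figures.

```python
import math


def to_matrix(seq, period):
    """Store seq row by row in a ceil(L / period) x period matrix, zero-padded."""
    rows = math.ceil(len(seq) / period)
    padded = list(seq) + [0] * (rows * period - len(seq))
    return [padded[r * period:(r + 1) * period] for r in range(rows)]


seq = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
m7 = to_matrix(seq, 7)
m8 = to_matrix(seq, 8)

for row in m7:
    print(row)  # the 1s line up in the same columns
print()
for row in m8:
    print(row)  # the 1s drift across columns
```

With P = 7 both rows are identical, so every column is either all 0 or all 1; with P = 8 the events drift by one column per row and the columns become mixed.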
To measure this degree of overlap, the invention introduces an information entropy criterion.
Information entropy quantifies uncertainty and information content. Entropy is the average amount of information produced by the occurrence of an event. Mathematically, information entropy is the expectation of the information content.
The information entropy is defined as:
H = -∑ P(x) log P(x)
For example, a forecast in which one outcome has a probability of 99% (say, 99% no rain and 1% rain) is close to certain, so its information content is very low (logarithms here are taken to base 10):
H = -(0.99 log 0.99 + 0.01 log 0.01) = 0.024
By contrast, a forecast of 25% rain, 25% sunny, and 50% cloudy is highly uncertain and carries much more information:
H = -(0.5 log 0.5 + 0.25 log 0.25 + 0.25 log 0.25) = 0.45
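The two worked values can be checked numerically. This short sketch uses base-10 logarithms, which is the base that reproduces the quoted figures; the function name `entropy` is illustrative.

```python
import math


def entropy(probs, base=10):
    """Shannon entropy -sum(p * log p); terms with p == 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


h_certain = entropy([0.99, 0.01])
h_uncertain = entropy([0.5, 0.25, 0.25])
print(round(h_certain, 3), round(h_uncertain, 2))
```

Changing `base` to 2 would give the same ordering of the two cases, only in bits rather than base-10 units.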
A probability p_c is therefore defined for each column c of the matrix M_P,
where c = 1, 2, …, P, as shown in Fig. 8.
When p_c is close to 1 or close to 0, it is highly certain whether the travel event of that column occurs, and the information content is low. This holds only when the value of P is the period value or an integer multiple of it. The information entropy is therefore used to measure the certainty of the sequence S under the current candidate period value.
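The effect of the candidate period on the column probabilities can be illustrated with a sketch. Two assumptions are made here for illustration and are not taken from the text: p_c is the fraction of the ⌈L/P⌉ rows that hold a 1 in column c, and the entropy of the matrix is the mean binary entropy of its columns (base 10). The names `column_probabilities` and `matrix_entropy` are hypothetical.

```python
import math


def column_probabilities(seq, period):
    """Assumed p_c: share of rows with a 1 in column c (zero padding adds no 1s)."""
    rows = math.ceil(len(seq) / period)
    counts = [0] * period
    for i, value in enumerate(seq):
        counts[i % period] += value
    return [count / rows for count in counts]


def binary_entropy(p):
    """Entropy of a 0/1 event with probability p, base 10; 0 when p is 0 or 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log10(p) + (1 - p) * math.log10(1 - p))


def matrix_entropy(seq, period):
    """Assumed matrix entropy: mean binary entropy over the columns."""
    probs = column_probabilities(seq, period)
    return sum(binary_entropy(p) for p in probs) / len(probs)


seq = [0, 0, 1, 0, 0, 0, 1] * 4  # true period 7
p7 = column_probabilities(seq, 7)
h7 = matrix_entropy(seq, 7)
h8 = matrix_entropy(seq, 8)
print(p7, h7, h8)
```

Under these assumptions every column is fully determined at P = 7, so the entropy is zero, while at P = 8 each column mixes 0s and 1s and the entropy is strictly positive.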
The information entropy of the matrix M_P is then defined and calculated.
Next, all columns of M_P whose probability is greater than a set first threshold are extracted and the saturation f_P is calculated over them. P is increased by 1 and the procedure is repeated until P is greater than L/2. Finally, the set of candidate periods {P | the information entropy at P is not higher than the information entropy at P + 1 and at P − 1, and the saturation f_P corresponding to P is greater than a second threshold} is found, and the smallest value in this set is the period value.
The specific process of calculating the saturation is as follows:
f_P = total number of events contained in the row event/extracted column
As shown in Fig. 8, when P = 7, the saturation of the matrix is f_P = 0.875 and the periodic pattern is (3, 7), that is, trips fall in columns 3 and 7 of each period.
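The whole procedure, from candidate period to detected period, can be sketched as follows. Several points are assumptions made only for this illustration: p_c is the fraction of rows with a 1 in column c; the matrix entropy is the mean binary entropy of the columns; the saturation is read as the share of all travel events that fall in the extracted columns; the search starts at P = 2; and both thresholds default to 0.5. All function names are hypothetical.

```python
import math


def column_probabilities(seq, period):
    """Assumed p_c: share of the ceil(L / period) rows with a 1 in column c."""
    rows = math.ceil(len(seq) / period)
    counts = [0] * period
    for i, value in enumerate(seq):
        counts[i % period] += value
    return [count / rows for count in counts]


def matrix_entropy(seq, period):
    """Assumed entropy of M_P: mean binary entropy of the columns (base 10)."""
    def h(p):
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log10(p) + (1 - p) * math.log10(1 - p))
    probs = column_probabilities(seq, period)
    return sum(h(p) for p in probs) / len(probs)


def saturation(seq, period, first_threshold=0.5):
    """Assumed f_P: events in the extracted columns / all events in the sequence.

    Returns (f_P, 1-based indices of the extracted columns).
    """
    probs = column_probabilities(seq, period)
    kept = {c for c, p in enumerate(probs) if p > first_threshold}
    total = sum(seq)
    in_kept = sum(v for i, v in enumerate(seq) if i % period in kept)
    f_p = in_kept / total if total else 0.0
    return f_p, sorted(c + 1 for c in kept)


def detect_period(seq, first_threshold=0.5, second_threshold=0.5):
    """Smallest P that is a local entropy minimum with enough saturation."""
    max_p = len(seq) // 2
    ent = {p: matrix_entropy(seq, p) for p in range(1, max_p + 2)}
    candidates = [
        p for p in range(2, max_p + 1)
        if ent[p] <= ent[p - 1] and ent[p] <= ent[p + 1]
        and saturation(seq, p, first_threshold)[0] > second_threshold
    ]
    return min(candidates) if candidates else None


clean = [0, 0, 1, 0, 0, 0, 1] * 4
detected = detect_period(clean)
pattern_columns = saturation(clean, 7)[1]

# Hypothetical noisy variant: the last event is moved one day earlier.
noisy = list(clean)
noisy[27], noisy[26] = 0, 1
noisy_saturation = saturation(noisy, 7)[0]
print(detected, pattern_columns, noisy_saturation)
```

On the clean sequence this sketch returns a period of 7 with columns 3 and 7 as the periodic pattern. Under the assumed reading of the saturation, moving a single event out of its column gives 7/8 = 0.875; whether this corresponds to the data behind Fig. 8 cannot be confirmed from the text.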
Example 2
This example tests the method of Example 1 on artificially generated periodic sequences of length 112, where m is the period value of the original sequence and n is the number of travel events occurring within one period of the original sequence.
Random noise is added in the form of 0/1 exchanges, with a noise ratio η defined in terms of
l, the length of the sequence, and Noise, the number of 0/1 exchanges applied to the sequence; that is, the amount of noise added is relative to the number of travel events in the original sequence.
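One way such test sequences could be generated is sketched below. The formula for η is not reproduced in the text, so this sketch assumes η = Noise / (number of travel events in the original sequence), and it assumes that one 0/1 exchange moves a single event by swapping a 1 with a 0. The function name, the seed, and the chosen pattern are hypothetical.

```python
import random


def add_exchange_noise(seq, noise, seed=0):
    """Swap `noise` distinct 1s with `noise` distinct 0s (assumed 0/1 exchange)."""
    rng = random.Random(seed)
    ones = [i for i, v in enumerate(seq) if v == 1]
    zeros = [i for i, v in enumerate(seq) if v == 0]
    noisy = list(seq)
    for i, j in zip(rng.sample(ones, noise), rng.sample(zeros, noise)):
        noisy[i], noisy[j] = 0, 1
    return noisy


# Hypothetical test case: m = 7, n = 2, length l = 112.
original = [0, 0, 1, 0, 0, 0, 1] * 16
events = sum(original)
eta = 0.25
noise = round(eta * events)  # assumed: eta = Noise / number of events
noisy = add_exchange_noise(original, noise)
changed = sum(a != b for a, b in zip(original, noisy))
print(events, noise, changed)
```

Because each exchange moves an event rather than deleting it, the number of travel events is preserved and exactly two positions change per exchange.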
The experimental results are shown below:
The results show that the method provided by the invention adapts well to noise. An accuracy of more than 80% is still achieved at a noise ratio of 25%, and some test samples even maintain 100% accuracy at 25% noise.
The results also suggest that the method performs better on some sequences with odd periods than on sequences with even periods, which can be exploited to improve performance when testing for a particular period.
For example, suppose the aim is to test whether the period of a sequence is 6. Since the method fits odd periods better than even ones, a 0 can be inserted after every 6 positions of the sequence. If the original sequence has a period of 6, the detected period should then be 7. In experiments on sequences with a noise ratio of 25%, m = 6, and n = 4, this treatment raised the detection accuracy from 0.862 to 0.996.
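The zero-insertion step can be sketched as follows. The helper name `insert_zeros` and the particular pattern with m = 6 and n = 4 are hypothetical; the sketch only shows how a period of 6 becomes a period of 7, not the accuracy figures quoted above.

```python
def insert_zeros(seq, every):
    """Insert a 0 after every `every` elements of seq."""
    out = []
    for position, value in enumerate(seq, start=1):
        out.append(value)
        if position % every == 0:
            out.append(0)
    return out


pattern6 = [1, 1, 0, 1, 1, 0]  # hypothetical pattern with m = 6, n = 4
seq6 = pattern6 * 4
seq7 = insert_zeros(seq6, 6)
print(len(seq6), len(seq7))
```

If the stretched sequence is then found to have a period of 7, the original sequence is taken to have a period of 6.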
It should be understood that the above embodiments of the present invention are merely examples given to illustrate the invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (2)
1. A travel period detection method based on information entropy, characterized in that the method comprises the following steps:
S1. marking whether a trip occurs as 1 or 0 respectively and, for a travel sequence of given length L, setting the candidate period of the sequence to P and storing the sequence in a ⌈L/P⌉ × P matrix M_P;
S2. defining a probability p_c for each column c of the matrix M_P, wherein M(j, c) denotes the element in row j, column c of M_P;
S3. calculating the information entropy of the current matrix M_P;
S4. extracting all columns of M_P whose probability is greater than a set first threshold and calculating the saturation f_P over those columns;
S5. setting P = P + 1 and repeating steps S1 to S5 until P is greater than L/2;
S6. finding the set of candidate periods {P | the information entropy at P is not higher than the information entropy at P + 1 and at P − 1, and the saturation f_P corresponding to P is greater than a second threshold}, the smallest value in this set being the period value.
2. The travel period detection method based on information entropy according to claim 1, characterized in that the saturation in step S4 is calculated as follows:
f_P = total number of events contained in the row event/extracted column.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710487737.4A CN107239435B (en) | 2017-06-23 | 2017-06-23 | Travel period detection method based on information entropy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710487737.4A CN107239435B (en) | 2017-06-23 | 2017-06-23 | Travel period detection method based on information entropy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107239435A CN107239435A (en) | 2017-10-10 |
CN107239435B true CN107239435B (en) | 2020-07-14 |
Family
ID=59987319
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710487737.4A Active CN107239435B (en) | 2017-06-23 | 2017-06-23 | Travel period detection method based on information entropy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107239435B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108681741B (en) * | 2018-04-08 | 2021-11-12 | 东南大学 | Subway commuting crowd information fusion method based on IC card and resident survey data |
CN109471887A (en) * | 2018-10-25 | 2019-03-15 | 电子科技大学中山学院 | Relative entropy-based period acquisition method |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101866143A (en) * | 2009-04-14 | 2010-10-20 | 北京宏德信智源信息技术有限公司 | Road traffic service level prediction method based on space-time characteristic aggregation |
US9047767B2 (en) * | 2013-09-09 | 2015-06-02 | International Business Machines Corporation | Traffic impact prediction for multiple event planning |
CN103646187A (en) * | 2013-12-27 | 2014-03-19 | 中国科学院自动化研究所 | Method for obtaining vehicle travel path and OD (Origin-Destination) matrix in statistic period |
CN103793599A (en) * | 2014-01-17 | 2014-05-14 | 浙江远图智控系统有限公司 | Travel anomaly detection method based on hidden Markov model |
CN104766475A (en) * | 2015-04-09 | 2015-07-08 | 银江股份有限公司 | Urban traffic bottleneck mining method |
Non-Patent Citations (1)
Title |
---|
考虑出行模式和周期性的公交出行特征分析;何兆成 等;《交通运输系统工程与信息》;20161230;第16卷(第6期);第135-141页 * |
Also Published As
Publication number | Publication date |
---|---|
CN107239435A (en) | 2017-10-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110245981B (en) | Crowd type identification method based on mobile phone signaling data | |
Yu et al. | Prediction of bus travel time using random forests based on near neighbors | |
CN106874432B (en) | A kind of public transport passenger trip space-time trajectory extracting method | |
KR101638368B1 (en) | Prediction System And Method of Urban Traffic Flow Using Multifactor Pattern Recognition Model | |
CN108415975B (en) | BDCH-DBSCAN-based taxi passenger carrying hot spot identification method | |
CN110738856B (en) | Mobile clustering-based urban traffic jam fine identification method | |
CN105513370B (en) | The traffic zone division methods excavated based on sparse license plate identification data | |
CN108062857B (en) | Prediction technique for cab-getter's trip purpose | |
CN110836675B (en) | Decision tree-based automatic driving search decision method | |
CN108122186B (en) | Job and live position estimation method based on checkpoint data | |
Zhu et al. | Inferring taxi status using gps trajectories | |
CN105374209A (en) | Urban region road network running state characteristic information extraction method | |
CN107239435B (en) | Travel period detection method based on information entropy | |
CN114428828A (en) | Method and device for digging new road based on driving track and electronic equipment | |
CN113763712B (en) | Regional traffic jam tracing method based on travel event knowledge graph | |
CN110929939A (en) | Landslide hazard susceptibility spatial prediction method based on clustering-information coupling model | |
CN103093625A (en) | City road traffic condition real-time estimation method based on reliability verification | |
CN108257385A (en) | A kind of discriminating method of the anomalous event based on public transport | |
Lawson et al. | Compression and mining of GPS trace data: new techniques and applications | |
CN103902848A (en) | System and method for identifying drug targets based on drug interaction similarities | |
CN110716925A (en) | Cross-border behavior recognition method based on trajectory analysis | |
CN108053646B (en) | Traffic characteristic obtaining method, traffic characteristic prediction method and traffic characteristic prediction system based on time sensitive characteristics | |
CN112052405B (en) | Passenger searching area recommendation method based on driver experience | |
CN109740957A (en) | A kind of urban traffic network node-classification method | |
Gambs et al. | Towards temporal mobility markov chains |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||