CN116017407A - Method for reliably identifying resident trip mode driven by mobile phone signaling data - Google Patents


Info

Publication number
CN116017407A
Authority
CN
China
Prior art keywords
data
travel
mobile phone
resident
signaling data
Legal status
Pending
Application number
CN202211616958.4A
Other languages
Chinese (zh)
Inventor
彩晨
刘欢
陆振波
贺洋
何静
刘娟
安成川
夏井新
Current Assignee
Southeast University
Original Assignee
Southeast University
Application filed by Southeast University
Priority to CN202211616958.4A
Publication of CN116017407A
Pending (current legal status)

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention provides a method for reliably identifying resident travel modes driven by mobile phone signaling data. The method comprises the following steps: acquiring and preprocessing raw mobile phone signaling data; acquiring resident travel survey data and matching it with the mobile phone signaling data to obtain signaling data labeled with travel modes; extracting resident travel characteristics and performing correlation analysis against the travel mode labels; constructing one Bayesian network structure based on information theory and another based on probability theory, and fusing them into a single Bayesian network model; and discretizing the continuous travel characteristics into ordered states, learning the node parameters of the Bayesian network model, and thereby completing the Bayesian-network-based model for reliable identification of resident travel modes. Because the Bayesian networks are built from both the information-theoretic and the probabilistic perspective, the hidden relationships among travel mode characteristics are taken into account, and the networks are scored with the BIC (Bayesian information criterion) function, travel mode identification achieves higher reliability and precision.

Description

Method for reliably identifying resident trip mode driven by mobile phone signaling data
Technical Field
The invention relates to the technical field of travel mode identification, in particular to a reliable identification method for resident travel modes driven by mobile phone signaling data.
Background
In recent years, with the rapid development of China's economy and society and the continuous improvement of residents' living standards, consumption concepts, content, and levels have kept evolving, and people's travel demands, travel modes, and the structure of resident travel have changed profoundly. The continued growth in the number of motor vehicles and the rise of new green travel modes are reshaping the urban multimodal travel structure and pose new challenges for urban travel mode identification. Data on the resident travel mode structure are an important representation of urban multimodal travel demand and underpin the adjustment and optimization of the travel mode structure, the construction of decision-support platforms for transportation planning, the relief of traffic congestion, and the promotion and implementation of traffic emission reduction policies. However, conventional resident travel behavior surveys collect only small samples, and travel characteristics shift as urban spatial structures change rapidly; consequently, mode split methods based on the conventional four-stage transportation planning model struggle to estimate multimodal travel demand and mode structure accurately and reliably enough for decision-making.
Existing travel mode identification methods mainly fall into the following categories: (1) rule-based models, which define rules over logical features and classify trips by their characteristic values; the rules and thresholds are usually set according to the researcher's expertise and experience; (2) unsupervised clustering methods, which aggregate key features of the mobile phone signaling data to group the study samples; the unlabeled data are clustered, and the travel mode of each cluster is then judged manually by analyzing its characteristics with prior knowledge or with other data sources; (3) activity-based analysis methods, which, building on the analysis of individual activity-travel over the time dimension, consider factors such as space-time constraints, household structure, family background, and personal attributes, and model mode choice as part of individual activity-travel decisions; (4) statistical analysis models, which are further divided according to whether causal relationships exist among the feature variables: features with well-defined causal relationships are usually studied with models such as logistic regression, tree structures, and neural networks, whereas features whose causal relationships are unclear are usually studied with methods such as independence analysis and correlation analysis.
Existing travel mode identification has the following shortcomings: (1) Methods driven by GPS survey data do not address missing data and place high demands on data precision. Moreover, GPS survey data capture the travel trajectories of only a small sample group, the data are markedly skewed, and the travel patterns mined from them lack generality and representativeness. (2) The machine learning methods adopted in existing studies cannot effectively reveal the interactions between influencing factors and outcome variables, and are therefore hard to interpret. (3) Quantitative analyses of travel data in existing studies do not consider the uncertainty of the travel characteristics of different travel modes: these characteristics depend not only on the inherent attributes of the vehicle but may also vary over time with traffic conditions. (4) Rule-based models set rules over logical features, and the choice of thresholds is highly subjective. (5) Existing studies fail to characterize the heterogeneity of individual activity-travel decision behavior and do not reveal the causal mechanism of travel mode choice.
Therefore, a transportation research data foundation that reflects present-day travel conditions, together with a reliable travel mode identification method, is needed to obtain a reliable picture of the urban travel mode structure and to provide effective support for transportation planning, management, and decision-making by the relevant authorities. As travel modes become increasingly diverse, the traditional manual questionnaire surveys and mobile phone GPS positioning data used in existing travel mode identification research suffer from high data acquisition costs and limited samples. Advances in mobile communication technology offer a low-cost, large-sample data source for travel mode identification. However, most existing studies that identify travel modes from mobile phone signaling data use deterministic models, such as rule-based, machine learning, and statistical analysis models, which give insufficient consideration to the uncertainty of travel characteristics and make poor use of incomplete mobile phone signaling data.
In view of the above, there is a need to provide a new approach in an attempt to solve at least some of the above problems.
Disclosure of Invention
To address one or more of these problems in the prior art, the invention provides a method for reliably identifying resident travel modes driven by mobile phone signaling data. Starting from the time specificity and uncertainty of travel characteristics, an effective method for extracting resident travel characteristics from mobile phone signaling data is designed; a Bayesian network model framework is designed for the reliable identification of resident travel modes; and, by quantitatively characterizing the time specificity of travel characteristics, a reliable resident travel mode identification model that accounts for the uncertainty of travel characteristics is finally constructed. The invention can provide effective baseline data to help the relevant authorities understand the current urban travel mode structure, formulate policies for optimizing it, and promote green, low-carbon transportation.
The technical solution for realizing the purpose of the invention is as follows:
A method for reliably identifying resident travel modes driven by mobile phone signaling data comprises the following steps:
S1, acquiring raw mobile phone signaling data and preprocessing it to obtain preprocessed mobile phone signaling data;
S2, acquiring resident travel survey data, each record of which comprises departure time, arrival time, origin, and destination, with the origin and destination represented by traffic zone numbers, and matching the features of the mobile phone signaling data with the resident travel survey data to obtain mobile phone signaling data labeled with travel modes;
S3, dividing the area to be studied into a plurality of traffic zones; extracting resident travel characteristics based on the time specificity and uncertainty of travel characteristics, the resident travel characteristics comprising traffic environment characteristics, individual traveler characteristics, and travel behavior characteristics; and performing correlation analysis on the resident travel characteristics together with the travel mode labels, quantifying the correlations among travel characteristics with mathematical indicators;
S4, designing a Bayesian network model framework for reliably identifying resident travel modes: inferring causal links among resident travel characteristics from their correlations combined with prior knowledge, and constructing a first travel characteristic relationship network; constructing a second travel characteristic relationship network from the sample data with a machine learning method; and scoring the models with the BIC function and integrating the first and second travel characteristic relationship networks to construct an optimized Bayesian network model;
S5, quantitatively characterizing the uncertainty of the input features of the Bayesian network model by discretizing continuous travel characteristics into ordered states; dividing the sample data set into a training set and a test set; learning the parameters of each node of the Bayesian network model from the training set; evaluating model accuracy on the test set; and thereby completing the Bayesian-network-based model for reliable identification of resident travel modes.
Further, the specific preprocessing steps in S1 include:
S1-1, filtering invalid and redundant data: screening the raw mobile phone signaling data for invalid records and duplicate records and removing them, where invalid records are those missing the location area code or the cell code;
S1-2, ping-pong data processing: ping-pong data arise when a phone switches back and forth between neighboring cells because of signal fluctuation rather than actual movement; the start time of the first record and the end time of the last record in a ping-pong sequence become the start time and end time of a new merged record, and the recording duration of the repeated records is taken as the recording duration of the new record;
S1-3, drift data processing:
determining whether the spatial distance between the base stations of the start cell and the end cell of each signaling record exceeds a distance threshold; dividing that distance by the recording duration of the record to obtain the switching speed and determining whether it exceeds a switching speed threshold; and identifying records that exceed both the distance threshold and the switching speed threshold as drift data;
merging adjacent drift records: the start time of the earlier record and the end time of the later record become the start time and end time of a new record, and the recording durations of the drift records are summed to give the recording duration of the new record.
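The drift check and merge in S1-3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Record` layout, the haversine distance between base stations, and the default thresholds (2 km and 50 m/s) are all hypothetical, because the text does not give threshold values.

```python
import math
from dataclasses import dataclass, replace

@dataclass
class Record:
    """One signaling record (hypothetical layout): a switch from a start cell to an end cell."""
    start_time: float                  # seconds
    end_time: float                    # seconds
    duration: float                    # recording duration, seconds
    start_lonlat: tuple[float, float]  # base station of the start cell (lon, lat)
    end_lonlat: tuple[float, float]    # base station of the end cell (lon, lat)

def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))

def is_drift(r, dist_threshold_m=2000.0, speed_threshold_mps=50.0):
    """Drift = a cell switch exceeding BOTH the distance and the switching-speed threshold."""
    dist = haversine_m(r.start_lonlat, r.end_lonlat)
    speed = dist / r.duration if r.duration > 0 else math.inf
    return dist > dist_threshold_m and speed > speed_threshold_mps

def merge_adjacent_drift(records, **thresholds):
    """Collapse each run of consecutive drift records into one record.

    The merged record keeps the earlier start time, takes the later end time,
    and sums the recording durations, as described in S1-3."""
    merged = []
    prev_was_drift = False
    for r in records:
        drift = is_drift(r, **thresholds)
        if drift and prev_was_drift:
            last = merged[-1]
            merged[-1] = replace(last, end_time=r.end_time,
                                 duration=last.duration + r.duration)
        else:
            merged.append(r)
        prev_was_drift = drift
    return merged
```

In practice the thresholds would be tuned to the local base station density; denser urban networks usually justify a smaller distance threshold.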
Further, the specific steps of matching the mobile phone signaling data with the resident travel survey data in S2 include:
S2-1, filtering the mobile phone signaling data by the traveler's gender in each resident travel survey record, retaining signaling data whose user gender matches that of the traveler;
S2-2, filtering the mobile phone signaling data by the traveler's age in each survey record, retaining signaling data whose user age differs from the traveler's age by no more than 2 years;
S2-3, counting the number N of mobile phone signaling entries matched to each survey record: if N = 0, the survey record has no matching signaling data and is removed from the resident travel survey data set; if N = 1, the survey record matches a unique entry of signaling data and is retained; if N > 1, several signaling entries match the survey record, and it is removed from the resident travel survey data set.
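Steps S2-1 to S2-3 amount to a filter followed by a uniqueness test, sketched below. The class and function names are hypothetical, and the sketch assumes `users` has already been narrowed to phone users with a trip between the same traffic zones at a similar time; S2 implies that space-time matching, but S2-1 to S2-3 do not spell it out.

```python
from dataclasses import dataclass

@dataclass
class SurveyTrip:
    trip_id: str
    gender: str
    age: int
    mode: str  # travel mode label reported in the survey

@dataclass
class PhoneUser:
    user_id: str
    gender: str
    age: int

def match_survey_to_users(trips, users, max_age_gap=2):
    """Return {trip_id: (user_id, mode)} for survey trips matched to exactly one phone user."""
    labels = {}
    for t in trips:
        # S2-1: same gender; S2-2: age difference of at most max_age_gap years
        candidates = [u for u in users
                      if u.gender == t.gender and abs(u.age - t.age) <= max_age_gap]
        # S2-3: keep the trip only when the match is unique (N == 1)
        if len(candidates) == 1:
            labels[t.trip_id] = (candidates[0].user_id, t.mode)
    return labels
```

Discarding ambiguous matches (N > 1) trades sample size for label reliability, which fits the method's emphasis on reliable identification.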
Further, the specific steps of S3 include:
S3-1, extracting traffic environment characteristics by calculating the transportation facility indicators of each traffic zone, namely bus stop coverage rate, bus route repetition coefficient, intersection density, road network density, and land use mix;
S3-2, extracting individual traveler characteristics, including gender and age;
S3-3, extracting travel behavior characteristics, including travel duration, travel distance, and average travel speed;
S3-4, using the maximal information coefficient (MIC), a mutual-information-based measure, to represent the correlation between resident travel characteristics and travel modes; to account for time specificity, the MIC values are computed and a travel characteristic information matrix is constructed separately for the peak period and the off-peak period. The MIC is defined as:
$$\mathrm{MIC}(X,Y)=\max_{n_X n_Y<B}\frac{\sum_{x}\sum_{y}p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)}}{\log_2\min(n_X,n_Y)},\qquad B=n^{0.6}$$
where MIC(X, Y) is the maximal information coefficient; X and Y are the row and column elements of the grid; n_X and n_Y are the numbers of grid cells on the horizontal and vertical axes, subject to the constraint n_X n_Y < B with B = n^{0.6}; n is the total number of samples; p(x, y) is the joint probability density function of the two elements; and p(x) and p(y) are their marginal probability density functions.
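The MIC formula above can be illustrated with a simplified stand-in. Exact MIC searches over all grid boundaries for each n_X × n_Y grid size; the sketch below only tries equal-frequency grids, so it returns a lower bound on the exact value. The function names are hypothetical, and this is not the estimator the authors used.

```python
import math
from collections import Counter

def _equal_freq_bins(values, k):
    """Assign each value to one of k equal-frequency bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

def mutual_information(xb, yb):
    """I(X;Y) in bits from two aligned lists of bin labels."""
    n = len(xb)
    pxy = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    # (c/n) * log2[(c/n) / ((px/n)(py/n))] simplifies to (c/n) * log2(c*n / (px*py))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def approx_mic(x, y, alpha=0.6):
    """Max over grids with n_X * n_Y < B = n**alpha of I / log2(min(n_X, n_Y))."""
    n = len(x)
    B = n ** alpha
    best = 0.0
    for nx in range(2, int(B) + 1):
        for ny in range(2, int(B) + 1):
            if nx * ny >= B:
                break
            i = mutual_information(_equal_freq_bins(x, nx), _equal_freq_bins(y, ny))
            best = max(best, i / math.log2(min(nx, ny)))
    return best
```

For the travel mode itself, which is categorical, its natural categories would replace the equal-frequency binning on that axis; computing the values separately on peak and off-peak subsets yields the two information matrices.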
Further, the specific calculation steps for the bus stop coverage rate, bus route repetition coefficient, intersection density, road network density, and land use mix indicators in S3-1 are as follows:
1) Bus stop coverage rate
$$BCR_i=\frac{\sum_{j=1}^{n_i}S_j}{S_i},\quad i=1,2,\dots,l$$
where BCR_i is the bus stop coverage rate of traffic zone i, S_i is the area of traffic zone i, S_j is the area of a circle of a given radius centered on bus stop j, n_i is the number of bus stops in traffic zone i, and l is the number of traffic zones;
2) Bus route repetition coefficient
$$BRRC_i=\frac{\sum_{k=1}^{m_i}L_k}{L_i},\quad i=1,2,\dots,l$$
where BRRC_i is the bus route repetition coefficient of traffic zone i, L_i is the total road network length in traffic zone i, L_k is the length of bus route k, and m_i is the number of bus routes in traffic zone i;
3) Intersection density
$$RID_i=N/S_i,\quad i=1,2,\dots,l$$
where RID_i is the road intersection density of traffic zone i, N is the number of intersections within traffic zone i, and S_i is the area of traffic zone i;
4) Road network density
$$RND_i=L_i/S_i,\quad i=1,2,\dots,l$$
where RND_i is the road network density of traffic zone i, L_i is the total road network length in traffic zone i, and S_i is the area of traffic zone i;
5) Land use mix
$$LM_i=-\frac{\sum_{q=1}^{Q}p_q\ln p_q}{\ln Q},\qquad p_q=\frac{s_q}{S_i}$$
where LM_i is the land use mix of traffic zone i, p_q is the share of the area of traffic zone i occupied by land use category q, Q is the total number of land use categories, and s_q is the area of land use category q.
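The five zone-level indicators can be computed as below once areas and lengths are available (for example from a GIS overlay, which is assumed here). This sketch follows the formulas as written above; the function names are hypothetical, the coverage rate sums stop buffer areas without removing overlaps, and the land use mix uses the normalized entropy form.

```python
import math

def bus_stop_coverage(cell_area, stop_buffer_areas):
    """BCR_i: summed buffer area of the zone's bus stops over the zone area."""
    return sum(stop_buffer_areas) / cell_area

def bus_route_repetition(road_length, route_lengths):
    """BRRC_i: total bus route length over road network length in the zone."""
    return sum(route_lengths) / road_length

def intersection_density(n_intersections, cell_area):
    """RID_i: intersections per unit area."""
    return n_intersections / cell_area

def road_network_density(road_length, cell_area):
    """RND_i: road length per unit area."""
    return road_length / cell_area

def land_use_mix(land_areas, cell_area):
    """LM_i: entropy of shares p_q = s_q / S_i, normalized by ln(Q).

    land_areas lists s_q for all Q categories; zero-area categories add nothing."""
    Q = len(land_areas)
    shares = [s / cell_area for s in land_areas if s > 0]
    return -sum(p * math.log(p) for p in shares) / math.log(Q)
```

An LM_i of 1 means all Q land use categories occupy equal shares of the zone; 0 means a single category covers it.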
Further, the specific calculation steps for the travel duration, travel distance, and average travel speed in S3-3 include:
1) Travel duration
For a single trip, the representative timestamp of the trip origin is taken as the departure time and the representative timestamp of the trip destination as the arrival time; their difference is the travel duration T_I;
2) Travel distance
$$D_I=\sum_{i=1}^{n_I-1}d_{i,i+1}$$
where D_I is the travel distance of user I, n_I is the total number of intersections that user I passes through, and d_{i,i+1} is the spatial distance between intersection i and intersection i+1;
3) Average travel speed
$$v_I=D_I/T_I$$
where v_I is the average travel speed of user I, D_I is its travel distance, and T_I is its travel duration.
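The three travel behavior features chain together as sketched below: T_I from the two representative timestamps, D_I as the sum of consecutive point-to-point distances, and v_I as their ratio. The haversine distance and the (lon, lat) input format are assumptions; the text does not say how d_{i,i+1} is measured.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def trip_features(points, depart_ts, arrive_ts):
    """Return (T_I in hours, D_I in km, v_I in km/h) for one trip.

    points: ordered (lon, lat) locations the trip passes through;
    depart_ts / arrive_ts: representative timestamps in seconds."""
    T = (arrive_ts - depart_ts) / 3600.0
    D = sum(haversine_km(points[k], points[k + 1]) for k in range(len(points) - 1))
    v = D / T if T > 0 else float("nan")
    return T, D, v
```

Returning NaN for a zero duration keeps degenerate trips visible so they can be filtered out later rather than silently producing an infinite speed.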
Further, the specific steps of S4 include:
S4-1, determining the dependence or independence relationships among travel characteristics in the information-theoretic sense from expert experience and the conclusions of the correlation analysis, and constructing a first, information-theory-dominated Bayesian network model;
S4-2, obtaining the dependence or independence relationships among travel characteristics in the probabilistic sense with a hill-climbing method, and constructing a second, probability-theory-dominated Bayesian network model;
S4-3, evaluating the structural performance of the two Bayesian network models with the BIC measure as the structure scoring function, and, according to the BIC scores, constructing the final Bayesian network topology, which draws on domain prior knowledge and sample data and integrates the dependence or independence relationships among travel characteristics in both the information-theoretic and the probabilistic sense.
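To make the BIC comparison in S4-3 concrete, here is a minimal from-scratch BIC scorer for discrete Bayesian network structures: the maximum-likelihood log-likelihood minus (ln n / 2) times the number of free parameters. The data layout and function name are assumptions for illustration; a library such as pgmpy offers an equivalent score, but this sketch does not rely on it.

```python
import math
from collections import Counter

def bic_score(data, parents):
    """BIC score of a discrete Bayesian network structure (higher is better).

    data: list of dicts mapping variable name -> discrete state.
    parents: dict mapping every variable -> tuple of its parent variables."""
    n = len(data)
    cardinality = {v: len({row[v] for row in data}) for v in parents}
    score = 0.0
    for var, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        pa_counts = Counter(tuple(row[p] for p in pa) for row in data)
        # Maximum-likelihood log-likelihood: sum of N_ijk * ln(N_ijk / N_ij)
        score += sum(c * math.log(c / pa_counts[cfg]) for (cfg, _), c in joint.items())
        # Penalty: (r_i - 1) free parameters for each of the q_i parent configurations
        q = math.prod(cardinality[p] for p in pa)
        score -= 0.5 * math.log(n) * (cardinality[var] - 1) * q
    return score
```

One plausible reading of S4-3 is to score the information-theory-dominated and the hill-climbing structures with the same function on the same data and favor the higher score, with edges from each network kept or dropped according to how they change the BIC.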
Further, the specific steps of S5 include:
S5-1, characterizing the uncertainty of the travel characteristics in the Bayesian network: statistically analyzing the sample data of different travel modes, expressing the uncertainty of travel characteristics in probabilistic form on the basis of prior knowledge of the transportation field to form an uncertainty physical characterization scheme, and discretizing continuous characteristic variables into ordered states; the resident travel characteristics include traveler gender, traveler age, travel duration, travel distance, average travel speed, bus system service level, road system construction level, land use level, and similar characteristics;
S5-2, dividing the mobile phone signaling data labeled with travel modes into two data sets: 80% of the data serve as the training set for learning the node parameters of the Bayesian network model, and 20% serve as the test set for subsequently evaluating the model; the node parameters are learned from the training set to obtain the conditional probability table of each node, completing the construction of the travel mode identification model.
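The discretization and the 80/20 split in S5 can be sketched as below. The cut points are hypothetical placeholders (the text does not publish its state boundaries), and a seeded shuffle is one assumed way to make the split reproducible.

```python
import bisect
import random

def discretize(value, cut_points):
    """Map a continuous value to an ordered state index 0..len(cut_points).

    cut_points must be sorted; a value equal to a cut point goes to the upper state."""
    return bisect.bisect_right(cut_points, value)

def train_test_split(samples, train_frac=0.8, seed=42):
    """Shuffle labelled samples and split them into training (80%) and test (20%) sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    k = int(round(train_frac * len(shuffled)))
    return shuffled[:k], shuffled[k:]
```

For example, with hypothetical travel-distance cut points of [1, 3, 8] km, a 0.5 km trip falls in state 0 and a 10 km trip in state 3.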
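Further, the node parameters of the Bayesian network model are learned by maximum a posteriori estimation:
$$\theta^{*}=\arg\max_{\theta}P(\theta\mid D)=\arg\max_{\theta}\frac{P(D\mid\theta)\,P(\theta)}{P(D)}$$
where θ* is the maximum a posteriori estimate of the parameters θ, D is the sample data set, P(θ) is the prior distribution of the parameters, and P(D) is the marginal probability of the sample data set, which does not depend on θ.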
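For discrete nodes with a Dirichlet prior, the maximum a posteriori estimate has a closed form, sketched below for one node's conditional probability table. The symmetric Dirichlet(alpha) prior and the function name are assumptions; the text does not state which prior was used.

```python
from collections import Counter

def map_cpt(rows, child, parents, child_states, alpha=2.0):
    """MAP estimate of P(child | parents) under a symmetric Dirichlet(alpha) prior.

    For each observed parent configuration j and child state k:
        theta_jk = (N_jk + alpha - 1) / (N_j + r * (alpha - 1)),
    where r = len(child_states). alpha = 1 reduces to maximum likelihood."""
    r = len(child_states)
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in rows)
    configs = {cfg for cfg, _ in counts}
    cpt = {}
    for cfg in configs:
        n_j = sum(counts[(cfg, s)] for s in child_states)
        denom = n_j + r * (alpha - 1)
        cpt[cfg] = {s: (counts[(cfg, s)] + alpha - 1) / denom for s in child_states}
    return cpt
```

With alpha = 2 every child state receives one pseudo-count, so travel modes never observed under a given parent configuration still get a small nonzero probability instead of zero.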
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
1. The method uses mobile phone signaling data; compared with GPS data, this lowers data acquisition difficulty and cost and provides wider sample coverage.
2. The method considers the joint influence of multiple uncertainty factors in the transportation system, such as traveler attributes, transportation facility conditions, and the traffic operating environment, on the travel characteristics of each travel mode; it can reveal how these factors interact and can delineate typical travel characteristic scenarios for each travel mode.
3. The method constructs Bayesian networks from both the information-theoretic and the probabilistic perspective, accounts for the hidden relationships among travel mode characteristics, and scores the networks with the BIC function, giving higher reliability and precision in travel mode identification.
Drawings
The accompanying drawings are included to provide a further understanding of the invention, and together with the description serve to explain the embodiments of the invention, and do not constitute a limitation of the invention. In the drawings:
Fig. 1 shows a schematic diagram of travel duration calculation in the method for reliably identifying resident travel modes driven by mobile phone signaling data.
Fig. 2 shows the Bayesian network topology constructed based on information theory in the method.
Fig. 3 shows the Bayesian network topology constructed based on probability theory in the method.
Fig. 4 shows the mutual information matrix of travel characteristics for the peak period in the method.
Fig. 5 shows the mutual information matrix of travel characteristics for the off-peak period in the method.
Fig. 6 shows the Bayesian network topology constructed by integrating information theory and probability theory in the method.
Fig. 7 shows a histogram of the travel mode choice shares of different travelers in the method.
Fig. 8 shows a histogram of the gender shares of travelers for different travel modes in the method.
Fig. 9 shows histograms of the travel mode choice shares of travelers of different ages in the method.
Fig. 10 shows probability density plots of travel duration for different travel modes in the method.
Fig. 11 shows probability density plots of travel distance for different travel modes in the method.
Fig. 12 shows probability density plots of average travel speed for different travel modes in the method.
Fig. 13 shows probability density plots of the bus stop coverage rate of the traffic zones used by different travel modes in the method.
Fig. 14 shows probability density plots of the bus route repetition coefficient of the traffic zones used by different travel modes in the method.
Fig. 15 shows probability density plots of the intersection density of the traffic zones used by different travel modes in the method.
Fig. 16 shows probability density plots of the road network density of the traffic zones used by different travel modes in the method.
Fig. 17 shows probability density plots of the land use mix for different travel modes in the method.
Fig. 18 shows the actual travel mode shares of the test data set compared with the inferred travel mode shares in the method.
Fig. 19 shows a flowchart of the method for reliably identifying resident travel modes driven by mobile phone signaling data.
Detailed Description
For a further understanding of the present invention, preferred embodiments of the invention are described below in conjunction with the examples, but it should be understood that these descriptions are merely intended to illustrate further features and advantages of the invention, and are not limiting of the claims of the invention.
The description of this section is intended to be illustrative of only exemplary embodiments and is not intended to be limiting of the scope of the embodiments described herein. Combinations of the different embodiments, and alternatives of features from the same or similar prior art means and embodiments are also within the scope of the description and protection of the invention.
A method for reliably identifying resident travel modes driven by mobile phone signaling data comprises the following steps:
S1, acquiring raw mobile phone signaling data and preprocessing it to obtain preprocessed mobile phone signaling data;
S2, acquiring resident travel survey data, each record of which comprises departure time, arrival time, origin, and destination, with the origin and destination represented by traffic zone numbers, and matching the features of the mobile phone signaling data with the resident travel survey data to obtain mobile phone signaling data labeled with travel modes;
S3, dividing the area to be studied into a plurality of traffic zones; extracting resident travel characteristics based on the time specificity and uncertainty of travel characteristics, the resident travel characteristics comprising traffic environment characteristics, individual traveler characteristics, and travel behavior characteristics; and performing correlation analysis on the resident travel characteristics together with the travel mode labels, quantifying the correlations among travel characteristics with mathematical indicators;
S4, designing a Bayesian network model framework for reliably identifying resident travel modes: inferring causal links among resident travel characteristics from their correlations combined with prior knowledge, and constructing a first travel characteristic relationship network; constructing a second travel characteristic relationship network from the sample data with a machine learning method; and scoring the models with the BIC function and integrating the first and second travel characteristic relationship networks to construct an optimized Bayesian network model;
S5, quantitatively characterizing the uncertainty of the input features of the Bayesian network model by discretizing continuous travel characteristics into ordered states; dividing the sample data set into a training set and a test set; learning the parameters of each node of the Bayesian network model from the training set; evaluating model accuracy on the test set; and thereby completing the Bayesian-network-based model for reliable identification of resident travel modes.
According to the invention, to address the uncertainty of the travel characteristics of each travel mode under different operating conditions of the transportation system, correlation analysis of the travel characteristics is performed separately for the peak and off-peak periods, and the influence of the travel period on the travel characteristics of different travel modes is evaluated with the coefficient of variation of travel speed. The travel characteristics of each travel mode are considered to be jointly influenced by multiple uncertainty factors in the transportation system, such as traveler attributes, transportation facility conditions, and the traffic operating environment, so the mechanism by which these factors interact can be revealed and typical travel characteristic scenarios for each travel mode can be delineated. Meanwhile, an uncertainty physical characterization scheme is proposed for the uncertainty of multidimensional travel characteristics; the dependency relationships among travel characteristics are studied from both the information-theoretic and the probabilistic perspective; and Bayesian network structure learning and parameter learning are applied, with the hidden relationships among travel mode characteristics taken into account and the networks scored with the BIC function, to build a Bayesian-network-based model for reliable identification of resident travel modes with higher reliability and precision.
Mobile phone signaling data are the series of control messages that the mobile communication network generates, actively or passively and periodically or aperiodically, to maintain contact with users' mobile terminals. They contain fields such as the phone identifier, timestamp, event type, base station ID, base station longitude and latitude, and the home location of the phone number. Compared with GPS data, mobile phone signaling data are easier and cheaper to acquire and cover a wider sample of the population.
Example 1
A method for reliably identifying resident trip modes driven by mobile phone signaling data comprises the following steps:
In step S1, the raw data are mainly 4G mobile phone signaling data provided by China Mobile in Kunshan, Jiangsu Province, together with some 2G and 3G signaling data. The data are positioned with the Cell of Origin (COO) technique, so the positioning accuracy depends on the service range and deployment density of the base stations. Table 1 lists the fields of the Kunshan base station data, mainly the location area code, base station cell code, base station name, and base station longitude and latitude.
Table 1 Kunshan base station information
The fields of the 4G signaling data in the study sample and their meanings are shown in Table 2; they mainly include the location area code and base station cell code at which each signaling record was triggered, and the signaling trigger time.
Table 2 Mobile phone signaling data fields and their meanings
In step S2, the resident trip survey data are matched with the mobile phone signaling data using the personal attributes of the travelers and of the mobile phone users. In the matched data, type denotes the travel mode class: 1 is walking, 2 is bicycle, 3 is electric bicycle, 4 is bus and 5 is car. An excerpt of the resulting signaling data with travel-mode labels is shown in Table 3.
Table 3 Example of matched data (excerpt)
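The matching rule spelled out in claim 3 (same sex, age difference of at most 2 years, keep only survey records with exactly one candidate) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record classes and field names (`SurveyTrip`, `PhoneUser`, `sex`, `age`) are hypothetical, and only the demographic filters of claim 3 are applied; any additional spatial or temporal filtering of candidates is not shown.

```python
from dataclasses import dataclass


@dataclass
class SurveyTrip:          # hypothetical survey record layout
    trip_id: str
    sex: str               # e.g. "M" or "F"
    age: int
    mode: int              # travel-mode code (type field)


@dataclass
class PhoneUser:           # hypothetical signaling-user attribute layout
    user_id: str
    sex: str
    age: int


def match_survey_to_users(trips, users, max_age_gap=2):
    """Map trip_id -> user_id for survey trips with exactly one candidate user."""
    matches = {}
    for trip in trips:
        candidates = [
            u for u in users
            if u.sex == trip.sex                           # S2-1: same sex
            and abs(u.age - trip.age) <= max_age_gap       # S2-2: age gap <= 2 years
        ]
        if len(candidates) == 1:                           # S2-3: keep N == 1 only
            matches[trip.trip_id] = candidates[0].user_id
    return matches


if __name__ == "__main__":
    trips = [SurveyTrip("t1", "F", 30, 1), SurveyTrip("t2", "M", 40, 5),
             SurveyTrip("t3", "M", 60, 4)]
    users = [PhoneUser("u1", "F", 31), PhoneUser("u2", "M", 39), PhoneUser("u3", "M", 41)]
    print(match_survey_to_users(trips, users))
```

In this toy example `t2` has two candidate users (N > 1) and `t3` has none (N = 0), so only `t1` is kept and the script prints `{'t1': 'u1'}`.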
In step S3, the correlations between the travel-mode label and the three categories of travel characteristics associated with the signaling data are tested by calculating the maximal information coefficient (MIC) for each pair of variables; the variables are listed in Table 4.
Table 4 Travel characteristic fields
The MIC matrices are shown in Figs. 5 and 6, where the axis labels correspond to the variable numbers in Table 4. MIC values lie between 0 and 1, and larger values indicate stronger dependence. Figs. 5 and 6 show that the travel mode depends only weakly on individual attributes, strongly on travel behavior characteristics, and least on traffic environment characteristics (the service level of the public transport system, the construction level of the road system and the land use level). In both the peak and the off-peak period, travel distance and average travel speed are strongly related to the travel mode, indicating that these two characteristics play a key role in identifying it. Among the traffic environment characteristics, the bus stop coverage rate of the traffic cell has the strongest correlation with the travel mode, and the other traffic environment characteristics are correlated with it.
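Computing the exact MIC requires searching over grid partitions for every grid size (the original MIC algorithm uses a dynamic-programming search for this). The sketch below is a simplified approximation, not the full algorithm: it fixes each axis to equal-frequency bins instead of optimizing the partition, so it returns a lower bound on MIC. The function names are illustrative, and the `alpha=0.6` exponent mirrors the grid constraint $n_X n_Y < B = n^{0.6}$ stated in claim 4.

```python
import math
from collections import Counter


def equal_freq_bins(values, k):
    """Assign each value to one of k equal-frequency bins; tied values share a bin."""
    n = len(values)
    first_rank = {}
    for rank, v in enumerate(sorted(values)):
        first_rank.setdefault(v, rank)          # rank of the first occurrence of v
    return [first_rank[v] * k // n for v in values]


def mutual_info_bits(a, b):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(a)
    pxy, px, py = Counter(zip(a, b)), Counter(a), Counter(b)
    # p(x,y) / (p(x) p(y)) == c * n / (count_x * count_y)
    return sum(c / n * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def approx_mic(x, y, alpha=0.6):
    """Lower-bound approximation of MIC over grids with n_x * n_y < n**alpha."""
    n = len(x)
    B = n ** alpha
    best = 0.0
    for nx in range(2, int(B) + 1):
        for ny in range(2, int(B) + 1):
            if nx * ny >= B:                    # grid-size constraint
                continue
            mi = mutual_info_bits(equal_freq_bins(x, nx), equal_freq_bins(y, ny))
            best = max(best, mi / math.log2(min(nx, ny)))   # normalize to [0, 1]
    return best
```

For a perfectly dependent pair, e.g. `approx_mic(list(range(100)), list(range(100)))`, the result is 1.0 up to floating-point rounding; for very small samples (roughly n < 11) no grid satisfies the constraint and the function returns 0.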
In step S4, the BIC measure is used as the structure scoring function to compare the two Bayesian network structures; the results are shown in Table 5.
Table 5 BIC scores of the Bayesian network topologies
According to the BIC scores, the Bayesian network structure built on information theory outperforms the structure learned from the data on probability theory. The structure learned by the hill-climbing algorithm is considerably more complex than the self-built network; because BIC penalizes the number of model parameters, this extra complexity lowers its score, and it also increases the computational cost of the Bayesian network. The resulting optimized Bayesian network structure is shown in Fig. 6.
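To make the comparison in Table 5 concrete, the sketch below computes a BIC score for a discrete Bayesian network structure from complete data: the maximized log-likelihood minus $\frac{\ln N}{2}$ times the number of free parameters. It assumes the convention that higher is better (some tools report the negated value) and takes each variable's cardinality from the states observed in the data; the function name and data layout are illustrative, not the tooling used in the method.

```python
import math
from collections import Counter


def bic_score(data, structure):
    """BIC of a discrete Bayesian network; higher is better in this convention.

    data: list of dicts {variable: state}; structure: {variable: [parent, ...]}.
    """
    n = len(data)
    card = {v: len({row[v] for row in data}) for v in structure}  # observed cardinality
    loglik, n_params = 0.0, 0
    for v, parents in structure.items():
        joint = Counter((tuple(row[p] for p in parents), row[v]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
        for (pa, _state), count in joint.items():
            loglik += count * math.log(count / parent_counts[pa])  # maximized log-likelihood
        q = math.prod(card[p] for p in parents)                    # parent configurations
        n_params += (card[v] - 1) * q                              # free CPT entries
    return loglik - 0.5 * math.log(n) * n_params


if __name__ == "__main__":
    data = [{"A": 0, "B": 0}] * 50 + [{"A": 1, "B": 1}] * 50
    print(bic_score(data, {"A": [], "B": ["A"]}))   # edge A -> B
    print(bic_score(data, {"A": [], "B": []}))      # no edge
```

Because `B` copies `A` in this toy data, the structure with the edge A → B scores higher despite its extra parameters, while an edge that added parameters without improving the likelihood would be penalized.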
In step S5, it is recognized that different travel modes in the traffic system exhibit different travel characteristics, and that, under the influence of factors such as traffic facility conditions and traffic operating conditions, the same mode can also exhibit different characteristics. The uncertainty of the travel characteristics is therefore expressed in probabilistic form to obtain an uncertainty physical characterization scheme, and the continuous characteristic variables are discretized into ordered discrete states.
1) Sex of traveler
The travel-mode preferences of travelers of each sex, and the sex distribution within each travel mode, are analyzed statistically, as shown in Figs. 7 and 8.
2) Age of traveler
Statistical analysis of the age distribution of the sample shows that the age distributions of travelers differ in shape across travel modes, as shown in Fig. 9.
3) Travel duration
Probability density curves of the travel duration of Kunshan residents for each travel mode were obtained by fitting the travel-duration data of each mode, as shown in Fig. 10.
4) Travel distance
Probability density curves of the travel distance of Kunshan residents for each travel mode were obtained by fitting the travel-distance data of each mode, as shown in Fig. 11.
5) Average travel speed
Probability density curves of the average travel speed of Kunshan residents for each travel mode were obtained by fitting the average-speed data of each mode, as shown in Fig. 12.
6) Bus system service level
For each travel mode, probability density curves were fitted to the bus stop coverage rate and the bus route repetition coefficient of the origin and destination traffic cells, as shown in Figs. 13 and 14.
7) Road system construction level
For each travel mode, probability density curves were fitted to the road intersection density and the road network density of the origin and destination traffic cells, as shown in Figs. 15 and 16.
8) Land use level
For each travel mode, probability density curves were fitted to the land-use mix of the origin and destination traffic cells, as shown in Fig. 17.
All characteristic variables are discretized into random variables with a finite number of states; Table 6 lists the discretized travel characteristic variables and the meaning of each discrete state.
Table 6 Discrete states of the travel characteristic variables
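Discretizing a continuous feature into ordered states amounts to mapping each value to the interval it falls in. The cut points in the sketch below are hypothetical placeholders, since the actual intervals are those of Table 6; `quantile_cuts` shows one data-driven alternative (equal-frequency cuts via Python's `statistics.quantiles`) and is not necessarily the scheme used in the method.

```python
import statistics
from bisect import bisect_right

# Hypothetical speed cut points in km/h (placeholders, not the values of Table 6).
SPEED_CUTS_KMH = [5.0, 10.0, 15.0, 20.0, 30.0]


def discretize(value, cuts):
    """Return the ordered state index 0..len(cuts); state i covers [cuts[i-1], cuts[i])."""
    return bisect_right(cuts, value)


def quantile_cuts(sample, n_states):
    """Equal-frequency cut points (n_states - 1 of them) estimated from a sample."""
    return statistics.quantiles(sample, n=n_states)
```

With these placeholder cuts, 3.2 km/h maps to state 0, 12 km/h to state 2, and anything at or above 30 km/h to the top state 5. Because the state index grows with the value, the ordering of the original variable is preserved.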
The mobile phone signaling data with travel-mode labels are divided into two sets: 80% of the data are used to learn the node parameters of the Bayesian network model, and the remaining 20% are used to evaluate the model afterwards. The number of samples of each travel mode in each set is shown in Table 7.
Table 7 Numbers of samples in the training set and test set
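An 80/20 division can be sketched as follows. Whether the split is stratified by travel mode is not stated; this sketch assumes a per-mode (stratified) split so that each mode keeps roughly the same proportion in both sets. The `mode` key and the fixed seed are illustrative choices.

```python
import random
from collections import defaultdict


def stratified_split(records, label_key="mode", train_frac=0.8, seed=42):
    """Split records into (train, test), shuffling and cutting each label group."""
    by_label = defaultdict(list)
    for r in records:
        by_label[r[label_key]].append(r)
    rng = random.Random(seed)                  # reproducible shuffling
    train, test = [], []
    for group in by_label.values():
        group = group[:]                       # copy before shuffling
        rng.shuffle(group)
        cut = round(len(group) * train_frac)   # per-label training share
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test
```

With 20 records per mode, each mode contributes 16 records to the training set and 4 to the test set.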
The node parameters of the model are learned from the training set, giving the conditional probability table of each node in the Bayesian network. Table 8 shows the table for the average travel speed node as an example.
Table 8 Conditional probability table of the average travel speed node
The conditional probability distribution of the average travel speed node shows the following. Walking speeds are clearly lower than those of the other modes, although the upper end of the walking speed range overlaps somewhat with the bicycle speed range. The speed ranges of electric bicycle trips and bus trips are very similar, with electric bicycles slightly faster overall, so other characteristic nodes are needed to tell these two modes apart. Car speeds are clearly higher than those of the other modes but still overlap to some extent with bus and electric bicycle speeds, so other characteristic nodes are also needed to identify car trips.
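Each conditional probability table, such as Table 8, can be estimated by counting how often each state occurs under each parent configuration in the training set. The sketch below adds a pseudo-count `a` to every cell; this equals the MAP estimate of claim 9 under a symmetric Dirichlet prior with parameter $a+1$, and the maximum-likelihood estimate when `a = 0`. The prior actually used in the method is not specified, so the default value and the dictionary layout here are assumptions.

```python
from collections import Counter


def learn_cpt(data, node, parents, states, pseudo_count=1.0):
    """Return {parent_config: {state: probability}} for one node.

    Estimate: (N_jk + a) / (N_j + a * r), where r = number of node states.
    Only parent configurations observed in the data are included.
    """
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in data)
    pa_counts = Counter(tuple(r[p] for p in parents) for r in data)
    n_states = len(states)
    cpt = {}
    for pa, n_pa in pa_counts.items():
        cpt[pa] = {
            s: (joint[(pa, s)] + pseudo_count) / (n_pa + pseudo_count * n_states)
            for s in states
        }
    return cpt
```

For example, if walking trips fall into speed states 0, 0, 0 and 1, then with `pseudo_count=0.0` the walking row is `{0: 0.75, 1: 0.25, 2: 0.0}`, while the default smoothing gives state 2 a small non-zero probability (1/7) instead of ruling it out.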
Given the observed values of the travel characteristics in the validation set, the probability that each trip belongs to each travel mode can be inferred, and the mode with the highest probability is taken as the Bayesian network inference result. The model identifies the Kunshan mobile phone signaling validation set with an accuracy of 82.91%; the classification confusion matrix of the validation set is shown in Table 9.
Table 9 Classification confusion matrix of the validation set
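When every feature except the travel mode is observed, the posterior over modes follows directly from the chain-rule factorization of the network: $P(m \mid e) \propto \prod_v P(v \mid \mathrm{pa}(v))$, evaluated with the mode node set to $m$. The sketch below assumes that fully observed setting and a hypothetical CPT layout `{node: {parent_tuple: {state: prob}}}`; with missing evidence, a general inference algorithm such as variable elimination would be needed instead. It also computes accuracy, the metric reported above.

```python
def mode_posterior(evidence, structure, cpts, modes, mode_node="mode"):
    """P(mode | all other features observed), by normalizing the joint over modes."""
    scores = {}
    for m in modes:
        full = {**evidence, mode_node: m}
        p = 1.0
        for node, parents in structure.items():
            pa = tuple(full[q] for q in parents)
            # In this sketch, unseen parent configs or states get probability 0.
            p *= cpts[node].get(pa, {}).get(full[node], 0.0)
        scores[m] = p
    z = sum(scores.values())
    return {m: s / z for m, s in scores.items()} if z > 0 else scores


def predict(evidence, structure, cpts, modes, mode_node="mode"):
    """Most probable travel mode (MAP decision)."""
    post = mode_posterior(evidence, structure, cpts, modes, mode_node)
    return max(post, key=post.get)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With many nodes, the product of small probabilities can underflow; summing log-probabilities instead is the usual remedy.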
At the aggregate level, the travel-mode structure (mode shares) identified by the model is compared with the actual travel-mode structure, as shown in Fig. 18. The Bayesian network captures the travel-mode structure well; in particular, the identified shares of walking, electric bicycle and car trips are very close to the actual shares.
Based on mobile phone signaling data, the method analyzes resident travel characteristics in depth from the perspectives of their time specificity and uncertainty, and provides an effective method for extracting them. For the problem of reliably identifying resident travel modes, a Bayesian network model framework is designed, and, through quantitative characterization of the time specificity of travel characteristics, a model for reliably identifying resident travel modes that accounts for the uncertainty of travel characteristics is constructed. The method can help the relevant authorities understand the structure of urban travel modes in the new era, formulate policies to optimize that structure, and obtain effective basic data support for promoting green, low-carbon transportation.
The description and applications of the present invention herein are illustrative and are not intended to limit the scope of the invention to the embodiments described above. The relevant descriptions of effects, advantages and the like in the description may not be presented in practical experimental examples due to uncertainty of specific condition parameters or influence of other factors, and the relevant descriptions of effects, advantages and the like are not used for limiting the scope of the invention. Variations and modifications of the embodiments disclosed herein are possible, and alternatives and equivalents of the various components of the embodiments are known to those of ordinary skill in the art. It will be clear to those skilled in the art that the present invention may be embodied in other forms, structures, arrangements, proportions, and with other assemblies, materials, and components, without departing from the spirit or essential characteristics thereof. Other variations and modifications of the embodiments disclosed herein may be made without departing from the scope and spirit of the invention.

Claims (9)

1. A method for reliably identifying a resident trip mode driven by mobile phone signaling data is characterized by comprising the following steps:
S1, acquiring raw mobile phone signaling data and preprocessing them to obtain preprocessed mobile phone signaling data;
S2, acquiring resident trip survey data, each record of which comprises a departure time, an arrival time, a departure place and an arrival place, the departure and arrival places being represented by traffic cell numbers, and matching the mobile phone signaling data with the resident trip survey data by features to obtain mobile phone signaling data with travel-mode labels;
S3, dividing the area to be identified into a plurality of traffic cells; extracting resident travel characteristics based on the time specificity and uncertainty of travel characteristics, the resident travel characteristics comprising traffic environment characteristics, individual traveler characteristics and travel behavior characteristics; carrying out correlation analysis of the resident travel characteristics in combination with the travel-mode labels; and quantifying the correlations among the travel characteristics with mathematical indexes;
S4, designing a Bayesian network model framework for reliably identifying resident trip modes: determining the causal links among resident travel characteristics from their correlations combined with prior knowledge, and constructing a first travel characteristic relation network; constructing a second travel characteristic relation network from the sample data with a machine learning method; scoring the models with the BIC function, and integrating the first and second travel characteristic relation networks to optimize and construct the Bayesian network model;
S5, quantitatively characterizing the uncertainty of the input features of the Bayesian network model and converting continuous travel features into ordered discrete states; dividing the sample data set into a training set and a test set, using the training set to learn the parameters of each node of the Bayesian network model, and evaluating model accuracy on the test set, thereby completing the construction of the Bayesian-network-based model for reliably identifying resident trip modes.
2. The method for reliably identifying the travel mode of the resident driven by the mobile phone signaling data according to claim 1, wherein the specific steps of preprocessing in S1 include:
S1-1, filtering invalid and redundant data: identifying invalid records and duplicate records in the raw mobile phone signaling data and removing them, where invalid records are those missing the location area code or the cell code;
S1-2, processing ping-pong data (records produced when the phone switches back and forth between neighboring cells without actually moving): merging the ping-pong records into a new signaling record whose start time is the start time of the first record and whose end time is the end time of the last record, the recording duration of the merged records being taken as the recording duration of the new record;
S1-3, processing drift data:
determining, for each signaling record, whether the spatial distance between the base stations of its start cell and end cell exceeds a distance threshold; dividing this distance by the recording duration of the record to obtain the switching speed and determining whether it exceeds a switching speed threshold; and identifying records that exceed both the distance threshold and the switching speed threshold as drift data;
merging adjacent drift records: the start time of the earlier record and the end time of the later record become the start time and end time of the new record, and the recording durations of the drift records are summed to give the recording duration of the new record.
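The drift rule of step S1-3 can be sketched as below. The thresholds (2 km, 50 m/s), the record layout, and the choice to also carry over the later record's end cell when merging are assumptions for illustration; the claim specifies only the time fields and the summed duration. Base-station distance uses the haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_drift(rec, dist_thr_m=2000.0, speed_thr_mps=50.0):
    """Drift = base-station jump longer than dist_thr_m AND faster than speed_thr_mps."""
    dist = haversine_m(*rec["start_lonlat"], *rec["end_lonlat"])
    speed = dist / max(rec["duration"], 1e-9)   # guard against zero duration
    return dist > dist_thr_m and speed > speed_thr_mps


def merge_adjacent_drift(records, dist_thr_m=2000.0, speed_thr_mps=50.0):
    """Flag drift records and merge consecutive ones into a single record."""
    merged = []
    for rec in records:
        drift = is_drift(rec, dist_thr_m, speed_thr_mps)
        if drift and merged and merged[-1]["drift"]:
            prev = merged[-1]
            prev["end_time"] = rec["end_time"]      # end time of the later record
            prev["end_lonlat"] = rec["end_lonlat"]  # assumption: carry its end cell too
            prev["duration"] += rec["duration"]     # summed recording duration
        else:
            merged.append({**rec, "drift": drift})  # copy; input is not mutated
    return merged
```

A record that jumps about 11 km in 10 s (over 1,000 m/s) is flagged as drift, while a 0.5 km move over 10 minutes is not.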
3. The reliable recognition method of resident trip mode driven by mobile phone signaling data according to claim 1, wherein the specific step of matching the mobile phone signaling data with resident trip survey data in S2 comprises:
S2-1, for each resident trip survey record, screening the mobile phone signaling data by user sex and retaining the signaling data whose user sex is the same as the traveler's sex in the survey record;
S2-2, screening the mobile phone signaling data by the traveler's age in each resident trip survey record and retaining the signaling data whose user age differs from the traveler's age by at most 2 years;
S2-3, counting the number N of mobile phone signaling data elements matched with each resident trip survey record: if N = 0, i.e. the survey record matches no signaling data, the survey record is removed from the resident trip survey data set; if N = 1, i.e. the survey record matches a unique signaling data element, the survey record is retained; if N > 1, i.e. several signaling data elements match the survey record, the survey record is removed from the resident trip survey data set.
4. The method for reliably identifying the travel mode of the resident driven by the mobile phone signaling data according to claim 1, wherein the specific step of S3 comprises the following steps:
S3-1, extracting traffic environment characteristics by calculating the traffic facility index data of each traffic cell, including bus stop coverage rate, bus route repeatability, intersection density, road network density and land-use mix;
S3-2, extracting individual traveler characteristics, including sex and age;
S3-3, extracting travel behavior characteristics, including travel duration, travel distance and average travel speed;
S3-4, introducing mutual information to characterize the correlation between the resident travel characteristics and the travel modes; considering the influence of time specificity, calculating the maximal information coefficient and constructing a travel characteristic information matrix separately for the peak period and the off-peak period, the maximal information coefficient being:
$$\mathrm{MIC}(X,Y)=\max_{n_X n_Y<B}\frac{\sum_{x,y}p(x,y)\log_2\dfrac{p(x,y)}{p(x)\,p(y)}}{\log_2\min(n_X,n_Y)}$$
wherein $\mathrm{MIC}(X,Y)$ represents the maximal information coefficient; $X$ and $Y$ are the row elements and column elements of the grid, respectively; $n_X$ and $n_Y$ represent the numbers of grid cells along the horizontal and vertical axes, respectively, subject to the constraint $n_X n_Y < B$ with $B = n^{0.6}$, where $n$ is the total number of samples; $p(x,y)$ is the joint probability density function of the two elements, and $p(x)$ and $p(y)$ are their marginal probability density functions.
5. The method for reliably identifying resident trip modes driven by mobile phone signaling data according to claim 4, wherein the specific calculation of the bus stop coverage rate, bus route repeatability, intersection density, road network density and land-use mix indices in S3-1 comprises the following steps:
1) Bus stop coverage rate
$$\mathrm{BCR}_i=\frac{\bigcup_{j=1}^{n_i}S_j}{S_i}\times 100\%,\quad i=1,2,\dots,l$$
wherein $\mathrm{BCR}_i$ represents the bus stop coverage rate of traffic cell $i$, $S_i$ the area of traffic cell $i$, $S_j$ the area of the circle of a given radius centered on bus stop $j$, $n_i$ the number of bus stops in traffic cell $i$, and $l$ the number of traffic cells;
2) Bus route repeatability
$$\mathrm{BRRC}_i=\frac{\sum_{k=1}^{m_i}L_k}{L_i},\quad i=1,2,\dots,l$$
wherein $\mathrm{BRRC}_i$ represents the bus route repeatability of traffic cell $i$, $L_i$ the total length of the road network in traffic cell $i$, $L_k$ the length of bus route $k$, and $m_i$ the number of bus routes in traffic cell $i$;
3) Road intersection density
$$\mathrm{RID}_i=N/S_i,\quad i=1,2,\dots,l$$
wherein $\mathrm{RID}_i$ represents the road intersection density of traffic cell $i$, $N$ the number of intersections in traffic cell $i$, and $S_i$ the area of traffic cell $i$;
4) Road network density
$$\mathrm{RND}_i=L_i/S_i,\quad i=1,2,\dots,l$$
wherein $\mathrm{RND}_i$ represents the road network density of traffic cell $i$, $L_i$ the total length of the road network in traffic cell $i$, and $S_i$ the area of traffic cell $i$;
5) Land-use mix
$$\mathrm{LM}_i=-\sum_{q=1}^{Q}p_q\ln p_q,\qquad p_q=\frac{s_q}{S_i}$$
wherein $\mathrm{LM}_i$ represents the land-use mix of traffic cell $i$, $p_q$ is the proportion of the area of the $q$-th land-use type in the area of the corresponding traffic cell, $Q$ is the total number of land-use types, and $s_q$ is the area of the $q$-th land-use type.
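The five traffic environment indices can be sketched as plain functions. Two parts are assumptions: bus stop coverage is approximated by sampling a regular grid over a rectangular cell (real traffic cells are polygons, usually handled with a GIS library), and land-use mix is taken as the unnormalized Shannon entropy $-\sum_q p_q \ln p_q$ (some definitions divide by $\ln Q$). Units and function names are illustrative.

```python
import math


def bus_stop_coverage(width_m, height_m, stops, radius_m, step_m=10.0):
    """Approximate BCR: share of a rectangular cell within radius_m of any stop."""
    nx, ny = int(width_m / step_m), int(height_m / step_m)
    covered = 0
    for i in range(nx):
        for j in range(ny):
            px, py = (i + 0.5) * step_m, (j + 0.5) * step_m   # grid-cell centers
            if any(math.hypot(px - sx, py - sy) <= radius_m for sx, sy in stops):
                covered += 1                                  # overlaps counted once
    return covered / (nx * ny)


def bus_route_repetition(route_lengths_km, network_length_km):
    """BRRC: total bus route length divided by road network length."""
    return sum(route_lengths_km) / network_length_km


def intersection_density(n_intersections, area_km2):
    """RID: intersections per km^2."""
    return n_intersections / area_km2


def road_network_density(network_length_km, area_km2):
    """RND: km of road per km^2."""
    return network_length_km / area_km2


def land_use_mix(areas_by_type, cell_area):
    """LM: Shannon entropy of land-use shares p_q = s_q / S_i (zero shares skipped)."""
    shares = [s / cell_area for s in areas_by_type.values() if s > 0]
    return -sum(p * math.log(p) for p in shares)
```

For instance, a cell split evenly between two land-use types has a mix of $\ln 2 \approx 0.693$, and a single stop with a 300 m radius in the middle of a 1 km × 1 km cell covers about 28% of it.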
6. The method for reliably identifying resident trip modes driven by mobile phone signaling data according to claim 4, wherein the specific calculation of the travel duration, travel distance and average travel speed in step S3-3 comprises the following steps:
1) Travel duration
The representative timestamp of the trip origin is taken as the departure time and that of the trip destination as the arrival time; their difference is the travel duration $T_I$;
2) Travel distance
$$D_I=\sum_{i=1}^{n_I-1}d_{i,i+1}$$
wherein $D_I$ represents the travel distance of user $I$, $n_I$ the total number of intersections passed by user $I$, and $d_{i,i+1}$ the spatial distance between intersection $i$ and intersection $i+1$;
3) Average travel speed
$$v_I=D_I/T_I$$
wherein $v_I$ represents the average travel speed of user $I$, and $T_I$ the travel duration of user $I$.
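Steps 1) to 3) can be sketched as a single function. It assumes the path is given as an ordered list of (longitude, latitude) points and uses the haversine great-circle distance for $d_{i,i+1}$; the actual distance measure (for example, distance along the road network) is not specified. Returning the speed in km/h is an illustrative unit choice.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trip_metrics(points, depart_ts, arrive_ts):
    """Return (T_I in s, D_I in m, v_I in km/h) for one trip.

    points: ordered (lon, lat) points passed on the trip; timestamps in seconds.
    """
    duration_s = arrive_ts - depart_ts                          # T_I
    distance_m = sum(haversine_m(*points[i], *points[i + 1])    # D_I = sum of d_{i,i+1}
                     for i in range(len(points) - 1))
    speed_kmh = distance_m / duration_s * 3.6 if duration_s > 0 else float("nan")
    return duration_s, distance_m, speed_kmh
```

Three points 0.01° of latitude apart give a distance of about 2.22 km; over a 600 s trip this corresponds to roughly 13.3 km/h.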
7. The method for reliably identifying the travel mode of the resident driven by the mobile phone signaling data according to claim 1, wherein the specific step of S4 comprises the following steps:
S4-1, determining, based on expert experience and the conclusions of the correlation analysis, the dependence or independence relations among the travel characteristics in the information-theoretic sense, and constructing a first, information-theory-dominated Bayesian network model;
S4-2, obtaining the dependence or independence relations among the travel characteristics in the probability-theoretic sense with a hill-climbing search, and constructing a second, probability-theory-dominated Bayesian network model;
S4-3, evaluating the structural performance of the two Bayesian network models with the BIC measure as the structure scoring function, and, according to the BIC scores, constructing the final Bayesian network topology from domain prior knowledge and sample data by combining the dependence or independence relations among the travel characteristics in both the information-theoretic and the probability-theoretic sense.
8. The method for reliably identifying the travel mode of the resident driven by the mobile phone signaling data according to claim 1, wherein the specific step of S5 comprises the following steps:
S5-1, characterizing the uncertainty of the travel characteristics of the Bayesian network: statistically analyzing the sample data of different travel modes, expressing the uncertainty of the travel characteristics in probabilistic form based on prior knowledge of the transportation field to obtain an uncertainty physical characterization scheme, and discretizing continuous characteristic variables into ordered discrete states; the resident travel characteristics include traveler sex, traveler age, travel duration, travel distance, average travel speed, bus system service level, road system construction level and land use level, among others;
S5-2, dividing the mobile phone signaling data with travel-mode labels into two data sets, 80% serving as the training data set for the node parameters of the Bayesian network model and 20% as the validation data set for evaluating the model; learning the node parameters of the Bayesian network model from the training data set to obtain the conditional probability table of each node; and thereby completing the construction of the travel-mode identification model.
9. The reliable recognition method of resident trip mode driven by mobile phone signaling data according to claim 8, wherein the formula of learning the node parameters of the bayesian network model is as follows:
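$$\hat{\theta}=\arg\max_{\theta}P(\theta\mid D)=\arg\max_{\theta}\frac{P(D\mid\theta)\,P(\theta)}{P(D)}$$
where $\hat{\theta}$ is the maximum a posteriori estimate of the parameters $\theta$, $D$ represents the sample data set, $P(\theta)$ is the prior distribution of the parameters, and $P(D)$ is the marginal probability of the sample data set.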
CN202211616958.4A 2022-12-15 2022-12-15 Method for reliably identifying resident trip mode driven by mobile phone signaling data Pending CN116017407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211616958.4A CN116017407A (en) 2022-12-15 2022-12-15 Method for reliably identifying resident trip mode driven by mobile phone signaling data

Publications (1)

Publication Number Publication Date
CN116017407A true CN116017407A (en) 2023-04-25

Family

ID=86024037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211616958.4A Pending CN116017407A (en) 2022-12-15 2022-12-15 Method for reliably identifying resident trip mode driven by mobile phone signaling data

Country Status (1)

Country Link
CN (1) CN116017407A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116777243A (en) * 2023-06-21 2023-09-19 中国联合网络通信有限公司深圳市分公司 Resident trip index evaluation method and device and computer readable storage medium
CN116542402A (en) * 2023-07-06 2023-08-04 北京大学 Resident trip mode prediction method
CN116542402B (en) * 2023-07-06 2023-10-03 北京大学 Resident trip mode prediction method
CN117119387A (en) * 2023-10-25 2023-11-24 北京市智慧交通发展中心(北京市机动车调控管理事务中心) Method and device for constructing user travel chain based on mobile phone signaling data
CN117119387B (en) * 2023-10-25 2024-01-23 北京市智慧交通发展中心(北京市机动车调控管理事务中心) Method and device for constructing user travel chain based on mobile phone signaling data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination