Method for extracting travel stop point by using mobile phone signaling data
Technical Field
The invention relates to a method for extracting travel stop points, and in particular to a method for extracting travel stop points using mobile phone signaling data.
Background
In urban planning, attention is focused on the laws of urban morphological evolution, the distribution of population residences and workplaces, the patterns of travel flows and similar aspects, so that development strategies can be formulated scientifically, infrastructure can be arranged rationally and transport services can be supported. Traditional data acquisition means mainly include population censuses, economic censuses and sample surveys of resident trips. These surveys consume large amounts of manpower and material resources, have low sampling rates and limited precision, are updated only over long cycles and capture data for a specific time period only, so they struggle to meet the planning management requirements of the new era. The widespread adoption of mobile phones, together with techniques for analyzing massive volumes of mobile phone signaling data, provides a new, efficient and convenient means of acquiring and analyzing data for urban planning. Compared with traditional resident trip surveys, mobile phone signaling data offers wide coverage, large analysis samples, low implementation cost and long-term continuous monitoring, and provides a better basis for observing and analyzing the behavior and activity characteristics of urban populations.
Despite these clear advantages over traditional survey methods, mobile phone signaling data also has shortcomings. Alongside its large volume and wide coverage, it has low positional precision and contains much noise, which places high demands on data cleaning; a scientific and reasonable cleaning method is needed to improve the quality of the signaling data.
Current methods for cleaning signaling data fall into two parts: conventional data processing and processing based on the characteristics of the signaling data itself. Conventional processing includes filtering out null, erroneous and duplicate values, among others. Characteristic-based processing handles ping-pong handover data, drift data, static data and the like according to the principles by which mobile phone signaling data is generated. Cleaning the data in layers reduces redundant computation and improves cleaning efficiency.
Mobile phone signaling data comes directly from the mobile communication systems of the operators (China Mobile, China Unicom, China Telecom). Whenever a mobile phone makes or receives a call, sends or receives a short message, hangs up, performs a location update, hands over between base stations, connects to the internet and so on, the associated base station information is recorded, and positioning information is obtained by looking up the spatial position of the base station from its number. Typical signaling data contains the location information of the handset, a unique number for the handset's SIM card (encrypted by the operator to remove personal user information), timestamp information and so on. In a 2G network environment the signaling sampling frequency is low: a user generates about 30-60 records per day, mostly location update signaling. In 3G/4G network environments the sampling frequency is much higher: a user can generate 200-1000 records per day, mostly internet access signaling.
The positioning information is usually a base station number; by joining it with the base station location table, latitude and longitude information can be obtained.
The user ID in the sample table is the result of encrypting the mobile phone number. LAC and CI are the location area code and the cell identity of the base station, and their combination identifies the position of the base station. The operator registers and maintains the locations of its base stations, so joining the base station numbers with the base station location table yields positions in the form of longitude and latitude. Mobile phone signaling data is continuous in space and time.
The timestamp is a recording time field in the sample table, and includes information of year, month, day, hour, minute and second.
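For illustration only, the record fields described above can be modelled as follows. This is a minimal Python sketch under stated assumptions: the class `SignalingRecord`, its field names and the dictionary standing in for the base station location table are hypothetical, since the operator's actual table schema is not given here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple

# Hypothetical base station location table: (LAC, CI) -> (longitude, latitude).
BaseStationTable = Dict[Tuple[int, int], Tuple[float, float]]


@dataclass
class SignalingRecord:
    user_id: str         # encrypted mobile phone number
    lac: int             # location area code
    ci: int              # cell identity within the location area
    timestamp: datetime  # year, month, day, hour, minute, second


def locate(record: SignalingRecord, stations: BaseStationTable) -> Optional[Tuple[float, float]]:
    """Return the (longitude, latitude) of the record's base station, or None if unknown."""
    return stations.get((record.lac, record.ci))


# Illustrative values only.
stations: BaseStationTable = {(4101, 12345): (117.20, 39.13)}
rec = SignalingRecord("u001", 4101, 12345, datetime(2017, 5, 1, 8, 30, 0))
print(locate(rec, stations))  # (117.2, 39.13)
```

Keying the table on the (LAC, CI) pair mirrors the statement that the combination of the two numbers identifies a base station position.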
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for extracting travel stop points using mobile phone signaling data which, starting from cleaned data, identifies stop points by setting discrimination and screening conditions based on the spatio-temporal characteristics of travel.
The technical scheme adopted by the invention is as follows: a method for extracting travel stop points by using mobile phone signaling data comprises the following steps:
1) grouping all mobile phone signaling data for a set time period, acquired from a mobile communication operator, by user;
2) extracting the signaling of each mobile phone user and sorting it by timestamp to obtain, for each user, a time-ordered sequence of base stations and base station positions;
3) merging the sorted signaling data of each mobile phone user;
4) grouping the signaling positions whose minimum dwell time $t_{\mathrm{out}}(i) - t_{\mathrm{in}}(i)$ is greater than or equal to 10 min into set C2, and those whose minimum dwell time is greater than 1 min and less than 10 min into set C3;
5) finding possible stop points in set C3 according to their speed relationships with the stop points already confirmed in set C2;
6) moving the signaling positions in C3 that are judged to be stop points out of C3 and into C2, thereby updating the stop position set C2 and the candidate position set C3; the signaling positions in C2 are sorted by their start time $t_{\mathrm{in}}(j)$;
7) returning to step 5) until the numbers of signaling positions in sets C2 and C3 no longer change; the final set C2 is the complete set of stop points.
In step 1), grouping is performed according to the unique identification code of each mobile phone user, and signaling data with the same identification code are placed in the same group.
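Steps 1) and 2) amount to a group-by on the user identification code followed by a per-user sort on the timestamp. The sketch below assumes, hypothetically, that each record has already been flattened to a `(user_id, timestamp_seconds, position_id)` tuple; `group_and_sort` is an illustrative name, not part of the method as disclosed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed flat record layout: (user_id, timestamp in seconds, position_id),
# where position_id stands for the base station key (e.g. a LAC-CI pair).
Record = Tuple[str, int, str]


def group_and_sort(records: List[Record]) -> Dict[str, List[Tuple[int, str]]]:
    """Step 1): group records by user ID; step 2): sort each user's records by timestamp."""
    by_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user_id, ts, position in records:
        by_user[user_id].append((ts, position))
    # Tuples sort by timestamp first, so each list becomes a time-ordered position sequence.
    return {user_id: sorted(seq) for user_id, seq in by_user.items()}
```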
Step 3) comprises the following steps:
(1) if several consecutive signaling records occur at the same position, merging them and keeping the timestamps of the first and the last of these records;
(2) if the position of a signaling record differs from the positions of the records immediately before and after it, so that it cannot be merged with either neighbor, setting both its start and end timestamps to its own timestamp;
(3) the resulting merged record format is: mobile phone number code, signaling position $i$, start time $t_{\mathrm{in}}(i)$, end time $t_{\mathrm{out}}(i)$. The user's dwell time at signaling position $i$ is at least $t_{\mathrm{out}}(i) - t_{\mathrm{in}}(i)$ and at most $t_{\mathrm{in}}(i+1) - t_{\mathrm{out}}(i-1)$; the time-ordered set of all signaling positions is denoted C1.
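A minimal sketch of sub-steps (1) to (3), assuming each user's records are a time-ordered list of `(timestamp_seconds, position_id)` pairs. The names `merge_positions` and `dwell_bounds` are hypothetical, and the handling of the first and last positions, which have no neighbor on one side, is an assumption that the text does not specify.

```python
from typing import List, Tuple

# One merged entry of set C1: (position_id, t_in, t_out); times in seconds (assumed unit).
Stay = Tuple[str, int, int]


def merge_positions(seq: List[Tuple[int, str]]) -> List[Stay]:
    """Merge a time-ordered (timestamp, position_id) sequence into the set C1."""
    c1: List[Stay] = []
    for ts, position in seq:
        if c1 and c1[-1][0] == position:
            # Same position as the previous record: keep the first t_in, move t_out forward.
            c1[-1] = (position, c1[-1][1], ts)
        else:
            # New position: start and end timestamps are equal until merged with a later record.
            c1.append((position, ts, ts))
    return c1


def dwell_bounds(c1: List[Stay], i: int) -> Tuple[int, int]:
    """Return (minimum, maximum) dwell time at the i-th signaling position of C1."""
    _, t_in, t_out = c1[i]
    # At the ends of the sequence there is no neighbor; fall back to the position's own times.
    prev_out = c1[i - 1][2] if i > 0 else t_in
    next_in = c1[i + 1][1] if i + 1 < len(c1) else t_out
    return t_out - t_in, next_in - prev_out
```

The minimum dwell is the span actually covered by records at the position; the maximum is the whole gap between leaving the previous position and reaching the next one.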
Step 5) comprises the following steps:
(1) let the signaling position ranked $j$ in set C2 have start time $t_{\mathrm{in}}(j)$ and end time $t_{\mathrm{out}}(j)$;
(2) taking a signaling position $k$ in set C3, finding in the stop position set C2 the records immediately before and after $k$ in time, denoted signaling position $j$ and signaling position $j+1$;
(3) calculating the distance $\mathrm{dis}(j,k)$ between signaling positions $j$ and $k$ from their longitude and latitude; the longest possible in-transit travel time is $t_{\mathrm{in}}(k) - t_{\mathrm{out}}(j)$, so the minimum moving speed is $\mathrm{dis}(j,k) / (t_{\mathrm{in}}(k) - t_{\mathrm{out}}(j))$; similarly, the minimum moving speed between signaling position $k$ and signaling position $j+1$ is $\mathrm{dis}(k,j+1) / (t_{\mathrm{in}}(j+1) - t_{\mathrm{out}}(k))$;
(4) taking the walking speed as 4 km/h and using 3 km/h as the threshold: if both minimum speeds, from $j$ to $k$ and from $k$ to $j+1$, are greater than 3 km/h, signaling position $k$ is judged to be a transit point; otherwise it is judged to be a stop point.
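The speed test of sub-steps (3) and (4) can be sketched as follows. The text only says that the distance is computed from longitude and latitude; the haversine great-circle formula used here is an assumed choice, as are the function names and the use of seconds for time.

```python
import math
from typing import Tuple

LonLat = Tuple[float, float]  # (longitude, latitude) in degrees


def haversine_km(a: LonLat, b: LonLat) -> float:
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def min_speed_kmh(dist_km: float, dt_s: float) -> float:
    """Minimum moving speed: straight-line distance over the longest possible travel time."""
    if dt_s <= 0:
        return math.inf if dist_km > 0 else 0.0
    return dist_km / (dt_s / 3600.0)


def is_stop_point(j_loc: LonLat, j_out: float, k_loc: LonLat, k_in: float, k_out: float,
                  j1_loc: LonLat, j1_in: float, threshold_kmh: float = 3.0) -> bool:
    """k is a transit point only if both minimum speeds (j->k, k->j+1) exceed the threshold."""
    v_in = min_speed_kmh(haversine_km(j_loc, k_loc), k_in - j_out)
    v_out = min_speed_kmh(haversine_km(k_loc, j1_loc), j1_in - k_out)
    return not (v_in > threshold_kmh and v_out > threshold_kmh)
```

A zero or negative time gap is treated as an unbounded speed when the positions differ, so such a candidate cannot pass as a stop on that leg alone.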
The method for extracting travel stop points using mobile phone signaling data makes full use of the characteristics of signaling data: it first identifies stable stop points using the minimum dwell time condition, then considers the speed and position relationships between each possible stop point and the stop points before and after it, so that more stop points are identified and omissions are reduced.
Drawings
FIG. 1 is a diagram of signaling data space-time trajectories;
FIG. 2a is a schematic view in which point B is not identified as an activity stop point;
FIG. 2b is a schematic view in which point B is identified as an activity stop point because of an excessive detour;
FIG. 3 is a schematic diagram of stop point identification from one individual's signaling data for one day.
Detailed Description
The method for extracting travel stop points using mobile phone signaling data is described in detail below with reference to embodiments and the accompanying drawings.
The invention mainly studies how to obtain residents' stop points from mobile phone signaling data, so as to derive important information such as the travel activity patterns of urban residents and the spatial distribution of their residences and workplaces.
On the basis of the cleaned data, stop points are identified by setting discrimination and screening conditions that combine the spatio-temporal characteristics of travel. The method provides four screening conditions, namely dwell time screening, occurrence frequency screening, speed reduction screening and excessive detour screening; a location is identified as a stop point if any one of these conditions is met.
By combining travel behavior characteristics in the time and space dimensions to set the identification conditions for stop points, the method accurately mines the positions of stop points; its reasonableness and feasibility are finally verified by comparison with actual travel trajectories.
The invention discloses a method for extracting travel stop points by using mobile phone signaling data, which comprises the following steps:
1) All mobile phone signaling data for a set time period acquired from the mobile communication operator are grouped by user, so that the stop points of each user can be extracted in the following steps; specifically, grouping is performed according to the unique identification code of each mobile phone user, and signaling data with the same identification code are placed in the same group.
2) The signaling of each mobile phone user is extracted and sorted by timestamp to obtain, for each user, a time-ordered sequence of base stations and base station positions;
3) The sorted signaling data of each mobile phone user are merged as follows:
(1) if several consecutive signaling records occur at the same position, they are merged, keeping the timestamps of the first and the last of these records;
(2) if the position of a signaling record differs from the positions of the records immediately before and after it, so that it cannot be merged with either neighbor, both its start and end timestamps are set to its own timestamp;
(3) the resulting merged record format is: mobile phone number code, signaling position $i$ ($i$ denotes the $i$-th signaling position in time order), start time $t_{\mathrm{in}}(i)$, end time $t_{\mathrm{out}}(i)$. As shown in FIG. 1, the user's dwell time at signaling position $i$ is at least $t_{\mathrm{out}}(i) - t_{\mathrm{in}}(i)$ and at most $t_{\mathrm{in}}(i+1) - t_{\mathrm{out}}(i-1)$; the time-ordered set of all signaling positions is denoted C1.
4) Carrying out an activity at a location necessarily involves staying there for some time. From the signaling data merged in step 3), the shortest and the longest possible dwell time can be calculated for each location, so the actual dwell time lies within a known range, and if the minimum dwell time exceeds a certain limit, it can be concluded that a stay has occurred. Choosing this limit is difficult, however, because events during a trip, such as waiting at a bus stop or being held up in a traffic jam, also cause stays of some length. Based on a statistical analysis of the values in the 2017 Tianjin resident travel survey report, the invention groups signaling positions whose minimum dwell time $t_{\mathrm{out}}(i) - t_{\mathrm{in}}(i)$ is greater than or equal to 10 min into set C2, and those whose minimum dwell time is greater than 1 min and less than 10 min into set C3;
5) Possible stop points in set C3 are found according to their speed relationships with the stop points already confirmed in set C2, as follows:
(1) let the signaling position ranked $j$ in set C2 have start time $t_{\mathrm{in}}(j)$ and end time $t_{\mathrm{out}}(j)$;
(2) for a signaling position $k$ in set C3, the records immediately before and after $k$ in time are found in the stop position set C2 and denoted signaling position $j$ and signaling position $j+1$;
(3) the distance $\mathrm{dis}(j,k)$ between signaling positions $j$ and $k$ is calculated from their longitude and latitude; the longest possible in-transit travel time is $t_{\mathrm{in}}(k) - t_{\mathrm{out}}(j)$, so the minimum moving speed is $\mathrm{dis}(j,k) / (t_{\mathrm{in}}(k) - t_{\mathrm{out}}(j))$; similarly, the minimum moving speed between signaling position $k$ and signaling position $j+1$ is $\mathrm{dis}(k,j+1) / (t_{\mathrm{in}}(j+1) - t_{\mathrm{out}}(k))$;
(4) the walking speed is taken as 4 km/h. Because actual routes are generally not straight lines, a speed calculated from the straight-line distance underestimates the true travel speed, so 3 km/h is used as the threshold: if both minimum speeds, from $j$ to $k$ and from $k$ to $j+1$, are greater than 3 km/h, signaling position $k$ in set C3 is judged to be a transit point; otherwise it is judged to be a stop point.
6) The signaling positions in C3 judged to be stop points are moved out of C3 and into C2, thereby updating the stop position set C2 and the candidate position set C3; the signaling positions in C2 are sorted by their start time $t_{\mathrm{in}}(j)$;
7) Return to step 5) until the numbers of signaling positions in sets C2 and C3 no longer change; the final set C2 is the complete set of stop points.
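Steps 4) to 7) can be combined into one loop, sketched below under assumptions: stays are `(position_id, t_in, t_out)` tuples in seconds, `loc` is a hypothetical lookup from position to (longitude, latitude), distances use the haversine formula, and a position in C3 without a confirmed stop on both sides is simply left in C3. All C3 positions are tested in one pass before any are moved, matching steps 5) and 6); the loop ends when a pass moves nothing, matching step 7).

```python
import math
from typing import Dict, List, Tuple

Stay = Tuple[str, int, int]                  # (position_id, t_in, t_out), seconds (assumed)
Locations = Dict[str, Tuple[float, float]]   # hypothetical lookup: position_id -> (lon, lat)


def _dist_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    # Haversine great-circle distance on a spherical Earth (radius 6371 km).
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def _speed_kmh(dist_km: float, dt_s: int) -> float:
    if dt_s <= 0:
        return math.inf if dist_km > 0 else 0.0
    return dist_km / (dt_s / 3600.0)


def split_c1(c1: List[Stay]) -> Tuple[List[Stay], List[Stay]]:
    """Step 4): C2 if minimum dwell >= 10 min; C3 if 1 min < minimum dwell < 10 min."""
    c2 = [s for s in c1 if s[2] - s[1] >= 600]
    c3 = [s for s in c1 if 60 < s[2] - s[1] < 600]
    return sorted(c2, key=lambda s: s[1]), c3


def refine_stops(c2: List[Stay], c3: List[Stay], loc: Locations,
                 threshold_kmh: float = 3.0) -> Tuple[List[Stay], List[Stay]]:
    """Steps 5)-7): move C3 positions judged to be stop points into C2 until nothing changes."""
    c2 = sorted(c2, key=lambda s: s[1])
    while True:
        moved = []
        for k in c3:
            before = [s for s in c2 if s[2] <= k[1]]  # C2 stops that end before k starts
            after = [s for s in c2 if s[1] >= k[2]]   # C2 stops that start after k ends
            if not before or not after:
                continue  # assumption: k is only tested when it has a C2 stop on each side
            j, j1 = before[-1], after[0]
            v_in = _speed_kmh(_dist_km(loc[j[0]], loc[k[0]]), k[1] - j[2])
            v_out = _speed_kmh(_dist_km(loc[k[0]], loc[j1[0]]), j1[1] - k[2])
            if not (v_in > threshold_kmh and v_out > threshold_kmh):
                moved.append(k)
        if not moved:  # the sizes of C2 and C3 no longer change
            return c2, c3
        c3 = [s for s in c3 if s not in moved]
        c2 = sorted(c2 + moved, key=lambda s: s[1])
```

Because every pass either moves at least one position out of C3 or ends the loop, the number of passes is bounded by the size of C3. Iteration matters because a newly confirmed stop becomes the nearest neighbor of other candidates and can change their speed test.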
As shown in FIG. 2a and FIG. 2b, let B be a point in set C3 lying between points A and C. If $(d(A,B) + d(B,C)) / d(A,C) > 1.3$, point B is added to C2, and finally C2 is the set of all stop points.
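The detour criterion compares the path via B with the direct path from A to C. The sketch below assumes the positions have already been projected to planar coordinates in kilometres so that `math.dist` gives straight-line distances; the function names and the handling of the degenerate case where A and C coincide are illustrative assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # planar (x, y) coordinates in km, e.g. after a map projection (assumed)


def detour_ratio(a: Point, b: Point, c: Point) -> float:
    """(d(A,B) + d(B,C)) / d(A,C): how much longer the path via B is than the direct path."""
    via_b = math.dist(a, b) + math.dist(b, c)
    direct = math.dist(a, c)
    if direct == 0:
        # A and C coincide (a round trip): any excursion to a distinct B is an unbounded detour.
        return math.inf if via_b > 0 else 1.0
    return via_b / direct


def is_detour_stop(a: Point, b: Point, c: Point, threshold: float = 1.3) -> bool:
    """B (a point of C3) is added to C2 when the detour ratio exceeds the threshold."""
    return detour_ratio(a, b, c) > threshold
```

A ratio of 1.0 means B lies on the straight line from A to C; larger values mean the trajectory bends away toward B, which suggests B was a purposeful destination rather than a point passed on the way.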
Examples are given below:
1. Identification of the place of residence
Users whose mobile phones appear on 7 consecutive days are selected together with their associated time and location information. According to users' daily routines, the first and last record of each day are selected and the corresponding locations are spatially clustered. The number of days on which the user appears at each cluster is calculated and recognition conditions are set: if both the number of days and the frequency with which the user appears at a cluster exceed the set thresholds, that place is identified as the residence.
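The residence rule can be sketched as follows. The text does not specify the spatial clustering algorithm or the threshold values, so this sketch substitutes a simple grid snap (`grid_cell`) for clustering and uses hypothetical thresholds `min_days` and `min_hits`; the 7-consecutive-day user filter is assumed to have been applied beforehand.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# One user's records per day: day label -> list of (timestamp_s, lon, lat) (assumed layout).
DailyRecords = Dict[str, List[Tuple[int, float, float]]]
Cell = Tuple[int, int]


def grid_cell(lon: float, lat: float, size_deg: float = 0.005) -> Cell:
    """Snap a coordinate to a square grid cell: a crude stand-in for spatial clustering."""
    return (math.floor(lon / size_deg), math.floor(lat / size_deg))


def identify_residence(days: DailyRecords, min_days: int = 4,
                       min_hits: int = 6) -> Optional[Cell]:
    """Cluster each day's first and last record; return the cell passing both thresholds."""
    hits: Counter = Counter()
    days_seen: Dict[Cell, set] = defaultdict(set)
    for day, recs in days.items():
        if not recs:
            continue
        recs = sorted(recs)
        for _, lon, lat in (recs[0], recs[-1]):  # first and last record of the day
            cell = grid_cell(lon, lat)
            hits[cell] += 1
            days_seen[cell].add(day)
    candidates = [c for c in hits if len(days_seen[c]) >= min_days and hits[c] >= min_hits]
    if not candidates:
        return None
    # Prefer the cell seen on the most days, then the one with the most hits.
    return max(candidates, key=lambda c: (len(days_seen[c]), hits[c]))
```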
2. Identification of the workplace
Records of users who appear on more than 5 days and whose residence has been identified are selected. The distance between each point in the records and the residence is calculated, points farther than a set threshold (more than 1 km) are selected, and spatial cluster analysis is carried out. For each user, the dwell time and the number of days of appearance at each cluster are counted and the clusters are ranked by these two quantities; the cluster with the most days of appearance and the longest dwell time is identified as the workplace.
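Likewise, the workplace rule can be sketched with the same kind of grid snap standing in for spatial clustering. The 1 km distance threshold and the more-than-5-days condition come from the text; the haversine distance, the grid size, the `(day, lon, lat, dwell_s)` stay layout and ranking by days first and dwell time second are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# One user's stays: (day label, lon, lat, dwell time in seconds) (assumed layout).
Stay = Tuple[str, float, float, int]
Cell = Tuple[int, int]


def _dist_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    # Haversine great-circle distance on a spherical Earth (radius 6371 km).
    p1, q1, p2, q2 = map(math.radians, (lon1, lat1, lon2, lat2))
    h = math.sin((q2 - q1) / 2) ** 2 + math.cos(q1) * math.cos(q2) * math.sin((p2 - p1) / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def identify_workplace(stays: List[Stay], home: Tuple[float, float], min_dist_km: float = 1.0,
                       min_user_days: int = 5, cell_deg: float = 0.005) -> Optional[Cell]:
    """Cluster stays more than 1 km from home; return the cell with most days, then most dwell."""
    if len({s[0] for s in stays}) <= min_user_days:
        return None  # the user must appear on more than 5 days
    days: Dict[Cell, Set[str]] = defaultdict(set)
    dwell: Dict[Cell, int] = defaultdict(int)
    for day, lon, lat, dwell_s in stays:
        if _dist_km(lon, lat, home[0], home[1]) <= min_dist_km:
            continue  # too close to the residence
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        days[cell].add(day)
        dwell[cell] += dwell_s
    if not days:
        return None
    return max(days, key=lambda c: (len(days[c]), dwell[c]))
```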
3. Verification of the method for extracting travel stop points by using mobile phone signaling data
The stop points identified from signaling data by this method were compared with actual travel trajectories; the average success rate of stop point identification was 75%, indicating that the method is reasonable and feasible. In an analysis of one individual's signaling data for a single day, the individual's trips generated 794 signaling records; the stop point mining technique identified 10 stop points, 9 of which coincided with the positions of actual stop points, and all actual stop points were identified, as shown in FIG. 3.