CN117332376B - Method and system for identifying commuter and mode based on mobile phone signaling data - Google Patents

Publication number
CN117332376B
Authority
CN
China
Prior art keywords
user
track
grid
index
commute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311635072.9A
Other languages
Chinese (zh)
Other versions
CN117332376A (en)
Inventor
诸彤宇
刘子航
孙磊磊
李维淼
孙知洋
景昕
江勇
王冀彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
China Mobile Information Technology Co Ltd
Original Assignee
Beihang University
China Mobile Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University and China Mobile Information Technology Co Ltd
Priority to CN202311635072.9A
Publication of CN117332376A
Application granted
Publication of CN117332376B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/26 Discovering frequent patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24575 Query processing with adaptation to user needs using context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention relates to a method and a system for identifying commuters and their commuting modes based on mobile phone signaling data. The method comprises the following steps: step S1: acquiring signaling interaction data between a mobile phone and a base station; step S2: obtaining the whole-process activity chain of a user from the signaling interaction data, performing travel semantic segmentation, and extracting the user's on-duty and off-duty commute track sections, so that each commute track section contains one and only one travel mode; step S3: calculating a multi-day track similarity index of the commute track sections to obtain a user activity regularity index, and screening urban commuters against a preset threshold; step S4: identifying the commuting mode of each urban commuter according to the route information and the implicit speed characteristics of the urban commuter. The method provided by the invention can accurately identify commuters and their commuting modes.

Description

Method and system for identifying commuter and mode based on mobile phone signaling data
Technical Field
The invention relates to the field of traffic planning, and in particular to a method and a system for identifying commuters and their commuting modes based on mobile phone signaling data.
Background
Accurately determining the travel modes of urban residents is of great significance for advanced traffic system control strategies such as traffic signal control, regional traffic planning, and traffic demand analysis, and in turn helps relieve traffic congestion. Identifying the travel mode not only helps users choose a suitable mode for a particular journey, but also lets them better understand their own lifestyle. For traffic management departments, travel mode identification helps in mining residents' travel behavior and social patterns, which in turn helps city managers ease congestion and allocate public transport resources more effectively.
Research and practice for traditional commute crowd identification and commute mode discrimination is often based on the following three data: (1) traditional questionnaire data; (2) GPS track data, such as track data recorded by GPS receivers on partially authorized vehicles (e.g., taxis, buses, and shared bicycles); (3) Service data including location, such as travel orders or recorded data recorded by a cell phone application or traffic card. However, the above data suffer from several drawbacks:
1. the collection threshold is high: either laborious manual questionnaires or additional software and hardware support are required. It is therefore difficult to build a large-scale dataset covering the millions of people in an urban area;
2. the data are biased: they cannot cover the whole population, nor the all-day travel records of the people they do cover. For example, tracks recorded by taxi GPS receivers cover only taxis, whose trips are irregular and range widely, and travel records from traffic cards capture a person's behavior only while that person is riding public transport.
In contrast, a mobile handset interacts passively with base stations, and its signaling data can be collected wherever there is base station coverage, without relying on specific hardware or specific applications. The data are therefore relatively easy to obtain, and their volume is one or even several orders of magnitude larger than that of the data sources above. They also cover people's travel behavior at almost all times. Using mobile phone signaling as the research object thus allows the commuting behavior of urban populations to be studied more comprehensively.
However, signaling data contain no information on a person's travel mode, nor on speed, acceleration, direction, and the like. In addition, the positioning accuracy of signaling data (about 200 meters) is lower than that of GPS (about 10 meters), and the data frequency is also lower (the interval between records can reach several hours). Signaling data therefore cannot be applied directly to commuter identification and commute mode discrimination.
Disclosure of Invention
In order to solve the technical problems, the invention provides a method and a system for identifying commuter and modes based on mobile phone signaling data.
The technical scheme of the invention is as follows: a method for identifying commuter and mode based on mobile phone signaling data comprises the following steps:
step S1: acquiring signaling interaction data of a mobile phone and a base station;
step S2: acquiring an overall process activity chain of a user according to the signaling interaction data, performing travel semantic segmentation, and extracting commuting track sections of the user on duty and off duty, so that each commuting track section comprises and only comprises one travel mode;
step S3: calculating a multi-day track similarity index of the commute track section so as to obtain a user activity regularity index, and screening urban commuters through a preset threshold value;
step S4: identifying the commuting mode of the urban commuter according to the route information and the implicit speed characteristics of the urban commuter.
Compared with the prior art, the invention has the following advantages:
1. The invention discloses a method for identifying commuters and their commuting modes based on mobile phone signaling data. To address the bias and limited volume of existing research data, it takes mobile phone signaling as the data source and obtains whole-process travel data for the whole population. To address the inability of existing track similarity calculation methods to represent space-time characteristics effectively, it provides a daily commute track similarity index based on dynamic time warping, which measures the regularity of a person's activity and thereby identifies urban commuters. To address the lack of labels in signaling data, it provides a track section travel mode identification method based on route information and the implicit speed characteristics of commuters, which identifies the travel modes of commuters.
2. The track space-time coding method based on the relative position coding can strengthen the personalized characteristics of the user stay points and enhance the difference of different user tracks. Compared to conventional feature-based methods, longest subsequence-based methods, or absolute position coding, the coding does not require manual extraction of trajectory features, can take into account both temporal and spatial features of the user trajectory, and is sensitive to different personnel features.
3. The track similarity index provided by the invention has a certain space-time tolerance, can comprehensively consider slight dislocation in time and space, and has better space-time robustness compared with the traditional index.
4. The urban commute mode identification method provided by the invention can fully utilize public transportation station information, and simultaneously utilizes time information and space information to carry out probability estimation on each track section aiming at a specific public transportation line so as to identify a possible trip mode.
Drawings
FIG. 1 is a flow chart of a method for identifying commuter and mode based on mobile phone signaling data in an embodiment of the invention;
FIG. 2 is a schematic diagram of a user travel track in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a user stay point in an embodiment of the present invention;
FIG. 4 is a diagram illustrating relative position encoding according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating travel pattern recognition according to an embodiment of the present invention;
FIG. 6 is a block diagram of a system for identifying commuter and mode based on mobile phone signaling data according to an embodiment of the present invention.
Detailed Description
The invention provides a method for identifying commuter and mode based on mobile phone signaling data, which can accurately identify commuters and their commuting modes.
The present invention will be further described in detail below with reference to the accompanying drawings by way of specific embodiments in order to make the objects, technical solutions and advantages of the present invention more apparent. The technical scheme of the invention is limited to lawful use, and all information is collected with the consent of the persons concerned.
Example 1
As shown in fig. 1, the method for identifying commuter and mode based on mobile phone signaling data provided by the embodiment of the invention comprises the following steps:
step S1: acquiring signaling interaction data of a mobile phone and a base station;
step S2: acquiring an overall process activity chain of a user according to signaling interaction data, performing travel semantic segmentation, and extracting commuting track sections of the user on duty and off duty, so that each commuting track section comprises and only comprises one travel mode;
step S3: calculating a multi-day track similarity index of the commute track section, so as to obtain a user activity regularity index, and screening urban commuters through a preset threshold;
step S4: identifying the commuting mode of the urban commuter according to the route information and the implicit speed characteristics of the urban commuter.
In one embodiment, step S1 described above: the method for acquiring the signaling interaction data of the mobile phone and the base station specifically comprises the following steps:
and extracting signaling interaction data of the mobile phone and the base station of the user in a certain time, and screening effective data in the signaling interaction data to serve as a data source.
In one embodiment, step S2 above: the method comprises the steps of acquiring an overall process activity chain of a user according to signaling interaction data, performing travel semantic segmentation, and extracting commuting track sections of the user on duty and off duty, wherein each commuting track section comprises and only comprises one travel mode, and specifically comprises the following steps:
step S21: according to the signaling interaction data, obtaining each signaling record between the user and a base station, $r = (u, t_s, t_e, l)$, which indicates that user $u$ interacted with the base station at position $l$ from time $t_s$ to time $t_e$; constructing the full-day trip chain of each user from the set of signaling records within one day, expressed as $C_u^d = \{r_1, r_2, \dots, r_n\}$, where $d$ denotes the date;
it is generally considered that the interaction distance between a mobile device and a base station is less than 500 meters; therefore, one signaling record roughly describes the current position of the user, and the full-day trip chain $C_u^d$ roughly describes the user's travel track for one day.
Step S22: dividing a research area into grids with the length and the width of 1km according to longitude and latitude; accumulating the residence time of each user in the grid in different time periods respectively, so as to determine the working place and residence place of the user;
the conversion formulas between longitude/latitude and grid serial numbers are as follows:
$$x = \left\lfloor \frac{f_{\mathrm{lon}}\big(\mathrm{lon}(l) - \mathrm{lon}_{\min}\big)}{s} \right\rfloor, \qquad y = \left\lfloor \frac{f_{\mathrm{lat}}\big(\mathrm{lat}(l) - \mathrm{lat}_{\min}\big)}{s} \right\rfloor$$
wherein $x$ and $y$ are respectively the abscissa serial number and the ordinate serial number of the grid;
the function $f_{\mathrm{lat}}(\cdot)$ converts a latitude difference into a geodesic distance; $\mathrm{lat}(l)$ denotes the latitude of position $l$; $\mathrm{lat}_{\min}$ denotes the minimum latitude in the study area; $s$ denotes the grid size, taken as 1 km in this embodiment;
the function $f_{\mathrm{lon}}(\cdot)$ converts a longitude difference into a geodesic distance; $\mathrm{lon}(l)$ denotes the longitude of position $l$; $\mathrm{lon}_{\min}$ denotes the minimum longitude in the study area.
The minimum and maximum of all grid ordinate serial numbers in the study area are thereby obtained and denoted $y_{\min}$ and $y_{\max}$; likewise, the minimum and maximum of the grid abscissa serial numbers are denoted $x_{\min}$ and $x_{\max}$.
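The stay time of each user in each grid is accumulated separately for two time periods: daytime (8:00-18:00) and night (0:00-5:00 and 22:00-24:00). For user $u$, the daytime stay time in grid $(x, y)$ is
$$T_u^{\mathrm{day}}(x, y) = \sum_{r \in C_u^d} (t_e - t_s)\,\mathbb{1}\big[\,g(l) = (x, y) \text{ and } [t_s, t_e] \text{ falls within the daytime period}\,\big]$$
where $\mathbb{1}[\cdot]$ is the indicator function and $g(l)$ is the grid containing position $l$. The stay time of user $u$ over all grids can thus be expressed as a two-dimensional matrix:
$$\mathbf{T}_u^{\mathrm{day}} = \big[\,T_u^{\mathrm{day}}(x, y)\,\big], \quad x_{\min} \le x \le x_{\max},\; y_{\min} \le y \le y_{\max}$$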
The possible workplace and residence of the user are selected according to the user's stay time in the daytime and at night. Specifically, the grid where the user stays longest during the daytime of each workday is taken as the user's workplace for that day. If the user has different daily workplaces across several workdays, voting is used and the one that occurs most often is selected as the final workplace. The user's residence is obtained in the same way from the night-time stay time.
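Step S22 can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names, the spherical Earth radius, and the record layout `(t_start_hour, t_end_hour, lat, lon)` are all assumptions introduced here, and the latitude/longitude conversion uses a simple equirectangular approximation in place of the unspecified geodesic functions.

```python
import math
from collections import Counter, defaultdict

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def to_grid(lat, lon, lat_min, lon_min, size_km=1.0):
    """Map a latitude/longitude pair to (x, y) grid serial numbers."""
    # North-south distance from the southern edge of the study area.
    dy = math.radians(lat - lat_min) * EARTH_RADIUS_KM
    # East-west distance, scaled by cos(latitude) because meridians converge.
    dx = math.radians(lon - lon_min) * EARTH_RADIUS_KM * math.cos(math.radians(lat))
    return int(dx // size_km), int(dy // size_km)


def longest_stay_grid(records, lat_min, lon_min, hours):
    """Grid with the largest accumulated stay time inside the given hour windows.

    records: iterable of (t_start_hour, t_end_hour, lat, lon) for one day.
    hours: list of (begin, end) windows, e.g. [(8, 18)] for daytime.
    """
    stay = defaultdict(float)
    for t_start, t_end, lat, lon in records:
        for begin, end in hours:
            overlap = min(t_end, end) - max(t_start, begin)
            if overlap > 0:
                stay[to_grid(lat, lon, lat_min, lon_min)] += overlap
    return max(stay, key=stay.get) if stay else None


def vote(daily_grids):
    """Pick the grid that occurs most often across workdays."""
    counts = Counter(g for g in daily_grids if g is not None)
    return counts.most_common(1)[0][0] if counts else None
```

Calling `longest_stay_grid` with `[(8, 18)]` for each workday and passing the results to `vote` gives a workplace candidate; using `[(0, 5), (22, 24)]` gives a residence candidate.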
Fig. 2 is a partial travel track of user 3148801 on the morning of March 1, 2023, and Fig. 3 shows the (local) distribution of that user's stay time over the mornings of multiple workdays in March 2023, where the abscissa and ordinate correspond to the east-west and north-south directions respectively; the darker the color, the longer the user's stay time in the grid, and the light-grey background indicates a stay time near or equal to zero. There are three pronounced stay points, A, E and J, in Fig. 3. From Figs. 2 and 3, part of the user's travel semantics can be interpreted as follows:
1) The user stays near place A for a long time (corresponding to point A of Fig. 3), and this is the user's departure point, possibly corresponding to the night-time residence;
2) The user eventually arrives near place J (corresponding to point J of Fig. 3), which may correspond to the workplace;
3) While travelling from place A to place J, the user stays briefly near place E (corresponding to point E of Fig. 3). Since there is an important subway transfer station near place E, this suggests that the user transfers there, possibly changing vehicles.
Step S23: using the workplace and the residence as dividing points, dividing the full-day trip chain into an on-duty commute track and an off-duty commute track, wherein the on-duty commute track runs from the residence to the workplace and the off-duty commute track runs from the workplace to the residence;
the trajectory splitting algorithm pseudocode is as follows:
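The original pseudocode did not survive in this text. The sketch below is one plausible reading of step S23 and not the patent's own algorithm: the function name and the record layout `(t_start, t_end, grid)` are hypothetical. The on-duty track is taken to run from the last home record before the first arrival at work, and the off-duty track from the last work record to the first return home.

```python
def split_commutes(chain, home, work):
    """Split one day's trip chain into on-duty and off-duty commute tracks.

    chain: time-ordered list of (t_start, t_end, grid) records.
    Returns (on_duty, off_duty); either may be empty if no such trip exists.
    """
    grids = [g for _, _, g in chain]
    on_duty, off_duty = [], []
    if work in grids:
        first_work = grids.index(work)
        # Last record at home before the first arrival at the workplace.
        starts = [i for i in range(first_work) if grids[i] == home]
        if starts:
            on_duty = chain[starts[-1]:first_work + 1]
        # Last record at the workplace, then the first return home after it.
        last_work = len(grids) - 1 - grids[::-1].index(work)
        ends = [i for i in range(last_work + 1, len(grids)) if grids[i] == home]
        if ends:
            off_duty = chain[last_work:ends[0] + 1]
    return on_duty, off_duty
```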
step S24: calculating the residence time index and the local density index of the grids at different times in a sliding window mode, and weighting to obtain the importance degree of the grids; taking a plurality of grids with highest importance as characteristic stay points, dividing the commuting track on duty and the commuting track off duty to obtain a plurality of commuting track sections; each commute track section obtained by the method is assumed to contain and only contains one trip mode, and specifically comprises the following steps:
step S241: the residence time of the user in each grid is accumulated in the commuting tracks of the user on duty and off duty respectively. Judging whether the point is a stay point in the way of commuting according to two indexes of the stay time of the grid and the local density of the grid;
wherein, the grid residence time is obtained according to the calculation mode of the step S22;
the calculation method of the local density of the grid is as follows:
wherein $N(g)$ denotes the 8 grids immediately adjacent to grid $g$. The function is intended to calculate the information gain of each grid relative to its neighbor grids: if the stay time of grid $g$ differs significantly from that of its neighbors, the information entropy computed over all 9 grids $\{g\} \cup N(g)$ will be greater than the entropy computed over the 8 grids $N(g)$ alone. $\mathrm{avg}(\cdot)$ is an averaging function that averages the values over $N(g)$ so as to cancel the entropy difference that would otherwise arise because the two calculations cover different numbers of grids; $\epsilon$ is a small constant, taken as 0.0001 in this embodiment.
Step S242: combining the grid stay time index and the grid local density index to judge the importance of each grid during the user's commute, formalized as the following formula:
wherein the adjustable hyperparameter of the formula takes the value 1 in this embodiment of the invention;
step S243: selecting the $K$ grids with the highest importance index over multiple workdays as the feature stay points along the commute, which together with the residence and the workplace form $K+2$ feature stay points; these feature stay points are taken as candidate points at which the user switches travel mode;
step S244: based on these candidate points, the user's on-duty and off-duty commute tracks are further divided into commute track sections such that each commute track section contains one and only one travel mode.
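Steps S243 and S244 amount to a top-K selection followed by a cut at each selected grid. The following is a minimal sketch under stated assumptions: the importance scores are supplied as a plain dictionary (how they are computed is left to step S242), consecutive duplicate grids in the track have already been merged, and both function names are hypothetical.

```python
def top_k_stay_points(importance, k):
    """Return the k grids with the highest importance, most important first."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


def split_at_stay_points(track, stay_points):
    """Cut a commute track wherever it passes through a feature stay point.

    track: time-ordered list of grids; stay_points: set of candidate grids.
    """
    segments, current = [], []
    for i, grid in enumerate(track):
        current.append(grid)
        interior = 0 < i < len(track) - 1
        if interior and grid in stay_points and len(current) > 1:
            segments.append(current)
            current = [grid]  # the stay point also starts the next section
    if len(current) > 1 or not segments:
        segments.append(current)
    return segments
```

Each stay point closes one section and opens the next, so adjacent sections share their boundary grid.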
In one embodiment, the step S3: calculating a multi-day track similarity index of the commute track section so as to obtain a user activity regularity index, and screening urban commuters through a preset threshold value, wherein the method specifically comprises the following steps:
step S31: taking the user's $K+2$ feature stay points as anchor points, calculating the Euclidean distance between the user's current position and each anchor point to form the relative position code of the current position; the position vectors of the user at all moments are thereby represented as a relative position matrix $E \in \mathbb{R}^{T \times (K+2)}$, where $T$ denotes the duration of the track section;
let the user's position at the current moment $t$ be $p_t$ and the feature stay points be $s_1, \dots, s_{K+2}$; the position code at the current moment is expressed as $e_t = \big(\lVert p_t - s_1 \rVert_2, \dots, \lVert p_t - s_{K+2} \rVert_2\big)$, where $\lVert \cdot \rVert_2$ denotes the 2-norm. The position codes of the user at all moments then form the two-dimensional matrix $E = [e_1; e_2; \dots; e_T]$.
Compared with feature-based methods, this method does not need manual extraction of track features; compared with methods based on the longest subsequence, the coding can take the temporal characteristics of the user's track into account. Compared with absolute position coding, relative position coding limits the studied area to the user's range of motion, so that when the user moves, the change in the position code is more pronounced relative to its absolute value. In addition, when the user reaches one of the feature stay points, i.e. $p_t = s_k$, the corresponding component of the position code takes the distinctive value 0. Therefore, compared with methods based on absolute position coding, the track space-time coding method based on relative position coding strengthens the personalized characteristics of the user's stay points and enhances the difference between the tracks of different users.
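The relative position code of step S31 is simply a vector of distances to the anchors. A minimal sketch, assuming positions and anchors are already given in a planar coordinate system (for example kilometres east and north of a reference point); the function names are illustrative only.

```python
import math


def relative_position_code(position, anchors):
    """Euclidean distance from one position to every feature stay point."""
    return [math.dist(position, anchor) for anchor in anchors]


def encode_track(positions, anchors):
    """Stack the per-moment codes into a matrix with one row per moment."""
    return [relative_position_code(p, anchors) for p in positions]
```

A zero entry in a row marks the moment at which the user sits exactly on the corresponding feature stay point.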
Fig. 4 shows the relative position code of user 3148801 for the morning trip track of March 1, 2023. The abscissa is time, and each curve shows how the distance from the user's current position to one feature stay point changes over time. Curves 1 and 3 correspond to two feature stay points near place A, curve 2 to a feature stay point near place J, and curves 4 and 5 to two feature stay points near place E.
Step S32: track similarity index is calculated according to the following formula:
wherein $E_i$ and $E_j$ denote the relative position coding matrices of the same user on two different workdays; $\mathrm{norm}(\cdot)$ is the maximum normalization function of a vector; $\mathrm{DTW}(\cdot, \cdot)$ denotes the DTW distance between sequences of high-dimensional vectors; $T$ is the length of the matrix;
the invention provides a daily commute track similarity index based on dynamic time warping (DTW), which combines DTW with normalization to measure the similarity of activity tracks on two days. Compared with indices based on cosine similarity, this track similarity index has a degree of space-time tolerance and can absorb slight misalignments in both time and space. For example, when temporal features of a commuter's track, such as departure or arrival time, or spatial features, such as the route, are not strictly identical across several workdays, the index selects the best global matching between time stamps through a dynamic programming algorithm, so that the feature points of the travel track are correctly aligned and the similarity measurement of travel features becomes more effective.
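The exact similarity formula is not recoverable from this text, so the sketch below only illustrates the ingredients the description names: max normalization, a DTW distance between sequences of vectors, and division by the sequence length. The final combination `1 - DTW / length` is an assumption made here for illustration, and it is not guaranteed to stay within [0, 1].

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping between vector sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def max_normalize(matrix):
    """Divide every entry by the largest entry so values fall in [0, 1]."""
    peak = max(max(row) for row in matrix) or 1.0
    return [[v / peak for v in row] for row in matrix]


def track_similarity(day_a, day_b):
    """Hypothetical index: 1 minus the length-averaged DTW distance."""
    d = dtw_distance(max_normalize(day_a), max_normalize(day_b))
    return 1.0 - d / max(len(day_a), len(day_b))
```

Two days whose codes differ only by a small time shift align at zero DTW cost, which is the space-time tolerance the description refers to.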
Step S33: calculating the track similarity index for every pair of the user's workdays and averaging the results to obtain the user's activity regularity index; if the index is greater than a preset threshold, the user is considered an urban commuter.
In this embodiment, the track similarity index is calculated for every pair of workdays of each user and then averaged to obtain each user's activity regularity index. The preset threshold is 0.9: a user whose index exceeds it is considered an urban commuter.
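The averaging and thresholding of step S33 can be sketched as follows. The similarity function is passed in as a parameter so that the sketch does not depend on any particular formula; the function names, and the choice to return 0.0 when fewer than two workdays are available, are assumptions.

```python
from itertools import combinations
from statistics import mean


def regularity_index(daily_codes, similarity):
    """Mean pairwise similarity over all pairs of workdays."""
    pairs = list(combinations(daily_codes, 2))
    return mean(similarity(a, b) for a, b in pairs) if pairs else 0.0


def is_commuter(daily_codes, similarity, threshold=0.9):
    """Apply the preset threshold (0.9 in the embodiment)."""
    return regularity_index(daily_codes, similarity) > threshold
```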
In one embodiment, step S4 above, identifying the commuting mode of the urban commuter according to the route information and the implicit speed characteristics of the urban commuter, specifically comprises the following steps:
step S41: acquiring from public datasets the longitude/latitude positions and corresponding lines of the subway stations and bus stops in the study area; for each mobile base station, searching for subway stations and bus stops within 500 meters and assigning each station a weight that decays with the square of its distance from the base station; then calculating a correlation index between each mobile base station and each line according to the line information of the corresponding stations, which specifically includes:
step S411: station search and matching for track points. The longitude/latitude positions and corresponding lines of the subway stations in the study area are collected from public datasets, as are those of the bus stops. Each subway station and each bus stop is expressed as a triple $(\mathrm{lon}, \mathrm{lat}, L)$, where $\mathrm{lon}$ and $\mathrm{lat}$ are the longitude and latitude, and $L$ is the set of subway lines passing through the subway station, with $L \subseteq \mathcal{L}^{\mathrm{sub}}$, or the set of bus lines passing through the bus stop, with $L \subseteq \mathcal{L}^{\mathrm{bus}}$. $\mathcal{L}^{\mathrm{sub}}$ and $\mathcal{L}^{\mathrm{bus}}$ are the complete sets of all subway lines and all bus lines.
Step S412: from all subway stations and bus stops, retrieving those whose distance from base station $b$ is less than 500 meters and marking them as candidate stations:
$$S_b = \{\, s : d(b, s) < 500\ \text{m} \,\}$$
wherein $d(b, s)$ denotes the geodesic distance between location point $b$ and location point $s$; each candidate station $s \in S_b$ is assigned a weight $w_{b,s}$ indicating how near it is to base station $b$. The weight is inversely proportional to the square of the distance and can be expressed as:
$$w_{b,s} = \frac{\alpha}{d(b, s)^2}$$
wherein $\alpha$ is a hyperparameter controlling the weight distribution, taken as 25000 in this embodiment of the invention.
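A sketch of the candidate-station search and inverse-square weighting of step S412. Several things here are assumptions: the haversine formula stands in for the unspecified geodesic distance, the weight is taken to be `alpha / d**2` with `d` in metres, and a 1 m floor is added to avoid division by zero. The function and parameter names are illustrative.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of radius 6,371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def candidate_weights(base, stations, radius_m=500.0, alpha=25000.0):
    """Weight each station within radius_m of the base station by alpha / d^2.

    base: (lat, lon); stations: {name: (lat, lon)}.
    """
    weights = {}
    for name, (lat, lon) in stations.items():
        d = haversine_m(base[0], base[1], lat, lon)
        if d < radius_m:
            # Assumed guard: floor the distance at 1 m to avoid dividing by zero.
            weights[name] = alpha / max(d, 1.0) ** 2
    return weights
```

Under these assumptions a station about 200 m from the base station receives a weight of roughly 0.62, and stations beyond 500 m are dropped.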
Fig. 5 shows an embodiment in which numbers (e.g., 3) denote base stations, lowercase Greek letters denote subway stations, lowercase English letters denote bus stops, uppercase Greek letters denote subway lines, and uppercase English letters denote bus routes. During travel, the user interacts with the base stations shown in the figure in the order 1-2-3-2-4-9-7-5-6-8. The lower-left table gives the interaction times and the weights between the base stations and public transport stations, and the lower-right table gives the lines passing through some of these stations. For example, the candidate stations of base station 2 are one subway station and one bus stop, with weights 0.3 and 0.9 respectively.
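Step S413: according to the line information of each subway station and bus stop, calculating the correlation coefficient between each line and mobile base station $b$. The candidate lines can be expressed as:
$$R_b = \bigcup_{s \in S_b} L_s$$
wherein $S_b$ is the set of subway stations and bus stops within 500 meters of base station $b$; $L_s$ denotes the subway lines or bus lines passing through station $s$;
the correlation coefficient $c_{b,r}$ between candidate line $r \in R_b$ and base station $b$ can be expressed as:
$$c_{b,r} = \sum_{s \in S_b} w_{b,s}\,\mathbb{1}[\,r \in L_s\,]$$
wherein $w_{b,s}$ is the weight assigned in step S412 to station $s$ with respect to base station $b$, and $\mathbb{1}[\cdot]$ is the indicator function, equal to 1 when $r \in L_s$ and 0 otherwise.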
For example, in Fig. 5, a subway line passes through the subway station whose weight with respect to base station 2 is 0.3, and a bus route passes through the bus stop whose weight with respect to base station 2 is 0.9; the correlation of base station 2 with that subway line is therefore 0.3, with that bus route 0.9, and with all other lines 0.
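The per-base-station correlation of step S413 can be sketched as a weighted tally over lines. Summing the weights of all candidate stations served by a line is an assumption consistent with the indicator-function description; the station and line names below are placeholders, not the labels used in Fig. 5.

```python
from collections import defaultdict


def line_correlation(station_weights, station_lines):
    """Sum station weights per line passing through the candidate stations.

    station_weights: {station: weight} for one base station.
    station_lines: {station: iterable of line names}.
    """
    corr = defaultdict(float)
    for station, weight in station_weights.items():
        for line in station_lines.get(station, ()):
            corr[line] += weight
    return dict(corr)
```

With weights `{'subway_station': 0.3, 'bus_stop': 0.9}` and one line through each station, the result reproduces the 0.3 and 0.9 correlations of the example; any line absent from the result has correlation 0.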
Step S42: for one commute track section of the user, the correlation index between each waypoint and each line is obtained through step S41; these indices are weighted again by the user's stay time at each waypoint to obtain the correlation index of the commute track section with each line; finally, the line with the highest correlation index is selected by voting and, combined with the implicit speed characteristics, is taken to determine the line and travel mode used by the user in the commute track section, which specifically comprises:
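for a track section $\tau$ of the user, let the waypoints (base stations) in order be $b_1, b_2, \dots, b_n$ and the corresponding stay times be $t_1, t_2, \dots, t_n$. For each waypoint $b_i$, the candidate line set $R_{b_i}$ of the corresponding base station and its correlation coefficients $c_{b_i,r}$ are obtained according to step S413, and the candidate lines $R_\tau$ of the track section and their correlation coefficients $c_{\tau,r}$ are calculated as follows:
$$R_\tau = \bigcup_{i=1}^{n} R_{b_i}, \qquad c_{\tau,r} = \sum_{i=1}^{n} t_i\, c_{b_i,r}$$
In summary, for a track section $\tau$, the candidate lines $R_\tau$ and the correlation coefficient $c_{\tau,r}$ of each candidate line $r \in R_\tau$ can be obtained.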
Taking Fig. 5 as an example, the user interacts with base station 2 for 20 + 2 = 22 seconds, so this base station contributes 0.3 × 22 = 6.6 to the correlation between the track section and the subway line, and 0.9 × 22 = 19.8 to the correlation between the track section and the bus route. Proceeding in the same way for the other base stations and summing, the line with the highest correlation in this track section is a subway line (total correlation coefficient 400.8), followed by a bus route (total correlation coefficient 360.8). It is thus inferred that the travel mode used by the user in this track section is the subway.
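The stay-time weighting and vote of step S42 can be sketched as below. The input layout, a list of `(stay_seconds, {line: correlation})` pairs with one entry per visited base station, and the function names are assumptions for illustration.

```python
from collections import defaultdict


def segment_correlation(waypoints):
    """Accumulate stay-time-weighted line correlations over a track section.

    waypoints: list of (stay_seconds, {line: correlation}) per base station.
    """
    total = defaultdict(float)
    for stay, corr in waypoints:
        for line, value in corr.items():
            total[line] += stay * value
    return dict(total)


def best_line(waypoints):
    """Line with the highest accumulated correlation, or None if there is none."""
    total = segment_correlation(waypoints)
    return max(total, key=total.get) if total else None
```

A single waypoint with 22 seconds of stay and correlations 0.3 and 0.9 contributes 6.6 and 19.8, matching the arithmetic in the example above.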
In addition, the user's travel speed characteristics are inferred from the stay time in each grid to assist in identifying the travel mode. Over a run of consecutive 1 km grids along the track, if the user passes through each grid within 2 minutes and the stay times of adjacent grids differ by no more than 50%, the user is considered to be travelling at high speed, and travel by subway is more likely; this probability increases linearly as the transit time decreases. Conversely, if the stay time per grid is 3 minutes or more and unevenly distributed, the travel mode is considered road-related, such as bus or private car.
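The speed heuristic can be sketched as a small classifier over per-grid transit times. The text does not say what the 50% difference is measured against, so this sketch assumes it is relative to the larger of the two adjacent values; the labels and the "unknown" fallback are also assumptions.

```python
def speed_hint(grid_minutes):
    """Classify a run of consecutive 1 km grids by per-grid transit time.

    grid_minutes: minutes spent in each consecutive grid along the track.
    """
    if len(grid_minutes) < 2:
        return "unknown"
    pairs = zip(grid_minutes, grid_minutes[1:])
    # Assumption: "within 50%" is taken relative to the larger neighbour.
    even = all(abs(a - b) <= 0.5 * max(a, b) for a, b in pairs)
    if max(grid_minutes) <= 2.0 and even:
        return "rail-like"   # fast and steady, subway more likely
    if min(grid_minutes) >= 3.0 and not even:
        return "road-like"   # slow and uneven, bus or car more likely
    return "unknown"
```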
Finally, the candidate line with the highest correlation coefficient is selected by voting to determine the public transport line used by the user in the track section, which yields the user's candidate commuting mode. The commuting mode is then inferred in combination with the implicit speed characteristics.
If the correlation coefficients between the user's track and all public transport lines are low, the person is considered to travel by car (including ride-hailing, taxi, and private car).
Example two
As shown in fig. 6, the embodiment of the invention provides a system for identifying commuter and mode based on mobile phone signaling data, which comprises the following modules:
the signaling data acquisition module 51 is configured to acquire signaling interaction data of the mobile phone and the base station;
the commute track section acquisition module 52 is configured to acquire the whole-process activity chain of a user according to the signaling interaction data, extract the user's on-duty and off-duty commute track sections, and perform travel semantic segmentation, so that each commute track section contains one and only one travel mode;
the commuter identification module 53 is configured to calculate a multi-day track similarity index of the commute track sections, thereby obtaining a user activity regularity index, and to screen out urban commuters using a preset threshold;
the commute mode identification module 54 is configured to identify the commute mode of the urban commuters based on the route information and the implicit speed characteristics of the urban commuters.
The above examples are provided for the purpose of describing the present invention only and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalents and modifications that do not depart from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (2)

1. A method for identifying commuter and mode based on mobile phone signaling data is characterized by comprising the following steps:
step S1: acquiring signaling interaction data of a mobile phone and a base station;
step S2: acquiring the whole-process activity chain of the user according to the signaling interaction data, extracting the user's on-duty and off-duty commute track sections, and performing travel semantic segmentation so that each commute track section contains one and only one travel mode, specifically comprising the following steps:
step S21: according to the signaling interaction data, obtaining each signaling record between a user and a base station, the record indicating that the user interacted with the base station at a given position from a start time to an end time; constructing a full-day trip chain of each user for each date from the set of signaling records within that day;
step S22: dividing the research area by longitude and latitude into grids 1 km in length and width; accumulating, for each user, the stay time in each grid over different time periods, thereby determining the workplace and residence of the user;
step S23: taking the workplace and the residence as dividing points, dividing an on-duty commute track and an off-duty commute track from the full-day trip chain, wherein the on-duty commute track runs from the residence to the workplace and the off-duty commute track runs from the workplace to the residence;
step S24: calculating, in a sliding-window manner, the stay time index and the local density index of each grid over different time periods, and weighting them to obtain the importance of the grid; taking the several grids with the highest importance as characteristic stay points, and dividing the on-duty commute track and the off-duty commute track at these points to obtain a plurality of commute track sections; each commute track section thus obtained is assumed to contain one and only one travel mode;
the local density of a grid is calculated from the following quantities:
the grid itself and the 8 other grids directly adjacent to it; a function that calculates the information gain of each grid relative to its neighboring grids; an averaging function; and a very small constant;
step S3: calculating a multi-day track similarity index of the commute track section so as to obtain a user activity regularity index, and screening urban commuters through a preset threshold, wherein the method specifically comprises the following steps of:
step S31: selecting several characteristic stay points of the user as anchor points, calculating the Euclidean distance between the current position of the user and each anchor point, and forming a relative position code of the current position; the position vectors of the user at all moments are thereby represented as a relative position matrix whose length is the duration of the track section;
step S32: calculating the track similarity index from the following quantities:
the relative position coding matrices of two different workdays of the same user; a maximum-normalization function applied to the vectors; the DTW (dynamic time warping) distance between high-dimensional vectors; and the length of the matrix;
step S33: calculating the track similarity index for every pair of the user's workdays, then averaging to obtain the user's activity regularity index; if the activity regularity index is greater than a preset threshold, the user is considered an urban commuter;
step S4: identifying the commuting mode of the urban commuter according to the route information and the recessive speed characteristics of the urban commuter, wherein the method specifically comprises the following steps:
step S41: acquiring, from a public data set, the longitude and latitude positions of the subway stations and bus stations in the research area and the lines they serve; for each mobile base station, finding the subway stations and bus stations within a specified distance (in meters) of it, and assigning each station a weight that attenuates with the square of the Euclidean distance between the mobile base station and the station; then calculating, from the line information of the corresponding stations, the correlation index between each mobile base station and each corresponding line;
step S42: for a commute track section of the user, obtaining the correlation index between each waypoint and each line through step S41; weighting again by the user's stay time at each waypoint to obtain the correlation index between the commute track section and each line; finally, selecting the line with the highest correlation index by voting and, in combination with the implicit speed characteristics, taking the result as the travel route and travel mode adopted by the user in the commute track section.
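As an illustration of steps S31 to S33, the following Python sketch encodes a track by its distances to anchor points and compares two days with dynamic time warping. It is not the claimed formula, which is not reproduced here: the way the DTW distance is turned into a similarity (`1 / (1 + d / length)`), the shared maximum normalization, and all function names are assumptions made for the example, and positions are taken to be planar coordinates.

```python
import math
from itertools import combinations


def relative_position_code(track, anchors):
    """Encode each (x, y) point as its Euclidean distances to the anchors."""
    return [[math.dist(point, anchor) for anchor in anchors] for point in track]


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two vector sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(seq_a[i - 1], seq_b[j - 1])
            table[i][j] = cost + min(
                table[i - 1][j], table[i][j - 1], table[i - 1][j - 1]
            )
    return table[n][m]


def trajectory_similarity(code_a, code_b):
    """Assumed similarity: max-normalize, then map the DTW distance to (0, 1]."""
    scale = max(v for row in code_a + code_b for v in row) or 1.0
    norm_a = [[v / scale for v in row] for row in code_a]
    norm_b = [[v / scale for v in row] for row in code_b]
    d = dtw_distance(norm_a, norm_b)
    return 1.0 / (1.0 + d / max(len(code_a), len(code_b)))


def activity_regularity(daily_codes):
    """Average pairwise similarity over workdays (needs at least two days)."""
    pairs = list(combinations(daily_codes, 2))
    return sum(trajectory_similarity(a, b) for a, b in pairs) / len(pairs)
```

Under this sketch, a user whose daily codes are identical obtains a regularity of 1.0, and the value falls as the daily tracks diverge; comparing it with a chosen threshold then separates commuters from other users.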
2. A system for identifying commuters and commute modes based on mobile phone signaling data, characterized by comprising the following modules:
the signaling data acquisition module is used for acquiring signaling interaction data of the mobile phone and the base station;
the commute track section acquisition module is used for acquiring the whole-process activity chain of the user according to the signaling interaction data, extracting the user's on-duty and off-duty commute track sections, and performing travel semantic segmentation so that each commute track section contains one and only one travel mode, specifically comprising the following steps:
step S21: according to the signaling interaction data, obtaining each signaling record between a user and a base station, the record indicating that the user interacted with the base station at a given position from a start time to an end time; constructing a full-day trip chain of each user for each date from the set of signaling records within that day;
step S22: dividing the research area by longitude and latitude into grids 1 km in length and width; accumulating, for each user, the stay time in each grid over different time periods, thereby determining the workplace and residence of the user;
step S23: taking the workplace and the residence as dividing points, dividing an on-duty commute track and an off-duty commute track from the full-day trip chain, wherein the on-duty commute track runs from the residence to the workplace and the off-duty commute track runs from the workplace to the residence;
step S24: calculating, in a sliding-window manner, the stay time index and the local density index of each grid over different time periods, and weighting them to obtain the importance of the grid; taking the several grids with the highest importance as characteristic stay points, and dividing the on-duty commute track and the off-duty commute track at these points to obtain a plurality of commute track sections; each commute track section thus obtained is assumed to contain one and only one travel mode;
the local density of a grid is calculated from the following quantities:
the grid itself and the 8 other grids directly adjacent to it; a function that calculates the information gain of each grid relative to its neighboring grids; an averaging function; and a very small constant;
the commuter identification module is used for calculating the multi-day track similarity index of the commute track sections so as to obtain a user activity regularity index, and for screening out urban commuters using a preset threshold, specifically comprising the following steps:
step S31: selecting several characteristic stay points of the user as anchor points, calculating the Euclidean distance between the current position of the user and each anchor point, and forming a relative position code of the current position; the position vectors of the user at all moments are thereby represented as a relative position matrix whose length is the duration of the track section;
step S32: calculating the track similarity index from the following quantities:
the relative position coding matrices of two different workdays of the same user; a maximum-normalization function applied to the vectors; the DTW (dynamic time warping) distance between high-dimensional vectors; and the length of the matrix;
step S33: calculating the track similarity index for every pair of the user's workdays, then averaging to obtain the user's activity regularity index; if the activity regularity index is greater than a preset threshold, the user is considered an urban commuter;
the commute mode identification module is used for identifying the commute mode of the urban commuters according to the route information and the implicit speed characteristics of the urban commuters, specifically comprising the following steps:
step S41: acquiring, from a public data set, the longitude and latitude positions of the subway stations and bus stations in the research area and the lines they serve; for each mobile base station, finding the subway stations and bus stations within a specified distance (in meters) of it, and assigning each station a weight that attenuates with the square of the Euclidean distance between the mobile base station and the station; then calculating, from the line information of the corresponding stations, the correlation index between each mobile base station and each corresponding line;
step S42: for a commute track section of the user, obtaining the correlation index between each waypoint and each line through step S41; weighting again by the user's stay time at each waypoint to obtain the correlation index between the commute track section and each line; finally, selecting the line with the highest correlation index by voting and, in combination with the implicit speed characteristics, taking the result as the travel route and travel mode adopted by the user in the commute track section.
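Step S41 can be illustrated with the Python sketch below, which derives a per-line correlation for one base station from nearby stations. It rests on several assumptions not fixed by the text: the weight is taken as `1 - (d / radius)^2`, the 500 m default radius is a placeholder for the search distance, a line's correlation is the largest weight among its nearby stations, and coordinates are planar meters. The function and field names are hypothetical.

```python
import math


def station_weight(distance_m, radius_m):
    """Assumed square attenuation: 1 at the base station, 0 at the radius."""
    return max(0.0, 1.0 - (distance_m / radius_m) ** 2)


def base_station_line_correlation(base_station_xy, stations, radius_m=500.0):
    """Return {line_name: correlation} for one base station.

    stations: list of {"xy": (x, y), "lines": [line_name, ...]} dictionaries.
    Assumption: a line's correlation is the best weight among its stations.
    """
    correlation = {}
    for station in stations:
        distance = math.dist(base_station_xy, station["xy"])
        if distance > radius_m:
            continue  # station lies outside the search radius
        weight = station_weight(distance, radius_m)
        for line in station["lines"]:
            correlation[line] = max(correlation.get(line, 0.0), weight)
    return correlation


# Made-up stations: one 250 m away serving two lines, one 1000 m away.
stations = [
    {"xy": (0.0, 250.0), "lines": ["bus_7", "metro_1"]},
    {"xy": (1000.0, 0.0), "lines": ["bus_9"]},
]
correlation = base_station_line_correlation((0.0, 0.0), stations)
```

The resulting dictionary has the same shape as the per-waypoint coefficients that step S42 multiplies by stay time, so the two steps can be chained directly.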
CN202311635072.9A 2023-12-01 2023-12-01 Method and system for identifying commuter and mode based on mobile phone signaling data Active CN117332376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311635072.9A CN117332376B (en) 2023-12-01 2023-12-01 Method and system for identifying commuter and mode based on mobile phone signaling data

Publications (2)

Publication Number Publication Date
CN117332376A CN117332376A (en) 2024-01-02
CN117332376B true CN117332376B (en) 2024-02-27

Family

ID=89293916

Country Status (1)

Country Link
CN (1) CN117332376B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant