CN112836121A - Travel purpose identification method and system - Google Patents

Publication number
CN112836121A
Authority
CN
China
Prior art keywords
travel
user
interest
trip
track record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110118774.4A
Other languages
Chinese (zh)
Other versions
CN112836121B (en
Inventor
杜立群
刘斌
郑猛
张宇
吴丹婷
吕宜生
李志帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Municipal Institute Of City Planning & Design
Institute of Automation of Chinese Academy of Science
Original Assignee
Beijing Municipal Institute Of City Planning & Design
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Municipal Institute Of City Planning & Design and Institute of Automation of Chinese Academy of Science
Priority to CN202110118774.4A
Publication of CN112836121A
Application granted
Publication of CN112836121B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Navigation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of travel behavior analysis, relates in particular to a travel purpose identification method and system, and aims to solve the low efficiency and accuracy with which the prior art identifies a user's travel purpose. The method comprises the following steps: acquiring residence durations for the travel tracks in mobile phone signaling data and filtering out abnormal trips; segmenting users by age group; identifying the residence and office positions of users in the working stage; determining the two travel purposes of going to work and returning home according to these positions; taking the remaining travel tracks as interest travel tracks and labeling their point-of-interest types using an online map; setting travel attributes for the travel tracks and establishing a probability graph model based on the hidden Dirichlet distribution to represent the generation process of interest trips; defining the number of travel topics and solving the model by Gibbs sampling; and determining the travel purpose from the topics obtained by solving the model, using a visual interpretation method. The invention thereby identifies travel purposes efficiently and accurately.

Description

Travel purpose identification method and system
Technical Field
The invention belongs to the field of travel behavior analysis, and particularly relates to a travel purpose identification method and system.
Background
Reasonable and comprehensive city planning and traffic management are basic guarantees for sustainable urban development and for realizing a city's potential, and whether the individual travel behavior of residents can be analyzed and understood accurately and effectively directly influences urban traffic development strategy. With rapid urbanization, travel behaviors and patterns are becoming increasingly complex. Residents' travel behavior is dynamic: it differs between individuals and, for the same person, changes over time, which makes population-level statistics harder at the macroscopic level.
In recent years, location-based services (LBS) have been widely used in many fields. Data sources collected automatically by such systems capture the time and spatial position of a user's trips accurately and in detail, and reflect to some extent the group migration characteristics of people or vehicles.
As mobile phones are an indispensable means of communication in urban life, mobile phone signaling data has become a research hotspot in urban planning and traffic management and is currently an important data source for analyzing travel. To acquire mobile phone signaling data, the operator modifies the relevant network parameters and logic flows; as long as the user's mobile phone signal is normal, the user's position is located by a weighted average of the several base stations used by the phone, and the base stations upload active positioning data for the phone user's position at regular intervals. In addition, the track points of an individual's trips are aggregated according to the base station distribution density within a certain range around the user's position: whether or not the user moves within this range, a stay of more than 30 minutes is considered a residence behavior and the user's position is taken as a residence point, whereas leaving the range is considered a travel behavior. The movement of the user between two adjacent residence points is defined as one trip, and the first active point and the last active point in a day are also taken as one trip, so the user's detailed trip chain, number of trips and travel times for a day can be obtained. Mobile phone signaling data therefore has strong spatio-temporal continuity and allows the whole course of a user's travel to be observed, which other data sources cannot match.
Mobile phone signaling data offers a large sample size, is objective and comprehensive, and has no obvious sampling bias; analyzing urban traffic operation with it compensates for the long cycle, heavy workload, small samples and high cost of traditional traffic surveys. However, such automatically collected data usually contains only travel track information and lacks necessary semantic information such as the travel mode and the type of travel activity. If the semantic information missing from mobile phone signaling data can be supplemented well, the data can replace traditional methods such as time-consuming and laborious manual questionnaire surveys and greatly improve statistical efficiency. Existing travel purpose identification methods based on mobile phone signaling data have several problems: (1) they do not consider the age group of the user. The going-to-work and returning-home trips of users in the working age group are easy to identify by rule, but if these trips are not separated out and are instead fed into the subsequent travel purpose identification model, they interfere with identification accuracy; (2) they do not consider matching online map points of interest with the user's travel purpose, which greatly reduces the efficiency and accuracy of identifying the user's travel purpose.
Disclosure of Invention
In order to solve the above problems in the prior art, that is, the prior art has low efficiency and accuracy in identifying a trip purpose of a user, the present invention provides a trip purpose identifying method, which includes the following steps:
step S10, obtaining travel track records from the user's mobile phone signaling data; each travel track record comprises the departure time, the arrival time and the longitude and latitude of the destination;
step S20, acquiring residence durations for the travel track records and filtering out abnormal trips, and dividing the travel track records by age group into a school stage, a working stage and a retirement stage;
step S30, based on the travel track records of users in the working stage, identifying the residence position and the office position of each such user according to heuristic rules;
step S40, based on the residence position and office position of users in the working stage, determining their two travel purposes of going to work and returning home, and taking the working-stage travel track records with neither of these two purposes, together with all travel track records of users in the school stage and the retirement stage, as interest travel track records;
step S50, using an online map, determining the point-of-interest type from the destination longitude and latitude of each travel track in the interest travel track records;
step S60, representing the interest travel track records using the arrival time, the age group of the user, the residence duration and the point-of-interest type of the destination as travel attributes, and establishing a probability graph model based on the hidden Dirichlet distribution to represent the generation process of interest trips;
step S70, defining the number of user interest travel topics and solving the probability graph model by Gibbs sampling to obtain the travel topic of each travel track in the interest travel track records;
and step S80, determining the travel purpose of each travel track in the interest travel track records from the travel characteristics under each topic, by a visual interpretation method.
In some preferred embodiments, the residence duration is the time interval between two consecutive trips of the user, and an abnormal trip is a travel track record whose residence duration exceeds 15 hours.
In some preferred embodiments, the school stage covers ages 13-24, the working stage ages 25-59, and the retirement stage ages 60 and above.
In some preferred embodiments, step S30 includes:
extracting, from the travel track records of a user in the working stage, the position at which the number of night-time stays within one month exceeds a set number, as the residence position of the user;
and extracting, from the travel track records of the user in the working stage, the position visited most often during working hours within one month, as the office position of the user.
In some preferred embodiments, step S40 includes:
in the travel track records of a user in the working stage, a trip whose starting point is the user's residence position and whose destination is the user's office position has the travel purpose of going to work;
in the travel track records of a user in the working stage, a trip whose destination is the user's residence position has the travel purpose of returning home;
among the trips whose travel purpose is returning home, a trip whose starting point is the user's office position has the travel purpose of returning home from work.
In some preferred embodiments, the point-of-interest type is the land-use type of the location, and the land-use types include catering services, educational institutions, companies and enterprises, and medical services.
In some preferred embodiments, establishing in step S60 a probability graph model based on the hidden Dirichlet distribution to represent the generation process of interest trips comprises:
step S61, representing the interest travel track records using the arrival time, the age group of the user, the residence duration and the point-of-interest type of the destination as travel attributes, and, following the hidden Dirichlet distribution, dividing the trip generation process into a 'user → topic' step and a 'topic → trip' step;
step S62, determining the travel topic distribution of each user and sampling each trip of the user according to the travel purpose, thereby obtaining the probability graph model based on the hidden Dirichlet distribution;
the travel topic distribution is the proportion of each type of travel purpose in the interest travel track records of each user.
In some preferred embodiments, step S70 includes:
step S71, calculating the perplexity of the probability graph model under different numbers of topics, and taking a number of topics for which the perplexity is lower than a set threshold as the number of user interest travel topics;
the perplexity is:
$$\mathrm{Perplexity} = \exp\left(-\frac{\log(\mathrm{Likelihoods})}{N}\right)$$
where Likelihoods is the likelihood function of all the data in the interest travel track records and N is the total number of travel tracks in the interest travel track records;
and step S72, iteratively assigning, by Gibbs sampling, the probabilities of the topics to which all users' trips belong, until the perplexity no longer decreases and the probability graph model has converged, thereby obtaining the travel topic of each travel track in the interest travel track records.
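As a non-limiting illustration, the iterative topic assignment of step S72 can be sketched as a collapsed Gibbs sampler. The sketch is deliberately simplified and rests on assumptions not stated in the text: it uses only one discrete attribute (the point-of-interest type, encoded as an integer), symmetric Dirichlet priors `alpha` and `beta` (`beta` is a hypothetical name), and a fixed number of sweeps instead of the perplexity-based stopping rule. The function and variable names are illustrative only.

```python
import random

def _update(n_dk, n_kw, n_k, d, k, w, delta):
    """Add or remove one trip of type w (user d) from topic k's counts."""
    n_dk[d][k] += delta
    n_kw[k][w] += delta
    n_k[k] += delta

def gibbs_lda(docs, n_topics, n_types, alpha=0.5, beta=0.1, n_iter=200, seed=0):
    """docs[d] is the list of POI-type ids of the trips of user d."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]             # user-topic counts
    n_kw = [[0] * n_types for _ in range(n_topics)]   # topic-type counts
    n_k = [0] * n_topics                              # trips per topic
    z = []
    for d, doc in enumerate(docs):                    # random initial topics
        z.append([rng.randrange(n_topics) for _ in doc])
        for k, w in zip(z[d], doc):
            _update(n_dk, n_kw, n_k, d, k, w, +1)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for j, w in enumerate(doc):
                # Remove the trip, then resample its topic from the
                # conditional distribution given all other assignments.
                _update(n_dk, n_kw, n_k, d, z[d][j], w, -1)
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + n_types * beta)
                    for t in range(n_topics)
                ]
                z[d][j] = rng.choices(range(n_topics), weights=weights)[0]
                _update(n_dk, n_kw, n_k, d, z[d][j], w, +1)
    return z, n_dk, n_kw
```

Extending the sketch to the full set of travel attributes would multiply each weight by the corresponding conditional probabilities of the arrival time, age group and residence duration under topic `t`.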
In some preferred embodiments, the likelihood function Likelihoods of all the data in the interest travel track record is:
Figure BDA0002921710110000051
where $\alpha$, $\tau$ and $\lambda$ are preset hyperparameters of the probability graph model; $\pi$, $\mu_z$,
Figure BDA0002921710110000052
$\eta_z$ and $\theta_z$ are the distributions of the travel attributes; $z$ denotes the travel topic; $t_{ij}$, $d_{ij}$, $s_{ij}$ and $c_{ij}$ denote respectively the arrival time of the j-th trip of user i, the age group of the user, the residence duration and the point-of-interest type of the destination; $P$ is a probability value; $\prod$ denotes the product operation; $M$ is the total number of users; $N_i$ is the total number of travel tracks of user i; and $K$ is the number of user travel topics.
In another aspect of the present invention, a travel purpose identification system is provided, which includes the following modules:
the data acquisition module is configured to acquire travel track records in the mobile phone signaling data of the user; the travel track record comprises departure time, arrival time and longitude and latitude of a destination;
the preprocessing module is configured to acquire residence durations for the travel track records and filter out abnormal trips, and to divide the travel track records by age group into a school stage, a working stage and a retirement stage;
the position identification module is configured to identify, based on the travel track records of users in the working stage, the residence position and the office position of each such user according to heuristic rules;
the first identification module is configured to determine the two travel purposes of going to work and returning home for users in the working stage, based on their residence position and office position;
the interest point type confirming module is configured to take the remaining travel track records of users in the working stage, together with all travel track records of users in the school stage and the retirement stage, as interest travel track records, and to determine the point-of-interest type from the destination longitude and latitude of each travel track in the interest travel track records using an online map;
the trip representation module is configured to represent the interest travel track records using the arrival time, the age group of the user, the residence duration and the point-of-interest type of the destination as travel attributes, and to establish a probability graph model based on the hidden Dirichlet distribution to represent the generation process of interest trips;
the travel topic obtaining module is configured to define the number of user interest travel topics and solve the probability graph model by Gibbs sampling to obtain the travel topic of each travel track in the interest travel track records;
and the second identification module is configured to determine the travel purpose of each travel track in the interest travel track records from the travel characteristics under each topic, by a visual interpretation method.
The invention has the beneficial effects that:
(1) the travel purpose identification method extracts the time and spatial position of users' trips from mobile phone signaling data and takes the users' age group into account. For users in the working age group (25-59 years old), the residence position and office position are determined by frequency rules, and the travel purposes of returning home and going to work are judged accurately from the start and end points of each trip and these positions. For the working-stage travel track records with neither of these two purposes, and for all travel track records of users in the school stage (0-24 years old) and the retirement stage (more than 60 years old), the travel purpose is estimated in combination with an online map. This age-based method can supplement the travel purpose information missing from travel track data and replace traditional methods such as time-consuming and laborious manual questionnaire surveys, which greatly improves the efficiency of subsequent travel purpose identification.
(2) The travel purpose identification method analyzes separately the going-to-work and returning-home (including returning home from work) trips of users in the working age group, which prevents these trips from interfering with the accuracy of the subsequent travel purpose identification model and improves the accuracy of travel purpose identification.
(3) The travel purpose identification method matches online map points of interest with users' travel purposes; the method based on the hidden Dirichlet distribution can identify trip records that carry no travel purpose label, improving the efficiency and accuracy of identifying users' travel purposes.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart illustrating a travel purpose identification method according to the present invention;
FIG. 2 is a schematic diagram of the probability graph model of the travel purpose identification method of the present invention;
FIGS. 3(a) to 3(j) are schematic diagrams of the travel patterns corresponding to the 18 topics after Gibbs sampling convergence, according to an embodiment of the travel purpose identification method of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention relates to a travel purpose identification method, which comprises the following steps:
step S10, obtaining travel track records from the user's mobile phone signaling data; each travel track record comprises the departure time, the arrival time and the longitude and latitude of the destination;
step S20, acquiring residence durations for the travel track records and filtering out abnormal trips, and dividing the travel track records by age group into a school stage, a working stage and a retirement stage;
step S30, based on the travel track records of users in the working stage, identifying the residence position and the office position of each such user according to heuristic rules;
step S40, based on the residence position and office position of users in the working stage, determining their two travel purposes of going to work and returning home, and taking the working-stage travel track records with neither of these two purposes, together with all travel track records of users in the school stage and the retirement stage, as interest travel track records;
step S50, using an online map, determining the point-of-interest type from the destination longitude and latitude of each travel track in the interest travel track records;
step S60, representing the interest travel track records using the arrival time, the age group of the user, the residence duration and the point-of-interest type of the destination as travel attributes, and establishing a probability graph model based on the hidden Dirichlet distribution to represent the generation process of interest trips;
step S70, defining the number of user interest travel topics and solving the probability graph model by Gibbs sampling to obtain the travel topic of each travel track in the interest travel track records;
and step S80, determining the travel purpose of each travel track in the interest travel track records from the travel characteristics under each topic, by a visual interpretation method.
In order to more clearly explain the method for identifying travel purposes of the present invention, the following will describe each step in the embodiment of the present invention in detail with reference to fig. 1.
The travel purpose identification method according to the first embodiment of the present invention includes steps S10 to S80, and the steps are described in detail as follows:
step S10, obtaining travel track record in the user mobile phone signaling data; the travel track record comprises departure time, arrival time and longitude and latitude of a destination.
Since the mobile phone operator system adopts the real name system, the age group to which the user belongs can also be obtained, and the obtained travel record is shown in table 1:
TABLE 1
User id | Age group | Travel start time | Travel end time | Destination longitude, latitude
165421 | 25-59 | 2020/1/1 7:29 | 2020/1/1 8:29 | 119.305763, 40.104106
165421 | 25-59 | 2020/1/1 10:43 | 2020/1/1 12:36 | 119.307799, 39.943485
165421 | 25-59 | 2020/1/1 21:38 | 2020/1/1 22:21 | 119.350475, 39.941236
165421 | 25-59 | 2020/1/2 08:10 | 2020/1/2 08:14 | 119.334066, 39.966397
Step S20, acquiring residence durations for the travel track records and filtering out abnormal trips, and dividing the travel track records by age group into a school stage, a working stage and a retirement stage.
Residence duration acquisition: the time interval between two consecutive trips of the user (from the arrival of one trip to the departure of the next) is taken as the user's residence duration at the destination;
Abnormal trip filtering: travel records with a residence duration longer than 15 hours are filtered out.
User group division: users are divided by their age attribute into three groups, 13-24 years old, 25-59 years old and over 60 years old, corresponding respectively to the school stage, the working stage and the retirement stage.
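As a non-limiting illustration, the preprocessing of step S20 can be sketched in Python using the records of Table 1. The dictionary keys, the stage labels and the decision to omit a user's final trip (which has no following trip from which to measure a residence duration) are assumptions made for the example and are not specified in the text.

```python
from datetime import datetime

TIME_FORMAT = "%Y/%m/%d %H:%M"
MAX_RESIDENCE_HOURS = 15.0  # abnormal-trip threshold stated in the text

# Assumed label strings for the three age groups named in the text.
STAGE_BY_AGE_GROUP = {"13-24": "school", "25-59": "working", "60+": "retirement"}

def add_residence_hours(trips):
    """Attach residence duration and life stage to one user's trips.

    `trips` must be sorted by departure time. The residence duration is the
    time from the arrival of a trip to the departure of the next one; trips
    whose duration exceeds the threshold are dropped as abnormal.
    """
    kept = []
    for current, following in zip(trips, trips[1:]):
        arrive = datetime.strptime(current["arrive"], TIME_FORMAT)
        next_depart = datetime.strptime(following["depart"], TIME_FORMAT)
        hours = (next_depart - arrive).total_seconds() / 3600.0
        if hours <= MAX_RESIDENCE_HOURS:
            kept.append({**current,
                         "residence_h": round(hours, 2),
                         "stage": STAGE_BY_AGE_GROUP[current["age_group"]]})
    return kept

table1 = [
    {"user": 165421, "age_group": "25-59", "depart": "2020/1/1 7:29", "arrive": "2020/1/1 8:29"},
    {"user": 165421, "age_group": "25-59", "depart": "2020/1/1 10:43", "arrive": "2020/1/1 12:36"},
    {"user": 165421, "age_group": "25-59", "depart": "2020/1/1 21:38", "arrive": "2020/1/1 22:21"},
    {"user": 165421, "age_group": "25-59", "depart": "2020/1/2 08:10", "arrive": "2020/1/2 08:14"},
]
processed = add_residence_hours(table1)
```

With these inputs the trip arriving at 12:36 receives a residence duration of 9.03 hours, since the next departure is at 21:38.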
The travel trajectory record after processing is shown in table 2:
TABLE 2
Figure BDA0002921710110000091
Step S30, based on the travel track records of users in the working stage, identifying the residence position and the office position of each such user according to heuristic rules.
The position at which the number of night-time stays within one month exceeds a set number is extracted from the travel track records of a user in the working stage as the residence position of the user;
and the position visited most often during working hours within one month is extracted from the travel track records of the user in the working stage as the office position of the user.
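As a non-limiting illustration, the two heuristic rules of step S30 can be sketched as frequency counts. The night and working hour windows, the threshold value and the use of discrete location identifiers are all assumptions made for the example; where several positions exceed the night-stay threshold, the sketch takes the most frequent one.

```python
from collections import Counter

NIGHT_HOURS = set(range(21, 24)) | set(range(0, 6))  # assumed night window
WORK_HOURS = set(range(9, 17))                       # assumed working hours
MIN_NIGHT_STAYS = 10                                 # assumed "set number"

def identify_home_and_office(stays):
    """`stays` holds one month of (location_id, arrival_hour) pairs for a user."""
    night = Counter(loc for loc, hour in stays if hour in NIGHT_HOURS)
    work = Counter(loc for loc, hour in stays if hour in WORK_HOURS)
    # Residence: a night-time location whose stay count exceeds the threshold.
    qualifying = [loc for loc, n in night.most_common() if n > MIN_NIGHT_STAYS]
    home = qualifying[0] if qualifying else None
    # Office: the location visited most often during working hours.
    office = work.most_common(1)[0][0] if work else None
    return home, office

example_stays = ([("A", 23)] * 12 + [("B", 10)] * 20
                 + [("C", 14)] * 3 + [("D", 22)] * 2)
home, office = identify_home_and_office(example_stays)
```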
Step S40, based on the residence position and office position of users in the working stage, determining their two travel purposes of going to work and returning home, and taking the working-stage travel track records with neither of these two purposes, together with all travel track records of users in the school stage and the retirement stage, as interest travel track records.
In the travel track records of a user in the working stage, a trip whose starting point is the user's residence position and whose destination is the user's office position has the travel purpose of going to work;
in the travel track records of a user in the working stage, a trip whose destination is the user's residence position has the travel purpose of returning home;
among the trips whose travel purpose is returning home, a trip whose starting point is the user's office position has the travel purpose of returning home from work.
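As a non-limiting illustration, the rules of step S40 reduce to a small decision function. The label strings and the use of `None` to mark an interest trip are assumptions made for the example.

```python
def label_working_stage_trip(origin, destination, home, office):
    """Return the rule-based travel purpose, or None for an interest trip."""
    if origin == home and destination == office:
        return "going to work"
    if destination == home:
        # Returning home from the office is a special case of returning home.
        return "returning home from work" if origin == office else "returning home"
    return None  # left for the topic model of steps S50 to S80
```

Trips labeled `None` here, together with all trips of users in the school and retirement stages, form the interest travel track records.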
After the processing of this step, the travel record is shown in table 3:
TABLE 3
Figure BDA0002921710110000101
Step S50, using an online map, determining the point-of-interest type from the destination longitude and latitude of each travel track in the interest travel track records.
The points of interest (POI) of an online map can effectively provide land-use type information and can be obtained through a location-based online search and discovery service. After the coordinates of a center point and a query radius are sent to the online map, it returns the POIs around the center point and their types.
In one embodiment of the invention, the land-use types include dining services, health services, residential areas, lifestyle services, entertainment facilities, companies and enterprises, shopping services, educational institutions, scenic spots, government agencies, and office buildings.
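As a non-limiting illustration, the center-point and radius query can be imitated offline with a small in-memory POI list in place of a real online map service. The POI coordinates and types below, the 500 m radius and the rule of taking the nearest POI within the radius are all hypothetical choices made for the example.

```python
import math

EARTH_RADIUS_M = 6371000.0

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def poi_type_at(lon, lat, pois, radius_m=500.0):
    """Return the type of the nearest POI within radius_m, or None."""
    best = None
    for poi_lon, poi_lat, poi_type in pois:
        dist = haversine_m(lon, lat, poi_lon, poi_lat)
        if dist <= radius_m and (best is None or dist < best[0]):
            best = (dist, poi_type)
    return best[1] if best else None

# Hypothetical POI records standing in for the online map response.
sample_pois = [
    (119.307800, 39.943500, "Life service"),
    (119.334000, 39.966400, "Residential area"),
]
```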
Table 4 shows an example of the result of adding the POI type, obtained from the longitude and latitude of the residence point, to the travel purpose attributes of the interest travel track records; each row of data represents the attributes of one trip of a user, and this data is used to infer the user's travel purpose.
TABLE 4
Figure BDA0002921710110000111
Step S60, representing the interest travel track records using the arrival time, the age group of the user, the residence duration and the point-of-interest type of the destination as travel attributes, and establishing a probability graph model based on the hidden Dirichlet distribution to represent the generation process of interest trips.
Step S61, representing the interest travel track records using the arrival time, the age group of the user, the residence duration and the point-of-interest type of the destination as travel attributes, and, following the hidden Dirichlet distribution, dividing the trip generation process into a 'user → topic' step and a 'topic → trip' step;
the travel attributes are selected based on the following principles:
(1) the attributes can distinguish the obviously different trips of the users;
(2) trips of similar trip purposes have similar attributes.
For example, trip A: on 1 September, a user aged 25-59 departs at 8:00, arrives at 9:00 and stays at a company for 2 hours; trip B: on 2 September, a user aged 25-59 departs at 8:10, arrives at 9:00 and stays at a company for 2.2 hours; trip C: on 2 September, a user aged 25-59 departs at 8:10, arrives at 9:00 and stays at a catering service for 0.5 hours. Although trips A and B take place on different dates, they are very similar and both are very likely to have the travel purpose of work; although most attributes of trips B and C are the same, their destination types and residence durations differ greatly, so their travel purposes are clearly different.
In one embodiment of the present invention, four attributes are used to represent the j-th trip of user i: the arrival time $t_{ij}$, the age group $d_{ij}$ of the user, the residence duration $s_{ij}$ (the time the user stays at the destination), and the POI type $c_{ij}$ of the destination. The j-th trip $w_{ij}$ of user i is then represented by the quadruple in formula (1):
$$w_{ij} = (t_{ij}, d_{ij}, s_{ij}, c_{ij}) \qquad (1)$$
where the arrival time $t_{ij}$ and the residence duration $s_{ij}$ are continuous variables, the age group $d_{ij}$ and the destination POI type $c_{ij}$ are discrete variables, and the travel purpose corresponding to the j-th trip $w_{ij}$ of user i is $z_{ij}$. The attributes of the interest travel track records at this point are shown in Table 5:
TABLE 5
User id | Age group | Arrival time | Residence duration (hours) | Destination POI type
165421 | 25-59 | 12:36 | 9.03 | Life service
165421 | 25-59 | 08:14 | 2.56 | Residential area
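As a non-limiting illustration, the quadruple of formula (1) can be held in a small immutable record. Encoding the arrival time as fractional hours since midnight is an assumption made so that it can be treated as a continuous variable; the field and function names are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trip:
    arrival_hour: float  # t_ij, hours since midnight (assumed encoding)
    age_group: str       # d_ij, discrete
    residence_h: float   # s_ij, hours, continuous
    poi_type: str        # c_ij, discrete

def make_trip(arrival, age_group, residence_h, poi_type):
    """Build a Trip from an 'HH:MM' arrival time and the other attributes."""
    hours, minutes = arrival.split(":")
    return Trip(int(hours) + int(minutes) / 60.0, age_group, residence_h, poi_type)

# The two rows of Table 5.
table5 = [
    make_trip("12:36", "25-59", 9.03, "Life service"),
    make_trip("08:14", "25-59", 2.56, "Residential area"),
]
```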
Step S62, determining the travel topic distribution of each user and sampling each trip of the user according to the travel purpose, thereby obtaining the probability graph model based on the hidden Dirichlet distribution;
the travel topic distribution is the proportion of the different types of travel purpose, such as visiting or entertainment, in the interest travel track records of each user.
The proportions of travel purposes differ between users: for example, some users travel more often to visit others, while others travel to take part in entertainment activities. $\pi_i$ is therefore used to describe the distribution of the travel purposes of user i, that is, the proportion of each travel purpose. The topic corresponding to each trip of a user, namely the travel purpose $z$, is determined from the travel attribute data of all users.
The hidden Dirichlet distribution model (latent Dirichlet allocation, LDA) assumes that mobile phone travel track data is generated in two steps, 'user → topic' and 'topic → trip': first the travel topic distribution of each user is determined, that is, the proportions of travel purposes such as entertainment and visiting, and then each trip of the user is obtained by sampling according to the travel purpose.
Fig. 2 shows the probability graph model of the travel purpose identification method of the present invention. Based on the above assumption, the travel data of the users is generated according to the probability graph model of Fig. 2. In this model, the shaded circles represent observable variables, the unshaded circles represent hidden variables, the arrows represent conditional dependencies between two variables, the boxes represent repeated sampling, and the number in the lower right-hand corner of each box is the number of repetitions. For the discrete variables z_{ij}, d_{ij} and c_{ij}, take the process α → π_i → z_{ij} as an example: the topic z_{ij} is sampled from the multinomial distribution π_i, and because LDA introduces a Bayesian framework, π_i is itself regarded as sampled from a probability distribution. The conjugate prior of the multinomial distribution, the Dirichlet distribution, is therefore introduced, i.e., π_i obeys a Dirichlet distribution with hyper-parameter α. The sampling processes of d_{ij} and c_{ij} are analogous. For the continuous variables, because the conjugate prior of a normal distribution is still a normal distribution, t_{ij} is sampled from the normal distribution
N(μ_z, τ^{-1})
with mean μ_z and variance τ^{-1}, and the parameter μ_z of this normal distribution in turn obeys a normal distribution with hyper-parameters η00. The sampling process of the variable s_{ij} is analogous.
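The two-step "user → topic" and "topic → trip" generation described above can be sketched as follows. This is an illustration under stated assumptions, not the model of Fig. 2 in full: the per-topic parameters are fixed inputs rather than being drawn from their priors, the function and dictionary key names are hypothetical, and the numeric values are made up for the example. The precision τ is converted to a standard deviation τ^(-1/2) because `random.gauss` expects a standard deviation.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def sample_categorical(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_trips(n_trips, alpha, topic_params, rng):
    """Generate the trips of one user: pi_i ~ Dirichlet(alpha), then per trip
    z ~ pi_i and the four attributes conditioned on the topic z."""
    pi = sample_dirichlet(alpha, rng)            # user -> topic proportions
    trips = []
    for _ in range(n_trips):
        z = sample_categorical(pi, rng)          # topic of this trip
        p = topic_params[z]
        t = rng.gauss(p["mu_t"], p["tau_t"] ** -0.5)   # arrival time
        s = rng.gauss(p["mu_s"], p["tau_s"] ** -0.5)   # residence duration
        d = sample_categorical(p["age_probs"], rng)    # age group index
        c = sample_categorical(p["poi_probs"], rng)    # POI type index
        trips.append((z, t, d, s, c))
    return pi, trips


# Made-up parameters for two topics, three age groups and two POI types.
example_params = [
    {"mu_t": 12.0, "tau_t": 1.0, "mu_s": 7.0, "tau_s": 1.0,
     "age_probs": [0.6, 0.3, 0.1], "poi_probs": [0.9, 0.1]},
    {"mu_t": 8.0, "tau_t": 4.0, "mu_s": 2.0, "tau_s": 4.0,
     "age_probs": [0.2, 0.7, 0.1], "poi_probs": [0.1, 0.9]},
]
pi, trips = generate_trips(5, [1.0, 1.0], example_params, random.Random(0))
```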
Step S70, defining the number of the user interest travel topics, solving the probability graph model by using a Gibbs sampling method, and obtaining the travel topics of the travel tracks in the interest travel track record.
Step S71, calculating the perplexity (confusion degree) of the probability graph model under different numbers of topics, and taking the number of topics for which the perplexity is lower than a set threshold as the number of user interest travel topics;
the perplexity is given by formula (2):
Figure BDA0002921710110000132
where Likelihoods is the likelihood function of all data in the interest travel track record, and N is the total number of travel tracks in the interest travel track record;
Step S72, iteratively assigning, by the Gibbs sampling method, the topic probabilities of the trips of all users until the perplexity (confusion degree) no longer decreases, at which point the probability graph model has converged, and obtaining the travel topic of each travel track in the interest travel track record.
After the travel attributes of all users are observed, parameters in the LDA model can be calculated through a maximum likelihood estimation method, and the travel purpose of each travel is further estimated. The likelihood function Likelihoods of all data in the interest travel track record is shown as formula (3):
Figure BDA0002921710110000141
wherein α, τ and λ are preset hyper-parameters of the probability graph model; π, μ_z,
Figure BDA0002921710110000142
η_z and θ_z are the distributions of the travel attributes; z represents the travel topic; t_{ij}, d_{ij}, s_{ij} and c_{ij} respectively represent the arrival time of the jth trip of user i, the age group of the user, the residence duration and the point of interest type of the destination; P is a probability value; ∏ represents the multiplication operation; M is the total number of users; N_i is the total number of travel tracks of user i; and K is the number of user travel topics.
Based on the principle that a lower perplexity (confusion degree) indicates better model performance, the number of topics that gives the lowest perplexity is taken as the total number of travel topics. In one embodiment of the present invention, the perplexity is smallest when the total number of topics K is 18, so K = 18 is used.
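Selecting the number of topics by perplexity can be sketched as below. Formula (2) is only available as an image in this text, so the sketch assumes the commonly used definition Perplexity = exp(-log(Likelihoods) / N); the function names are hypothetical and the log-likelihood values in the example are invented purely to exercise the code, not results from the embodiment.

```python
import math


def perplexity(log_likelihood: float, n_trips: int) -> float:
    """Assumed standard form: exp(-log(Likelihoods) / N)."""
    return math.exp(-log_likelihood / n_trips)


def choose_topic_count(log_likelihood_by_k: dict, n_trips: int):
    """Return the topic count with the lowest perplexity and all scores."""
    scores = {k: perplexity(ll, n_trips) for k, ll in log_likelihood_by_k.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Invented log-likelihoods of fitted models with K = 2, 3 and 4 topics.
best_k, scores = choose_topic_count({2: -300.0, 3: -250.0, 4: -260.0}, n_trips=100)
```

A threshold-based rule, as in step S71, would instead pick the first K whose score falls below the chosen threshold.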
The LDA model is solved by the Gibbs sampling method. For high-dimensional data, Gibbs sampling samples one dimension at a time while keeping the other dimensions fixed, and iterates by cycling through the dimensions. It samples the topic of each word (here, each trip), and once the topic of every word is determined, the parameters can be calculated from the occurrence counts. In practical application, only the four travel attributes (arrival time, age group, residence duration and destination POI type) are observed, and the hidden variables are z, π, μ,
Figure BDA0002921710110000144
η and θ. The last five hidden variables depend on the topic variable z and can be expressed through the expectation formula of the Dirichlet distribution, as shown in formulas (4) to (8):
Figure BDA0002921710110000143
Figure BDA0002921710110000151
Figure BDA0002921710110000152
Figure BDA0002921710110000153
Figure BDA0002921710110000154
wherein u_{iz} is the number of trips with travel purpose z among all trips of user i; n_z is the total number of trips with travel purpose z; t_z is the sum of the arrival times of all trips with travel purpose z; w_{zd} is the number of trips made by users of age group d among all trips with travel purpose z; s_z is the sum of the residence durations of all trips with travel purpose z; v_{zc} is the number of trips whose destination POI type is c among all trips with travel purpose z (10 POI types are selected in this embodiment, so c = 1, 2, …, 10); and α_z, τ, τ00d, λ, λ00c are the prior distribution hyper-parameters of the probability graph model shown in Fig. 2. Suppose that the attributes of the jth trip of user i are t_{ij} = t, d_{ij} = d, s_{ij} = s and c_{ij} = c; then the conditional probability that the travel purpose z_{ij} of this trip equals z is shown in formula (9):
Figure BDA0002921710110000155
wherein the superscript -ij indicates that the word (trip) w_{ij} is excluded from the calculation, z_{-ij} represents the topic assignments of all trips other than w_{ij}, α_z is the Dirichlet distribution hyper-parameter corresponding to travel purpose z,
Figure BDA0002921710110000156
represents the probability density value, at input t, of the Gaussian function with mean
Figure BDA0002921710110000157
and variance
Figure BDA0002921710110000158
and, similarly,
Figure BDA0002921710110000161
represents the probability density value, at input s, of the Gaussian function with mean
Figure BDA0002921710110000162
and variance
Figure BDA0002921710110000163
The Gibbs sampling procedure is as follows:
(1) randomly initialize the travel purposes of all trips, i.e., randomly assign a travel purpose to each trip;
(2) compute the count variables n_z, u_{iz}, t_z, w_{zd}, s_z, v_{zc}, z ∈ [1, K];
(3) While (the probability graph model has not converged):
    For each user i of the M users:
        For each trip record j of user i:
            Take out the jth trip data of user i: z ← z_{ij}, t ← t_{ij}, d ← d_{ij}, s ← s_{ij}, c ← c_{ij};
            Update the count values of the discrete variables: n_z = n_z - 1, u_{iz} = u_{iz} - 1, w_{zd} = w_{zd} - 1, v_{zc} = v_{zc} - 1;
            Update the accumulated values of the continuous variables: t_z = t_z - t, s_z = s_z - s;
            Calculate the conditional probability P(z_{ij} = k | ·), k ∈ [1, K], according to formula (9);
            Sample a new travel purpose z′ from the conditional probability P(z_{ij} = k | ·), k ∈ [1, K], and update z_{ij} ← z′;
            Update the count values of the discrete variables: n_{z′} = n_{z′} + 1, u_{iz′} = u_{iz′} + 1, w_{z′d} = w_{z′d} + 1, v_{z′c} = v_{z′c} + 1;
            Update the accumulated values of the continuous variables: t_{z′} = t_{z′} + t, s_{z′} = s_{z′} + s;
When the probability graph model converges, the topic assigned to each trip is taken as its final travel topic.
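The bookkeeping of the procedure above (remove a trip from the counts, compute a conditional over the K topics, resample, add the trip back) can be sketched as follows. Formula (9) is only available as an image in this text, so the conditional used here is an assumed simplification and not the formula of the invention: symmetric smoothing constants `alpha`, `beta` and `gamma` for the discrete counts, plug-in topic means t_z / n_z and s_z / n_z, and fixed variances `var_t` and `var_s`. The sketch also runs a fixed number of sweeps instead of testing convergence by perplexity. All names and default values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gibbs_sample(trips, K, D, C, n_sweeps=50, alpha=1.0, beta=1.0, gamma=1.0,
                 var_t=4.0, var_s=4.0, seed=0):
    """trips: list of (user, t, d, s, c) with d in [0, D) and c in [0, C).
    Returns one topic index in [0, K) per trip."""
    rng = random.Random(seed)
    # Global means serve as fallback means for topics that are currently empty.
    mean_t = sum(tr[1] for tr in trips) / len(trips)
    mean_s = sum(tr[3] for tr in trips) / len(trips)
    # (1) random initialisation of the travel purpose of every trip
    z = [rng.randrange(K) for _ in trips]
    # (2) count variables n_z, u_iz, t_z, w_zd, s_z, v_zc
    n = [0] * K
    u_cnt = {user: [0] * K for user in {tr[0] for tr in trips}}
    t_sum = [0.0] * K
    s_sum = [0.0] * K
    w = [[0] * D for _ in range(K)]
    v = [[0] * C for _ in range(K)]

    def update(idx, k, sign):
        """Add (sign=+1) or remove (sign=-1) trip idx from topic k."""
        user, t, d, s, c = trips[idx]
        n[k] += sign
        u_cnt[user][k] += sign
        w[k][d] += sign
        v[k][c] += sign
        t_sum[k] += sign * t
        s_sum[k] += sign * s

    for idx, k in enumerate(z):
        update(idx, k, +1)
    # (3) sweeps over all trips
    for _ in range(n_sweeps):
        for idx, (user, t, d, s, c) in enumerate(trips):
            update(idx, z[idx], -1)
            weights = []
            for k in range(K):
                mt = t_sum[k] / n[k] if n[k] else mean_t
                ms = s_sum[k] / n[k] if n[k] else mean_s
                weights.append(
                    (u_cnt[user][k] + alpha)
                    * (w[k][d] + beta) / (n[k] + D * beta)
                    * (v[k][c] + gamma) / (n[k] + C * gamma)
                    * normal_pdf(t, mt, var_t)
                    * normal_pdf(s, ms, var_s)
                )
            z[idx] = rng.choices(range(K), weights=weights, k=1)[0]
            update(idx, z[idx], +1)
    return z


# Tiny made-up data set: (user, arrival hour, age index, stay hours, POI index).
data = [(0, 8.0, 1, 9.0, 0), (0, 8.5, 1, 8.5, 0),
        (1, 20.0, 2, 1.0, 1), (1, 19.5, 2, 1.5, 1)]
labels = gibbs_sample(data, K=2, D=3, C=2, n_sweeps=20)
```

Keeping all count updates in a single helper mirrors the symmetric decrement and increment steps of the procedure and makes it harder for the counts to drift out of sync with the assignments.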
And step S80, determining the travel purpose of each travel track in the interest travel track record according to the travel characteristics under each theme by a visual interpretation method.
Figs. 3(a) to 3(j) are schematic diagrams of the travel patterns corresponding to the 18 topics after Gibbs sampling converges in the travel purpose identification method according to an embodiment of the present invention. For each topic z, under the LDA assumption, the age group and the destination POI type both obey multinomial distributions, and the arrival time and residence duration attributes both obey normal distributions. Each column is a topic; the horizontal axis of each subgraph is a travel attribute, and the vertical axis is the occurrence probability or probability density value of that attribute. Taking the first subgraph under topic 1 of Fig. 3(a) as an example, the probability density of an arrival time of about 12:00 under this topic is 0.3, while the probability density of an arrival time of about 20:00 is 0, which indicates that trips under topic 1 arrive around 12:00 noon with high probability and essentially never arrive at 20:00.
In Fig. 3(a), topics 1 and 2 represent the "seeking medical treatment" topic, i.e., the travel purpose corresponding to these topics is seeking medical treatment. In topic 1, the trips arrive around midday (about 12:00), the probability of younger age groups is high, the residence duration at the destination is about 7 hours, and the typical destination POI type is "medical service"; these attributes match the travel characteristics of seeking medical treatment. The trips in topic 2 have an arrival time distribution similar to that of topic 1, a shorter residence duration (about 4 hours) and an age group concentrated on elderly users; the destination POI type is again medical service, so this topic is likewise identified as seeking medical treatment. In topic 3 of Fig. 3(b), the arrival time is about 8:00 in the morning, the proportion of young and old users in the age group is high, the destination POI type is office building and the user stays for about 2 hours, so the topic is identified as business. The spatio-temporal travel features of topics 4 and 5 of Fig. 3(c) are similar: both are morning trips whose destination POIs are office buildings, with residence durations of 4 hours and 8 hours respectively, so they are identified as work.
Topics 6 and 7 of Fig. 3(d) represent trips to a park or scenic spot around noon: these trips arrive around noon, there are few middle-aged users, and the users stay at the scenic spot for about 2 to 4 hours, so it is reasonable to identify the travel purpose as a leisure outing. Topics 8 and 9 of Fig. 3(e) represent an evening walk, since the trips under these topics occur in the evening, the probability of middle-aged users is high, the POIs are scenic spots and the residence duration is short. Topic 10 of Fig. 3(f) represents a walk at noon, since the trips occur at noon and the middle-aged users stay at the scenic spot for about 1 hour. The trips in topic 11 of Fig. 3(f) have a residence duration of about 12 hours and the POI is a government agency or residential community, with users of several age groups appearing with high probability; this may be related to night shifts or to visiting and staying overnight. In topic 12 of Fig. 3(g), the arrival time is in the evening and the residence duration is short, and the users are mainly middle-aged and most likely to visit shopping venues, so the topic is identified as evening shopping at shopping malls.
The trips in topic 13 of Fig. 3(g) arrive at about 8:00 in the morning, the younger age group accounts for the highest proportion of all age groups, the residence duration is about 11 hours and the POI type "educational institution" has a large weight, so the topic can be determined to correspond to attending school. Topics 14 and 15 of Fig. 3(h) have arrival times of about 16:00 and 14:00 respectively and, given the POI type "entertainment venue" and the short residence duration, are identified as entertainment activities. The spatio-temporal features of topics 16 and 17 of Fig. 3(i) are distinct: the arrival time is noon, middle-aged users and elderly users respectively account for a large proportion, and the POI type is catering service, so these topics are identified as dining at noon. Similarly, topic 18 of Fig. 3(j) is identified as a dinner party activity in the evening.
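The per-topic reading of Figs. 3(a) to 3(j) relies on a few summary features of each topic: typical arrival time, typical residence duration, dominant age group and dominant POI type. As a rough aid to that visual interpretation, and not a replacement for it, these summaries could be tabulated from the assigned topics as in the sketch below. The function name and the output keys are hypothetical, and the three example trips are invented.

```python
from collections import Counter, defaultdict


def summarize_topics(trips, topics):
    """trips: list of (arrival_hour, age_group, stay_hours, poi_type);
    topics: the topic id assigned to each trip, in the same order."""
    grouped = defaultdict(list)
    for trip, z in zip(trips, topics):
        grouped[z].append(trip)
    summary = {}
    for z, members in grouped.items():
        count = len(members)
        summary[z] = {
            "n_trips": count,
            "mean_arrival_hour": sum(m[0] for m in members) / count,
            "mean_stay_hours": sum(m[2] for m in members) / count,
            "top_age_group": Counter(m[1] for m in members).most_common(1)[0][0],
            "top_poi_type": Counter(m[3] for m in members).most_common(1)[0][0],
        }
    return summary


example_trips = [
    (12.0, "25-59", 7.0, "medical service"),
    (12.5, "25-59", 6.0, "medical service"),
    (8.0, "13-24", 11.0, "educational institution"),
]
summary = summarize_topics(example_trips, [0, 0, 1])
```

A person labelling the topics would still inspect the full distributions, since a mean and a mode hide multi-modal patterns.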
A travel purpose identification system of a second embodiment of the present invention includes the following modules:
the data acquisition module is configured to acquire travel track records in the mobile phone signaling data of the user; the travel track record comprises departure time, arrival time and longitude and latitude of a destination;
the preprocessing module is configured to obtain the residence duration from the travel track record, filter out abnormal trips, and divide the travel track record into a school stage, a working stage and a retirement stage according to age range;
the position identification module is configured to identify the living position and the office position of the user in the working stage according to heuristic rules based on the travel track record of the user in the working stage;
the first identification module is configured to determine two travel purposes of working and returning home of the user in the working stage based on the living position and the office position of the user in the working stage;
the interest point type confirming module is configured to take travel track records of users in working stages and all travel track records of users in school stages and retirement stages as interest travel track records, and determine interest point types according to destination longitude and latitude of each travel track in the interest travel track records by using an online map;
the trip representation module is configured to express the interest trip track record by taking the arrival time, the age group to which the user belongs, the residence time and the interest point type of the destination as trip attributes, and establish a probability graph model based on the hidden Dirichlet distribution to represent the generation process of the interest trip;
a trip theme obtaining module configured to define the number of user interest trip themes, solve the probability map model by using a Gibbs sampling method, and obtain trip themes of each trip track in the interest trip track record;
and the second identification module is configured to determine the travel purpose of each travel track in the interest travel track record according to the travel characteristics under each theme by a visual interpretation method.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the travel purpose identification system provided in the foregoing embodiment is only illustrated by dividing the functional modules, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the modules or steps in the embodiments of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiments may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An electronic apparatus according to a third embodiment of the present invention includes:
at least one processor; and a memory communicatively coupled to at least one of the processors; wherein the memory stores instructions executable by the processor for execution by the processor to implement the above travel purpose identification method.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the above travel purpose identification method.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A travel purpose identification method is characterized by comprising the following steps:
step S10, obtaining travel track record in the user mobile phone signaling data; the travel track record comprises departure time, arrival time and longitude and latitude of a destination;
step S20, obtaining the residence duration from the travel track record, filtering out abnormal trips, and dividing the travel track record into a school stage, a working stage and a retirement stage according to age range;
step S30, based on the travel track record of the user in the working stage, identifying the living position and the office position of the user in the working stage according to a heuristic rule;
step S40, determining two travel purposes of working and returning to the home of the user in the working stage based on the living position and the office position of the user in the working stage, and taking travel track records except the two travel purposes of working and returning to the home of the user in the working stage and all the travel track records of the user in the school stage and the retirement stage as interest travel track records;
step S50, determining the type of the interest point according to the destination longitude and latitude of each travel track in the interest travel track record by using an online map;
step S60, expressing the interest travel track record by taking the arrival time, the age group to which the user belongs, the residence time and the interest point type of the destination as travel attributes, and establishing a probability graph model based on the hidden Dirichlet distribution to express the generation process of the interest travel;
step S70, defining the number of user interest travel topics, solving the probability graph model by using a Gibbs sampling method, and obtaining travel topics of travel tracks in the interest travel track record;
and step S80, determining the travel purpose of each travel track in the interest travel track record according to the travel characteristics under each theme by a visual interpretation method.
2. A travel purpose identification method according to claim 1, wherein the residence duration is the time interval between two consecutive trips of the user; the abnormal travel is a travel track record whose residence duration is longer than 15 hours.
3. An identification method for travel purposes according to claim 1, characterised in that the school stage is in the age range of 13-24 years, the working stage is in the age range of 25-59 years and the retirement stage is in the age range above 60 years.
4. A travel purpose identification method according to claim 1, wherein step S30 includes:
extracting the positions where the stay times at night within one month in the travel track record of the user in the working stage exceed the set times as the living positions of the user;
and extracting the position with the most visit times in the working time period within one month in the travel track record of the user in the working stage as the office position of the user.
5. A travel purpose identification method according to claim 1, wherein step S40 includes:
in the travel track record of a user in the working stage, a trip whose starting point is the living position of the user and whose destination is the office position of the user has the travel purpose of working;
in the travel track record of a user in the working stage, a trip whose destination is the living position of the user has the travel purpose of returning home;
among the trips of the user whose travel purpose is returning home, a trip whose starting point is the office position of the user has the travel purpose of returning home from work.
6. A travel purpose identification method according to claim 1, wherein the point of interest is a land type of the point, and the land type includes a dining service, a health service, a residential area, a living service, an entertainment facility, a corporate enterprise, a shopping service, an educational institution, a scenic spot, a government institution, and an office building.
7. A travel purpose identification method according to claim 1, wherein in step S60, a probability map model based on the hidden dirichlet distribution is established to represent the generation process of the interest travel, and the method is as follows:
step S61, expressing the interest travel track record by taking the arrival time, the age group to which the user belongs, the residence time and the interest point type of the destination as travel attributes, and dividing the travel generation process into a user → subject step and a subject → travel step based on the distribution of the hidden Dirichlet;
step S62, determining the trip theme distribution of the user, and obtaining each trip of the user according to the trip purpose sampling to obtain a probability map model based on the hidden Dirichlet distribution;
the travel theme distribution is the proportion distribution of different types of travel purposes in the interest travel track record of each user.
8. A travel purpose identification method according to claim 1, wherein step S70 includes:
step S71, calculating the perplexity (confusion degree) of the probability graph model under different numbers of topics, and taking the number of topics for which the perplexity is lower than a set threshold as the number of user interest travel topics;
the perplexity Perplexity is:
Figure FDA0002921710100000031
where Likelihoods is the likelihood function of all data in the interest travel track record, and N is the total number of travel tracks in the interest travel track record;
and step S72, iteratively assigning, by the Gibbs sampling method, the topic probabilities of the trips of all users until the perplexity (confusion degree) no longer decreases, at which point the probability graph model has converged, and obtaining the travel topic of each travel track in the interest travel track record.
9. A travel purpose identification method according to claim 8, characterised in that the likelihood function Likelihoods of all data in the interest travel trajectory record is:
Figure FDA0002921710100000041
wherein α, τ and λ are preset hyper-parameters of the probability graph model; π, μ_z,
Figure FDA0002921710100000042
η_z and θ_z are the distributions of the travel attributes; z represents the travel topic; t_{ij}, d_{ij}, s_{ij} and c_{ij} respectively represent the arrival time of the jth trip of user i, the age group of the user, the residence duration and the point of interest type of the destination; P is a probability value; ∏ represents the multiplication operation; M is the total number of users; N_i is the total number of travel tracks of user i; and K is the number of user travel topics.
10. An identification system for travel purposes, characterized in that it comprises the following modules:
the data acquisition module is configured to acquire travel track records in the mobile phone signaling data of the user; the travel track record comprises departure time, arrival time and longitude and latitude of a destination;
the preprocessing module is configured to obtain the residence duration from the travel track record, filter out abnormal trips, and divide the travel track record into a school stage, a working stage and a retirement stage according to age range;
the position identification module is configured to identify the living position and the office position of the user in the working stage according to heuristic rules based on the travel track record of the user in the working stage;
the first identification module is configured to determine two travel purposes of working and returning home of the user in the working stage based on the living position and the office position of the user in the working stage;
the interest point type confirming module is configured to take travel track records of users in working stages and all travel track records of users in school stages and retirement stages as interest travel track records, and determine interest point types according to destination longitude and latitude of each travel track in the interest travel track records by using an online map;
the trip representation module is configured to express the interest trip track record by taking the arrival time, the age group to which the user belongs, the residence time and the interest point type of the destination as trip attributes, and establish a probability graph model based on the hidden Dirichlet distribution to represent the generation process of the interest trip;
a trip theme obtaining module configured to define the number of user interest trip themes, solve the probability map model by using a Gibbs sampling method, and obtain trip themes of each trip track in the interest trip track record;
and the second identification module is configured to determine the travel purpose of each travel track in the interest travel track record according to the travel characteristics under each theme by a visual interpretation method.
CN202110118774.4A 2021-01-28 2021-01-28 Travel purpose identification method and system Active CN112836121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110118774.4A CN112836121B (en) 2021-01-28 2021-01-28 Travel purpose identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110118774.4A CN112836121B (en) 2021-01-28 2021-01-28 Travel purpose identification method and system

Publications (2)

Publication Number Publication Date
CN112836121A true CN112836121A (en) 2021-05-25
CN112836121B CN112836121B (en) 2022-02-25

Family

ID=75932212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110118774.4A Active CN112836121B (en) 2021-01-28 2021-01-28 Travel purpose identification method and system

Country Status (1)

Country Link
CN (1) CN112836121B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724184A (en) * 2019-03-20 2020-09-29 北京嘀嘀无限科技发展有限公司 Transformation probability prediction method and device
CN111159583A (en) * 2019-12-31 2020-05-15 中国联合网络通信集团有限公司 User behavior analysis method, device, equipment and storage medium
CN111382224A (en) * 2020-03-06 2020-07-07 厦门大学 Urban area function intelligent identification method based on multi-source data fusion
CN112215666A (en) * 2020-11-03 2021-01-12 广州市交通规划研究院 Characteristic identification method for different trip activities based on mobile phone positioning data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114626340A (en) * 2022-03-17 2022-06-14 智慧足迹数据科技有限公司 Behavior feature extraction method based on mobile phone signaling and related device
CN114626340B (en) * 2022-03-17 2023-02-03 智慧足迹数据科技有限公司 Behavior feature extraction method based on mobile phone signaling and related device

Also Published As

Publication number Publication date
CN112836121B (en) 2022-02-25

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant