WO2014187476A1

WO2014187476A1 - Method and system for predicting mobility demand of users

Info

Publication number: WO2014187476A1
Application number: PCT/EP2013/060433
Authority: WO
Inventors: Konstantinos GKIOTSALITIS; Francesco ALESIANI; Roberto Baldessari
Original assignee: Nec Europe Ltd.
Priority date: 2013-05-21
Filing date: 2013-05-21
Publication date: 2014-11-27

Abstract

The present invention relates to a method for predicting mobility demand of a user, comprising the steps of a) Determining general user information including user location information and user time information from provided user data of a plurality of users, b) Determining detailed user information by analyzing user interaction of the users from provided user data, c) Analyzing the determined general user information and the detailed user information, d) Determining user state probabilities and individual user transition state probabilities based on the analyzed user information, e) Assigning users of the plurality of users into one of homogenous groups based on determined user states and user transition state probabilities, wherein a homogenous group comprises representatives representing general user criteria, f) Predicting the mobility demand of a user based on a representative user of one or more homogenous groups the user being assigned to and based on the user state probability and individual user transition state probability of the representative user. The present invention relates further to a system for predicting mobility demand of a user.

Description

METHOD AND SYSTEM FOR PREDICTING MOBILITY DEMAND OF USERS

The present invention relates to a method for predicting mobility demand of a user.

The present invention further relates to a system for predicting mobility demand of a user.

Mobility demand prediction is becoming increasingly interesting for urban transportation planning and for the optimization of public transport services. It enables, for example, forecasting of future traffic flows as well as public transport usage. It can therefore be used to facilitate short-term or long-term decision making, for example whether new infrastructure should be provisioned or expanded. New public transport services may also be launched based on the mobility demand prediction.

In the non-patent literature of Arentze, T.A., Timmermans, H.J.P., "A learning-based transportation oriented simulation system", in Transportation Research Part B, 38, pp. 613-633, (2004), an activity-based model for mobility demand prediction is described. It attempts to understand and imitate user or traveler decisions by assuming that user trips are a means to participate in activities and are not undertaken without purpose due to their inherent disutility. Activity-based models, in contrast to trip-based models, enable a modeling of soft policies, such as implementation of congestion charging schemes or working hours rescheduling, and an estimation of the effect of these soft policies on the mobility demand prediction.

One of the drawbacks, however, is that both activity-based models and trip-based models use census data for identifying mobility network characteristics, demographics and land use. This means that they are bound to data acquisition through expensive and time-consuming household travel surveys, which are filled in by a sample of the population within a studied area. Since household travel surveys are expensive, more and more cities with budget limitations borrow surveys from other cities and develop their prediction models upon the data provided by the other cities. However, this data is unreliable, since characteristics of the other cities may be significantly different from those of the city borrowing the data. Furthermore, data updates for the corresponding surveys are conducted only sparsely, since large-scale, survey-based data acquisition is also expensive, and thus the accuracy of the activity-based model is reduced over time. A further disadvantage is that survey-based methods for mobility demand prediction are not interoperable, since they are bound to corresponding regional survey data.

In the further non-patent literature of Calabrese, F., Di Lorenzo, G., Liu, L., Ratti, C., "Estimating Origin-Destination flows using opportunistically collected mobile phone location data from one million users in Boston Metropolitan Area", in: IEEE Pervasive Computing, vol. 10, no. 4, pp. 36-44, (2011), and in US 2012/0191505 A1, mobile phone data is used instead of travel surveys. In the non-patent literature of Zhou, X., and Mahmassani, H.S., "Dynamic origin-destination demand estimation using automatic vehicle identification data", in: IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 1, pp. 105-114, (2006), data from inductive loop detectors or from automatic vehicle detection is used.

However, one of the drawbacks is that data from continuous sensing mechanisms like inductive loop detectors is not able to describe the mobility behavior of travelers with the same accuracy as activity-based models or trip-based models. Models based on data from sensing mechanisms can only determine location and time to infer activities of travelers, but are not able to consider the influence of other parameters on travelers' mobility behavior.

It is therefore an objective of the present invention to provide a method and a system for predicting mobility demand of a user or users which are inexpensive and enable a continuous mobility demand prediction.

It is a further objective of the present invention to provide a method and a system for predicting mobility demand of a user which are interoperable, in particular not bound to a particular city or region. It is an even further objective of the present invention to provide a more detailed understanding of a user's mobility behavior, i.e. to provide a more precise mobility demand prediction.

According to the invention the aforementioned objectives are accomplished by a method of claim 1 and a system of claim 22.

In claim 1 a method for predicting mobility demand of a user is defined.

According to claim 1 the method is characterized by the steps of a) Determining general user information including user location information and user time information from provided user data of a plurality of users,

b) Determining detailed user information by analyzing user interaction of the users from provided user data,

c) Analyzing the determined general user information and the detailed user information,

d) Determining user state probabilities and individual user transition state probabilities based on the analyzed user information,

e) Assigning users of the plurality of users into one of homogenous groups based on determined user states and user transition state probabilities, wherein a homogenous group comprises representatives representing general user criteria,

f) Predicting the mobility demand of a user based on a representative user of one or more homogenous groups the user being assigned to and based on the user state probability and individual user transition state probability of the representative user.

In claim 22 a system for predicting mobility demand of a user is defined.

According to claim 22 the system is characterized by general user means operable to determine general user information including user location information and user time information from provided user data of a plurality of users,

detailed user means operable to determine detailed user information by analyzing user interaction of the users from provided user data,

analyzing means operable to analyze the determined general user information and the detailed user information,

probability means operable to determine user state probabilities and individual user transition state probabilities based on the analyzed user information,

assigning means operable to assign users of the plurality of users into one of homogenous groups based on determined user states and user transition state probabilities, wherein a homogenous group comprises representatives representing general user criteria, and

predicting means operable to predict the mobility demand of the user based on a representative user of one or more of the homogenous groups the user being assigned to and the user state probability and individual user transition state probability of the representative user.

According to the invention it has been recognized that using general user information together with detailed user information, i.e. analyzing user interaction, enables a more precise mobility demand prediction of users. According to the invention it has been further recognized that a detailed understanding of users' mobility behavior is enabled and thus a more precise mobility demand prediction.

According to the invention it has been further recognized that a development of user profiles is enabled, thus enhancing the flexibility of the present invention. For example it can also be used for further topics like non-transport related topics or the like. According to the invention it has been further recognized that a continuous mobility demand prediction of a user is enabled.

According to the invention it has been further recognized that the method and the system are inexpensive, since in particular continuous public data from social media is used to perform mobility demand prediction.

In other words the present invention enables in particular the following. Location, activity and social media interaction are correlated for mobility demand prediction and related to the travel decisions of a user. A probabilistic model is used for combining detailed user information, i.e. user interaction, with general user information, for example from other sensors. Individuals or users are assigned to certain homogenous groups, in particular with a probabilistic similarity-based model. A data model is adopted including general information such as location information, for example the most revisited locations of a user, and temporal data, for example the duration of a user's stay at a location, the day type or the like, together with more detailed information about the user's behavior. The detailed user information is derived from the interaction of the user with respective sensing mechanisms. By analyzing the respective general user information and detailed user information, the state of the user can be determined as well as the user's probability to switch or transfer from one user state to another user state over time. The corresponding probabilities of being in a particular state at a specific time instant and of transferring/switching from that state into another state can be determined for a user. The users are then classified into homogeneous groups based on representatives in those groups representing general user criteria. Then general characteristics of users of the homogeneous groups are used to connect the users with a user for which mobility demand is to be predicted and whose characteristics may be derived from census data. This enables assigning users for which mobility demand is to be predicted to homogeneous groups, inferring their state probabilities and transition state probabilities, and predicting the mobility demand for these users.

Further features, advantages and preferred embodiments are described in the following subclaims.

According to a preferred embodiment interaction of the user is determined by analyzing user related social media information, user related information of city-released apps, user road traffic information and/or user network traces. By using social media information and combining the analyzed social media information with the general user information a more precise mobility demand prediction is enabled. Since social networks and/or city-released apps are expected to grow steadily in the next years, the accuracy of the mobility demand prediction is further enhanced. User road traffic information is for example data from inductive loop detectors, from automatic vehicle detection, floating car data or the like. User network traces are for example mobile phone data or the like.

According to a further preferred embodiment general user characteristics are derived from census data, and these general user characteristics are used to relate the user for which mobility demand is to be predicted to a representative user of a homogenous group. This makes it easy, for example, to use the general characteristics like sex, age, employment or the like of users who belong to homogenous groups to connect them with users for which mobility demand is to be predicted and whose characteristics are derived from the census data, preferably to infer their user state probabilities and user transition state probabilities, and to predict the mobility demand of these users.

According to a further preferred embodiment in case of missing general user information and/or detailed user information user state probabilities and individual user transition state probabilities are estimated. Such an estimation of state probabilities may be performed by using educated guessing rules in order to derive a user state and user transition probabilities. Thus a precise prediction of mobility demand is provided even in case of absent or missing data.

According to a further preferred embodiment a user state is determined based on user location, user activity type, distance from the user's home, user interaction type, user emotion, topic of user interaction, time of user interaction and/or day type. By determining the user's state based on the classes user location, etc. a sufficient description of a state of a user is enabled providing a precise prediction of the mobility demand of the user. Of course the user state may be determined based on one or a plurality of the classes listed above.

According to a further preferred embodiment the user transition state probability is based on the number of times switching from a first user state into a second user state at a certain time and a certain day type and the number of times switching from the first user state to any other user state. This enables in an easy way to define the user's transition state probability. The user state transition probability represents then a moving from a current state to a new state.

According to a further preferred embodiment user state changes are limited to discrete times. This reduces calculation time for the user state and user transition state probabilities while still providing sufficient precision for predicting the mobility demand of a user. Of course user states may also change continuously at any time, the state probability and transition probabilities may not have an explicit time dependency, and user states may evolve in both continuous and discrete time over a predetermined time period, in particular a time period of one day.

According to a further preferred embodiment users from the plurality of users lying within a certain distance of a predetermined point are assigned to the same homogeneous group. This enables a fast and easy assigning of users to the same homogeneous group.

According to a further preferred embodiment duration and/or data exchange of the social media interaction of the user are determined for determining the user interaction. This enables an easy determination of a user interaction not only in terms of type of user interaction but also with regard to corresponding (day) times.

According to a further preferred embodiment for determining the user interaction the user interaction is classified into at least four interaction categories including photo upload, continuous interaction, chatting and reply. The category photo upload is the action when a user uploads a photo. The category continuous interaction is the action when the user shares information without having a conversation with another user, for example several times per hour, i.e. more than once. The category chatting is the action when the user exchanges information with another user, for example more than once during one hour. The category reply is the action when the user sends a single message to a specific recipient over one hour. Of course the classification is not limited to one hour, i.e. any time period can be used for connecting the action with a corresponding time point. Using the at least four interaction categories enables a precise determination of the user interaction with regard to social media interactions.
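The four interaction categories described above can be illustrated with a minimal sketch. The `Interaction` record, the `classify` function and the "unclassified" fallback are hypothetical names introduced only for illustration; counting messages to the same recipient within the hour as "exchanging information" is likewise an assumption, not the method of the invention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Interaction:
    """One social media action (hypothetical record layout)."""
    recipient: Optional[str]  # None means the message is not addressed to a specific user
    has_photo: bool = False


def classify(event: Interaction, same_hour: List[Interaction]) -> str:
    """Classify `event` given all of the user's interactions in the same hour.

    `same_hour` is expected to contain `event` itself.
    """
    if event.has_photo:
        return "photo upload"
    if event.recipient is None:
        # Sharing without a conversation partner, more than once per hour.
        shares = sum(1 for e in same_hour
                     if e.recipient is None and not e.has_photo)
        return "continuous interaction" if shares > 1 else "unclassified"
    # Messages addressed to the same recipient within the hour.
    directed = sum(1 for e in same_hour if e.recipient == event.recipient)
    return "chatting" if directed > 1 else "reply"
```

A single undirected message within the hour is not covered by the four categories in the text, so the sketch labels it "unclassified" rather than guessing.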

According to a further preferred embodiment content of messages and/or of interactions of the user is analyzed for determining the user emotion, preferably by counting positive emotion keywords against negative emotion keywords in the content. The term "keyword" is not limited to words, but also includes symbols like emoticons or the like. This enables a precise determination, i.e. inference, of the mood of the user. In particular a positive emotion indicating a positive mood is inferred if the sum of positive keywords and emoticons in a composed message is bigger than the sum of negative keywords and emoticons in the same message. A negative emotion denotes for example that the sum of negative keywords and emoticons in a composed message is bigger than the sum of positive keywords and emoticons. The emotion or mood of the user can also be neutral if there are no positive and negative keywords or their sums are equal.
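The keyword counting described above can be sketched as follows. The keyword sets and the function name `infer_emotion` are illustrative assumptions; a real deployment would use its own predefined keyword and emoticon lists.

```python
# Illustrative keyword sets (assumptions, not taken from the source).
POSITIVE = {"good", "great", "happy", "love", ":)"}
NEGATIVE = {"bad", "sad", "hate", ":("}


def infer_emotion(message: str,
                  positive: set = POSITIVE,
                  negative: set = NEGATIVE) -> str:
    """Return 'positive', 'negative' or 'neutral' for one composed message."""
    # Lower-case, split on whitespace and drop trailing sentence punctuation.
    tokens = [tok.strip(".,!?") for tok in message.lower().split()]
    pos = sum(1 for tok in tokens if tok in positive)
    neg = sum(1 for tok in tokens if tok in negative)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # no keywords at all, or equal sums
```

Emoticons are treated as ordinary tokens here, which matches the remark that "keyword" also covers symbols.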

According to a further preferred embodiment the context of a user message and/or of the user interaction is analyzed for determining the topic of user interaction, preferably by searching for specific keywords associated with predefined topics. For example a set of predetermined keywords is defined to identify whether the topic of the message is related, for example, to sports, food, fashion, news, upcoming events, shopping, reading or the like. This provides a precise yet fast determination of the topic of a user interaction.

According to a further preferred embodiment a user message is analyzed with regard to publicity/privacy of the user message and/or of the user interaction for determining the topic of a user interaction, preferably by analyzing the source of the user message and/or of the user interaction. This enables an even more precise determination of the topic of a user interaction, classifying not only with regard to certain keywords but also according to whether the message is addressed to a plurality of users, posted on a discussion board, on Facebook or the like, and/or is intended for everyone.

According to a further preferred embodiment for determining the activities of a user, activities are associated with predetermined and representative locations of the user, wherein one activity is assigned to each representative location. This avoids an overcounting of similar data and therefore (statistical) noise.

According to a further preferred embodiment the representative locations include at least four different types of representative locations including home location, work location, location of a fixed activity and location of a flexible activity. A flexible activity may in particular also be a non-recurrent activity. For example if accumulated data from a first representative location is more than 5% of the total amount of the accumulated data, then the representative location is set as a location of a fixed activity.

For fixed activity locations the following statistics may be calculated: the earliest time, Min_H, at which the user was at the fixed activity location, the latest time, Max_H, at which the user was at the fixed activity location, the median, the 15th percentile Q15, the 25th percentile Q25, the 75th percentile Q75 and the 85th percentile Q85. To assign the different activity types to the representative locations the following rule-set may be used:

- FF, flexible/non-recurrent activity location: a representative location at which the accumulated data is less than 5% of the total amount of data.

- CF, fixed/recurrent activity location: a representative location at which the accumulated data is more than 5% of the total amount of data.

- W, work activity location: a representative location which satisfies a set of criteria:

o the user exchanges information from there at daytime during a wide time range (Q75 − Q25 ≥ 4 hours)

o the sum of chatting and continuous interacting from this representative location X, $\sum_X C_1 + \sum_X C_2$, is more than 75% of the maximum sum $\sum_Y C_1 + \sum_Y C_2$ from another representative location Y, or the total number of interactions is more than 75% of that of the representative location with the most interactions

o the sum of user messages from this representative location X which infer a positive mood, PE, is smaller than or equal to the sum of messages from the same location which infer a negative mood, NE.

- H, home activity location: a representative location X from which the number of interactions during weekdays and during the time period {Min_T; Min_T + 2 hours} ∪ {Max_T − 3 hours; Max_T} is bigger than the number of interactions from any other representative location during the same time period. Min_T and Max_T are values which can be derived from the temporal analysis of social media information and denote the earliest and latest time at which the user interacted with social media.

According to a further preferred embodiment for determining a distance between two representative locations the haversine function is used. This provides a fast and easy determination of the distance between two representative locations.
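For reference, the haversine distance between two representative locations can be computed as in the following sketch. The function name, the use of decimal degrees and the mean Earth radius of 6371 km are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

The result could then be binned into the distance class from home that forms part of the user state; the bin boundaries are not specified in the text.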

According to a further preferred embodiment for estimating the user state probabilities and individual user transition state probabilities, a user switching time, being the travel time of a user between two different locations, is determined based on the user activity type and/or the time of user interaction at the start location and/or the end location. The switching time is thus in particular the time when the user travels from, for example, his home to another representative location, and this time depends on the time of activity at the destination. Therefore the switching time may for example be estimated as follows:

o t_switch = minimum{Q15, t1}, if the representative location is the location of a fixed/recurrent activity CF or a work activity location W

o t_switch = t1, if the representative location is a flexible/non-recurrent activity location FF

- The location and activity values of the user's state are changed only if he is tracked at a different representative location than the one he was at before.

- If the user is traced at a location Y at time t2 and before he was at location X ≠ Y at time t1, then it may be assumed that the user stayed at location X until time t = t_switch and changed locations at time t_switch. The switching time depends on the interaction times t1, t2 and the type of the activities at locations X, Y as explained below:

o Switching from FF to CF or W: t_switch = maximum{Q15, t1}

o Switching from FF to another FF or H: t_switch = (t1 + t2) / 2

o Switching from H to FF: t_switch = (t1 + t2) / 2

o Switching from H to CF or W: t_switch = maximum{Q15, t1}

o Switching from CF or W to FF or H: t_switch = minimum{Q85, t2}

o Switching from CF or W to another CF or W: t_switch = (t1 + t2) / 2

- In the case where for example the last daily interaction was at time t from a different representative location than the one which is associated with home, it may be assumed that the user returned home at switching time:

o t_switch = minimum{t + 2 hours, Max_T}, if the last interaction is from a FF location

o t_switch = maximum{Q85, t}, if the last interaction is from a CF location.
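The pairwise switching rules listed above can be sketched as a small lookup function. Several points are assumptions made for illustration: times are given as hours of the day, Q15 is taken from the fixed location being entered, Q85 from the fixed location being left, and the second argument of the minimum in the "CF or W to FF or H" rule is read as t2 because the source text is garbled at that point. The function and parameter names are hypothetical.

```python
from typing import Optional

FIXED = {"CF", "W"}  # fixed/recurrent activity and work locations


def switching_time(origin: str, dest: str, t1: float, t2: float,
                   q15_dest: Optional[float] = None,
                   q85_origin: Optional[float] = None) -> float:
    """Estimate when the user left `origin` (last seen at t1) for `dest` (first seen at t2).

    `origin` and `dest` are activity types: 'H', 'W', 'CF' or 'FF'.
    """
    if origin in FIXED and dest in FIXED:
        return (t1 + t2) / 2
    if dest in FIXED:  # coming from FF or H
        if q15_dest is None:
            raise ValueError("q15_dest is required when entering a CF or W location")
        return max(q15_dest, t1)
    if origin in FIXED:  # going to FF or H
        if q85_origin is None:
            raise ValueError("q85_origin is required when leaving a CF or W location")
        return min(q85_origin, t2)
    # FF -> FF, FF -> H, H -> FF
    return (t1 + t2) / 2
```

For example, a user last seen at home at 7.0 and first seen at work at 10.0, with Q15 = 8.5 at the work location, would be assumed to have switched at 8.5.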

According to a further preferred embodiment assigning a user into a homogenous group is based on a probability distance between the user state probabilities of different users, preferably wherein the probability distance includes a time weighting parameter for the probability distance at different times. This allows users whose general user information and detailed user information are "closer" to each other according to the probability distance to be more likely grouped into the same homogenous group. For example assigning users into different homogenous groups is performed by using the user state probability of the respective user:

where k is the user, i is the state, t is the time, w is the model class and $w_w \in [0,1]$ is the weight of the class, wherein a class represents user location, user activity type, distance from a user's home, user interaction type, user emotion, topic of user interaction, time of user interaction or day type.

According to a further preferred embodiment the probability distance is determined according to the Kullback-Leibler divergence, a minimum-shifted Kullback-Leibler divergence and/or a geometric distance. This allows a reliable determination of the probability distance.

Distances can be weighted with weights defined externally. If $P_1$ and $P_2$ are two probabilities, their distance is:

$$d_{12} = d(P_1, P_2) = \sum_{t=1}^{T} w_t \, d(P_1(t), P_2(t))$$

where T is the time sequence, $w_t \in \{0,1\}$ is a weight that depends on the time index, and $P_1(t)$, $P_2(t)$ are the probabilities at a certain time index.

The min-shift distance is defined as:

$$d'_{12} = \min_{\tau} \sum_{t=1}^{T} w_t \, d(P_1(t), P_2(t+\tau))$$

where H is the max distance in time, which bounds the shift $\tau$ and could be set to 2.

The probabilistic user state model distance based on the Kullback-Leibler divergence is defined as

$$d^{kl}(t) = d(P_1(t), P_2(t)) = \sum_{i=1}^{N} w_i \ln \frac{P_1(i,t)}{P_2(i,t)}$$

where $w_i \in \{0,1\}$ is another weight which is set for the specific state. The geometric distance is defined as:

$$d^{g}(t) = d(P_1(t), P_2(t)) =$$
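As a rough illustration of the time-weighted and min-shifted distances, the following sketch computes both for two users whose state probabilities are given as one distribution per time index. It makes several assumptions that are not taken from the text: the per-time term is the textbook Kullback-Leibler divergence (each log-ratio weighted by the first distribution), a small `eps` guards against division by zero, shifts run from -H to +H, and the time index wraps around cyclically. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Distribution = Sequence[float]  # probabilities over the N states at one time index


def kl_distance(p1: Distribution, p2: Distribution,
                state_weights: Optional[Sequence[float]] = None,
                eps: float = 1e-12) -> float:
    """State-weighted Kullback-Leibler divergence at a single time index."""
    w = state_weights if state_weights is not None else [1.0] * len(p1)
    return sum(w[i] * p1[i] * math.log((p1[i] + eps) / (p2[i] + eps))
               for i in range(len(p1)) if p1[i] > 0)


def weighted_distance(P1: List[Distribution], P2: List[Distribution],
                      time_weights: Optional[Sequence[float]] = None,
                      shift: int = 0) -> float:
    """Sum over time of w_t * d(P1(t), P2(t + shift)), with cyclic time (assumption)."""
    T = len(P1)
    w = time_weights if time_weights is not None else [1.0] * T
    return sum(w[t] * kl_distance(P1[t], P2[(t + shift) % T]) for t in range(T))


def min_shift_distance(P1: List[Distribution], P2: List[Distribution],
                       max_shift: int = 2,
                       time_weights: Optional[Sequence[float]] = None) -> float:
    """Smallest weighted distance over all shifts in [-max_shift, max_shift]."""
    return min(weighted_distance(P1, P2, time_weights, s)
               for s in range(-max_shift, max_shift + 1))
```

The min-shift variant treats two users with the same daily pattern, offset by at most H time steps, as close to each other, which the unshifted distance would not.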

According to a further preferred embodiment for determining the representatives for the homogenous group a clustering procedure is performed on the determined probability distances. This provides a reliable determination of the representatives for the homogenous group and further of the mobility demand prediction of a user.

According to a further preferred embodiment the further step g) predicting the mobility demand within a study area by aggregating the estimated mobility patterns of users within the study area is performed. This enables a precise prediction of mobility demand for a plurality of users.

There are several ways how to design and further develop the teaching of the present invention in an advantageous way. To this end it is to be referred to the patent claims subordinate to patent claim 1 on the one hand and to the following explanation of preferred embodiments of the invention by way of example, illustrated by the figure on the other hand. In connection with the explanation of the preferred embodiments of the invention by the aid of the figure, generally preferred embodiments and further developments of the teaching will be explained.

In the drawings

Fig. 1 shows a flowchart for a method according to a first embodiment of the present invention;

Fig. 2 shows a flowchart of part of a method according to a second embodiment of the present invention;

Fig. 3 shows diagrams and steps of part of a method according to a third embodiment of the present invention and

Fig. 4 shows a flowchart of a method according to a fourth embodiment of the present invention.

Fig. 1 shows a flowchart for a method according to a first embodiment of the present invention.

In Fig. 1 a high level flowchart of a method according to a first embodiment of the present invention is shown. In a first step S1 data from social media sensing mechanisms is determined and in a second step S2 data from mobile networks, road sensors or the like is determined. In a third step S3 the determined data is analyzed by using interaction, location and/or temporal classification of the determined data. In case of missing data for determining individual probabilistic state models for users in a fourth step S4, the user states are estimated in a further step S3'. In the probabilistic state model user state probabilities and user state transition probabilities are described based on the data analysis according to step S3. From the individual probabilistic state models a group model is determined in a further step S4', i.e. users are classified into homogeneous groups based on a set of criteria according to the analyzed data of step S3. In a fifth step S5 census data is used for deriving characteristics of users for which mobility demand prediction has to be performed and for connecting them to the one or more representative users in the corresponding homogeneous groups.

In a sixth step S6 mobility demand prediction preferably covering a study area is performed based on the assigned users for whom mobility demand prediction has to be performed, the assigned homogeneous groups, and their individual probabilistic state models.

In other words and with regard to Fig. 1: Metrics/indicators may be provided estimating a current state of an individual user and his willingness to change that user state. The user state describes his current condition and might include his emotion, his location, the activity he is undertaking and more. A transfer from one user state to another can be performed while being in the same place (e.g., a mood change) or through travelling (e.g., a location change). Metrics/indicators for describing the user's state are established by analyzing data from the user's interactions with sensing mechanisms.

Sensing mechanisms include popular social networks (Facebook, Twitter, LinkedIn, MeetUp, Google+, Google Latitude, mySpace, Pinterest and more), city-released Apps (Chromaroma, Shared Transport Apps, Zip Card and other purpose-built Apps) as well as more traditional systems like cellular network traces and direct road traffic measurement via sensors. A data model is created based on general information such as location data (e.g. most revisited locations) and temporal data (e.g. duration of staying in a location, day type, etc.) together with more detailed information about the user's behavior. The detailed data is derived through the interaction of the user with sensing mechanisms (i.e., context of a message during an interaction through social media, topic of the message, type of interaction).

First, the metrics/indicators are established through data analysis which consists of a set of rules. Based on these rules, the state of the user can be described as well as the probability to transfer from one state to another state over time. In addition, since the state of the user evolves continuously and data provision from sensing mechanisms is not continuous, the states of the user may be estimated in case of missing data in order to derive the state and transition probabilities. The probabilities of being in a particular state at a specific time instant and to transfer from that state to another are unique for each user and are described through the user's probabilistic state model.

After that, users are classified into homogenous groups based on a set of criteria which are related to the above mentioned rules (i.e., their activity profiles, their probabilistic state models, other mobility indicators). Therefore, each group contains users with similar indicators.

Finally, the general characteristics (e.g. sex, age, employment) of users who belong to homogenous groups are used to connect them with the population within the study area, whose characteristics are derived from census data. This step allows assigning users from the population to homogenous groups, inferring their probabilistic state models, and predicting the mobility demand within the study area.

Fig. 2 shows a flowchart of part of a method according to a second embodiment of the present invention.

In Fig. 2 steps for construction or generation of an individual probabilistic state model of a user are described.

In a first step T1 data from social media, floating car data or data of mobile phones are input. In a second step T2 this input data is processed, and an interaction analysis, a location analysis and a temporal analysis as well as further optional analyses may be performed. In a third step T3 a probabilistic state model λ^k of a k-th user is determined, wherein A is the type of activity and π is a state probability of being in the state.

In detail, user states, together with the user state transition probabilities from one state to another, describe users. After analyzing the user's data, the set of the user's possible states, $q_i \in \{S_1, \ldots, S_n\}$, where n is the number of states, is defined. In addition, each user has a unique PSM, $\lambda = (A, \pi)$, which is described by the state and transition probabilities:

$A_{ij,t} = P(q_{t+1,w} = S_i \mid q_{t,w} = S_j)$ represents the user state transition probability to move to state $S_i$ given that the user is at state $S_j$ at time t and day type w.

$\pi_{i,t} = P(q_{t,w} = S_i)$ represents the user state probability of being in state $S_i$ at time t and day type w.

The user state and transition probabilities are defined from a set of parameters derived from the interaction, location and temporal analysis of data. In a preferred embodiment, the set of parameters describing a user state are:

1) L: the location,

2) A: the activity type (Home Activity, Work Activity, Fixed Activity, Flexible Activity),

3) D: distance class from Home,

4) I: the type of interaction (chatting, continuous interaction, reply, photo upload),

5) E: the emotion (positive mood, negative mood),

6) T: the topic of the interaction (sports, food, fashion, news, upcoming events, shopping, reading),

7) t: the time of interaction,

8) w: the day type (weekday or weekend)

The user state probability of the user v is defined as:

P_{v,t}(L_z, A_z, D_z, I_z, E_z, T_z, t, w) = (Number of times being at a location L_z, undertaking an activity A_z, being in distance class D_z based on the distance away from home, having an interaction type I_z, being in an emotional condition E_z, and exchanging information about a topic T_z, at time t and day type w) / (Total number of times being in that state or in a different state at time t for day type w).

The user state transition probability of moving from the current state q = {L_z, A_z, D_z, I_z, E_z, T_z, t, w} to a new state q' = {L'_z, A'_z, D'_z, I'_z, E'_z, T'_z, t, w} is:

P_{v,t}(q → q') = (Number of times switching from state q to q' at time t and day type w) / (Number of times switching from state q to q' or to any other possible state).
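The two ratios above are relative frequencies and can be sketched in a few lines of Python. This is a minimal illustration, not the claimed implementation: the function name `estimate_probabilities` and the input layout are assumptions, the states are abbreviated to activity labels (any hashable tuple of the eight parameters would work the same way), and remaining in the same state is counted as a transition q → q.

```python
from collections import Counter, defaultdict


def estimate_probabilities(days):
    """Estimate state and transition probabilities by counting.

    days: list of (day_type, [state at slot 0, state at slot 1, ...]).
    Returns (state_prob, trans_prob), both keyed by (time slot, day type).
    """
    state_counts = defaultdict(Counter)
    trans_counts = defaultdict(lambda: defaultdict(Counter))
    for day_type, states in days:
        for t, q in enumerate(states):
            state_counts[(t, day_type)][q] += 1
            if t + 1 < len(states):
                # Staying in the same state is counted as the transition q -> q.
                trans_counts[(t, day_type)][q][states[t + 1]] += 1

    state_prob = {
        key: {q: n / sum(c.values()) for q, n in c.items()}
        for key, c in state_counts.items()
    }
    trans_prob = {
        key: {
            q: {q2: n / sum(c.values()) for q2, n in c.items()}
            for q, c in by_state.items()
        }
        for key, by_state in trans_counts.items()
    }
    return state_prob, trans_prob


# Hypothetical observations: three weekdays with three time slots each.
days = [
    ("weekday", ["H", "W", "W"]),
    ("weekday", ["H", "W", "FF"]),
    ("weekday", ["H", "FF", "FF"]),
]
state_prob, trans_prob = estimate_probabilities(days)
```

Here `state_prob[(1, "weekday")]` holds the distribution over states in the second time slot of a weekday, and `trans_prob[(0, "weekday")]["H"]` holds the distribution over successor states when leaving the first slot from home.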

Although states can change continuously over time, state changes can also be limited to discrete time approximations (e.g., 15 min, 30 min, 1 hour). Those probabilities do not have an explicit time dependency, and states can evolve in both continuous and discrete time over a predetermined time period, which typically may be one day. In addition, a user state might include parameters without values; e.g., data from mobile phones contains only location, time and day type data, so the values of the other parameters are void and require estimation.

The output of the probabilistic state model emulates the probabilistic outcome of a user's choice and is a sequence of states Q = {q_1, q_2, q_3, ..., q_T}, starting from the first daily state, q_1.
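How such a sequence could be drawn from a probabilistic state model is sketched below. The dictionary layout, the function name `sample_daily_sequence` and the toy probabilities are assumptions made for illustration; the description does not prescribe a sampling procedure.

```python
import random


def sample_daily_sequence(initial_prob, trans_prob, n_slots, rng):
    """Draw one daily sequence of states from a toy PSM.

    initial_prob: {state: probability} for the first daily state
    trans_prob:   {t: {state: {next_state: probability}}} per time slot t
    """
    states, weights = zip(*initial_prob.items())
    q = rng.choices(states, weights=weights)[0]
    sequence = [q]
    for t in range(n_slots - 1):
        next_states, next_weights = zip(*trans_prob[t][q].items())
        q = rng.choices(next_states, weights=next_weights)[0]
        sequence.append(q)
    return sequence


# Hypothetical model with three time slots; the day always starts at home.
initial = {"H": 1.0}
transitions = {
    0: {"H": {"H": 0.3, "W": 0.7}},
    1: {"H": {"H": 0.5, "FF": 0.5}, "W": {"W": 0.9, "FF": 0.1}},
}
seq = sample_daily_sequence(initial, transitions, n_slots=3, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which helps when the sampled sequences of many users are aggregated later.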

The step T2 of data processing includes data analysis of the user's data over time, in particular estimating the user's daily states for all available days and aggregating the results. So-called "classes" are applied to the user data, which allows assigning the user to a corresponding homogeneous group. In the following, possible classes are described:

Location Class - Location Parameter

The location class converts the location data, which includes the coordinates of locations (coordinates included directly in case of GPS or indirectly through the GSM Cell ID), into elaborated data. Firstly, groups are created, each defined by a representative location, in order to reduce positioning errors and the total number of locations. After that, all locations which lie within a circle with the representative location as center and 0.3 km as radius are assigned to the same group.
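A minimal sketch of this grouping is given below. It assumes a greedy rule in which the first point that fits no existing group becomes a new representative location; the description does not state how representative locations are chosen, so this rule, the function names and the sample coordinates are illustrative only.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def group_locations(points, radius_km=0.3):
    """Assign each point to the first representative within radius_km.

    Assumption: a point that fits no group founds a new group and becomes
    its representative location.
    """
    representatives = []
    labels = []
    for p in points:
        for idx, rep in enumerate(representatives):
            if haversine_km(p, rep) <= radius_km:
                labels.append(idx)
                break
        else:
            representatives.append(p)
            labels.append(len(representatives) - 1)
    return representatives, labels


# Hypothetical detections: two nearby points and one about 3 km away.
points = [(49.4000, 8.6700), (49.4010, 8.6705), (49.4200, 8.7000)]
reps, labels = group_locations(points)
```

The greedy rule depends on the order of the input; a centroid-based variant would be an equally plausible reading of the text.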

Interaction Class - Interaction Parameter

The interaction class investigates how the user interacts with the sensing mechanisms (e.g. social media) by examining the duration and data exchange characteristics of his/her interactions. In a preferred embodiment, four interaction categories are used: P, photo upload, is the action when the user uploads a photo. C1, continuous interaction, is the action when the user shares information several times per hour, and at least more than once, without having a conversation with another user. C2, chatting, is the action when the user exchanges information with another user more than once during one hour. R, reply, is the action when the user sends a single message to a specific recipient over one hour.

Emotion Class - Emotion Parameter

The emotion class analyses the content of a message/interaction to infer the mood of the user. PE, positive emotion (i.e., a positive mood), is inferred if the sum of positive keywords and emoticons in a composed message is bigger than the sum of negative keywords and emoticons in the same message. NE, negative emotion, denotes that the sum of negative keywords and emoticons in a composed message is bigger than the sum of positive keywords and emoticons. The emotion can also be neutral if there are no positive and negative keywords or their sums are equal.
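The counting rule can be illustrated with a short sketch. The keyword and emoticon sets below are hypothetical placeholders, as is the simple whitespace tokenization; the description does not list the actual keywords.

```python
POSITIVE = {"great", "happy", "love", ":)"}  # hypothetical keyword/emoticon list
NEGATIVE = {"bad", "sad", "hate", "late", ":("}  # hypothetical keyword/emoticon list


def classify_emotion(message):
    """Return 'PE', 'NE' or 'neutral' by comparing keyword and emoticon counts."""
    # Lower-case, split on whitespace and drop trailing punctuation from words.
    tokens = [tok.strip(".,!?") for tok in message.lower().split()]
    positive = sum(tok in POSITIVE for tok in tokens)
    negative = sum(tok in NEGATIVE for tok in tokens)
    if positive > negative:
        return "PE"
    if negative > positive:
        return "NE"
    return "neutral"  # no keywords at all, or equal sums


examples = [
    "Great match today :)",
    "Train is late again :(",
    "Love the city, hate the traffic",
]
moods = [classify_emotion(m) for m in examples]
```

The topic class described next follows the same keyword-matching pattern, with one keyword set per topic instead of one per mood.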

Topic Class - Topic Parameter

The topic class analyses the context of a message/interaction by searching for specific keywords which are associated with specific topics. A set of keywords is defined to identify whether the topic of the message is related to sports, food, fashion, news, upcoming events, shopping, reading or other. One class can also classify messages as private or public. This is the case when public or private sources (unicast, multicast, broadcast messages) are used.

Activity Class - Activity Parameter

Activities are associated with representative locations and at each representative location only one activity can be undertaken. During the procedure of associating activities with representative locations, an over-counting of similar data is avoided because it introduces noise to the analysis without adding value. Therefore, all sensing detections during the same day, the same year and the same hour (e.g., 11:25:17 and 11:58:12) and from the same representative location are merged into one.
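The merging rule can be sketched as follows, assuming detections arrive as pairs of a representative location identifier and a timestamp. The function name, the input layout and the choice to keep the earliest detection of each hour are assumptions.

```python
from datetime import datetime


def merge_detections(detections):
    """Keep one detection per representative location, calendar day and clock hour.

    detections: iterable of (location_id, datetime).
    Assumption: the earliest detection within each hour is the one retained.
    """
    seen = set()
    merged = []
    for loc, ts in sorted(detections, key=lambda d: d[1]):
        key = (loc, ts.date(), ts.hour)  # date() covers both day and year
        if key not in seen:
            seen.add(key)
            merged.append((loc, ts))
    return merged


# Hypothetical detections; the first two fall into the same hour at location X.
detections = [
    ("X", datetime(2013, 5, 21, 11, 25, 17)),
    ("X", datetime(2013, 5, 21, 11, 58, 12)),
    ("X", datetime(2013, 5, 21, 12, 5, 0)),
    ("Y", datetime(2013, 5, 21, 11, 40, 0)),
]
merged = merge_detections(detections)
```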

Then, each representative location is assigned an activity type: 1) H: Home location, 2) W: Work location, 3) CF: Location of a fixed activity, 4) FF: Location of a flexible, non-recurrent activity. If the accumulated data from a representative location X is more than 5% of the total amount of data, then the location is set as the location of a fixed activity. For fixed activity locations the following statistics are calculated: a. the earliest time, Min_H, at which the user was at location X, b. the latest time, Max_H, at which the user was at location X, c. the median, d. the 15th percentile, Q15, e. the 25th percentile, Q25, f. the 75th percentile, Q75, and g. the 85th percentile, Q85.
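The 5% threshold and the per-location statistics can be sketched as below. The percentile convention (linear interpolation between closest ranks), the function names and the sample data are assumptions; the description does not say which percentile definition is used, nor how a share of exactly 5% is treated.

```python
from collections import Counter


def percentile(sorted_values, pct):
    """Percentile by linear interpolation between closest ranks (one common convention)."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    pos = (len(sorted_values) - 1) * pct / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def fixed_activity_stats(detections, threshold=0.05):
    """Compute time statistics for every location holding more than 5% of the data.

    detections: list of (location_id, hour_of_day).
    """
    counts = Counter(loc for loc, _ in detections)
    total = len(detections)
    stats = {}
    for loc, n in counts.items():
        if n / total <= threshold:
            continue  # treated as flexible (FF); exactly 5% is an assumption
        hours = sorted(h for other, h in detections if other == loc)
        stats[loc] = {
            "Min_H": hours[0],
            "Max_H": hours[-1],
            "median": percentile(hours, 50),
            "Q15": percentile(hours, 15),
            "Q25": percentile(hours, 25),
            "Q75": percentile(hours, 75),
            "Q85": percentile(hours, 85),
        }
    return stats


# Hypothetical data: X and Y are visited often, Z only once (below 5%).
detections = (
    [("X", h) for h in range(8, 19)]
    + [("Y", 20 + 0.1 * i) for i in range(10)]
    + [("Z", 15)]
)
stats = fixed_activity_stats(detections)
```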

To assign activity types to representative locations the following rule-set is used:

- FF, Flexible/non-recurrent activity Location: a representative location at which the accumulated data is less than 5% compared to the total amount of data.

- CF, Fixed/recurrent activity Location: a representative location at which the accumulated data is more than 5% compared to the total amount of data.

- W, Work activity Location: a representative location which satisfies a set of criteria:

o the sum of chatting and continuous interactions from this representative location X, $\sum_X C_1 + \sum_X C_2$, is more than 75% of the maximum sum $\sum_Y C_1 + \sum_Y C_2$ from another representative location Y, or the total number of interactions is more than 75% of that of the representative location with the most interactions

o the sum of messages from this representative location, X, which infer a positive mood, PE, is smaller or equal to the sum of messages from the same location which infer a negative mood, NE.

- H, Home activity location: a representative location X from which the number of interactions during weekdays and during the time period {Min_T; Min_T + 2 hours} ∪ {Max_T - 3 hours; Max_T} is bigger than the number of interactions from any other representative location during the same time period. Min_T and Max_T are values which can be derived from the temporal analysis of social media data and denote the earliest and latest time at which the user interacted with social media.

Distance Class - Distance Parameter

Having defined the representative location of the home activity the distance of each user state's representative location from the home location is calculated. For that purpose, the haversine formula can be used for calculating the distance between two representative locations.
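For reference, the haversine formula in its standard form is reproduced here. The symbols are not taken from the description: φ denotes latitude and θ longitude (both in radians, with θ chosen to avoid a clash with the PSM symbol λ), R is the Earth's radius (about 6371 km), and d is the great-circle distance between the home location (φ_1, θ_1) and another representative location (φ_2, θ_2).

```latex
d = 2R \arcsin\!\left(
      \sqrt{\sin^{2}\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
      + \cos\varphi_1 \,\cos\varphi_2 \,
        \sin^{2}\!\left(\frac{\theta_2 - \theta_1}{2}\right)}
    \right)
```

The resulting distance can then be binned into the distance class D; the class boundaries are not specified in the description.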

Fig. 3 shows diagrams and steps of a method according to a third embodiment of the present invention. In Fig. 3 the daily state evolution is estimated and the results are exploited and shown.

In a first step U1 the daily state evolution is estimated for each day included in the available data set. In a second step U2 the results for all days are aggregated and in a third step U3 the state and transition probabilities are computed.

In the upper left of Fig. 3 a diagram shows the distance of a user from home during a random date. For example, at time index 1 and time index 7,5 the user is located close to his home. At time index 10,75 and 13 the user is farther away from home, and at time index 18,25 and 20,25 the user is even farther away from his home. This data is then processed according to the previously described location and/or activity classes. This is shown in the upper right of Fig. 3. The corresponding distances are associated with corresponding activities. For example, at time index 1 and time index 7,5 the user is located at his home, which is indicated by the close distance of the user from his home. Between time index 8 and time index 16 it is assumed that the user is at work; therefore at time index 10,75 and at time index 13, where interactions have appeared, the user is assumed to be at work W. At time indexes 18,25 and 20,25 the user is assumed to be not at home and to have a flexible activity FF.

Based on the random day activities after processing, the user state evolutions are estimated by performing steps U2 and U3, resulting in the diagram on the lower right of Fig. 3. In case of missing data, the values of the location, distance from home and activity parameters may be estimated in the following way, while the values of the other parameters are set as void:

- The user stays at home during the time period {Max_T; Min_T}; therefore he is at the representative location which is associated with the home activity.

- When the user is traced at a location other than the representative location associated with the home activity for the first time during the day (at time t_1), the user is at another representative location and participates in a fixed activity (CF or W) or a flexible activity (FF) at time t_1. The switching time, t_switch, when the user travels from H to another representative location depends on the type of activity at the destination and is estimated as follows:

o t_switch = minimum{Q15, t_1}, if the representative location is the location of a CF or W

o t_switch = t_1, if the representative location is the location of a FF

- The location and activity values of the user's state are changed only if he is tracked at a different representative location than the one he was at before.

- If the user is traced at a location Y at time t_2 and before he was at location X ≠ Y at time t_1, then it is assumed that the user stayed at location X until time t = t_switch and changed locations at time t_switch. The switching time depends on the interaction times t_i and the type of the activities at locations X, Y as explained below:

o Switching from FF to CF or W: t_switch = maximum{Q15, t_1}

o Switching from FF to another FF or H: t_switch = (t_1 + t_2) / 2

o Switching from H to FF: t_switch = (t_1 + t_2) / 2

o Switching from H to CF or W: t_switch = maximum{Q15, t_1}

o Switching from CF or W to FF or H: t_switch = minimum{Q85, t_2}

o Switching from CF or W to another CF or W: t_switch = (t_1 + t_2) / 2

- In the case where the last daily interaction was at time t from a different representative location than the one which is associated with home, we assume that the user returned home at switching time:

o t_switch = minimum{t + 2 hours, Max_T}, if the last interaction is from a FF location

o t_switch = maximum{Q85, t}, if the last interaction is from a CF or W location
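The rules for switching between two consecutive detections can be collected into one function, sketched below. The description does not state which location the percentiles belong to; this sketch assumes that Q15 refers to the destination and Q85 to the origin. The function name and parameter names are likewise hypothetical, and the first-departure and return-home rules are left out for brevity.

```python
def switching_time(from_type, to_type, t1, t2, q15_dest=None, q85_origin=None):
    """Estimate when the user moved between two consecutive detections.

    from_type, to_type: 'H', 'W', 'CF' or 'FF' (activity type at origin/destination)
    t1, t2:             detection times at origin and destination, in hours
    q15_dest:           Q15 of the destination (assumed; needed if it is CF or W)
    q85_origin:         Q85 of the origin (assumed; needed if it is CF or W)
    """
    fixed = {"CF", "W"}
    if from_type in fixed and to_type in fixed:
        return (t1 + t2) / 2            # CF or W -> another CF or W
    if from_type in fixed:
        return min(q85_origin, t2)      # CF or W -> FF or H
    if to_type in fixed:
        return max(q15_dest, t1)        # FF or H -> CF or W
    return (t1 + t2) / 2                # FF -> FF or H, and H -> FF


# Hypothetical day: home at 7.5, work at 10.75 and 13, flexible at 18.25 and 20.25.
leave_home = switching_time("H", "W", 7.5, 10.75, q15_dest=8.0)
leave_work = switching_time("W", "FF", 13.0, 18.25, q85_origin=16.0)
change_ff = switching_time("FF", "FF", 18.25, 20.25)
```

With these hypothetical percentiles the user is assumed to leave home at 8.0, leave work at 16.0, and move between the two flexible locations at 19.25.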

The procedure is continued for all available days and data from the estimated daily state evolutions is used to derive the user state and user state transition probabilities by aggregating the results from all days as follows:

P_{v,t}(L_z, A_z, D_z, I_z, E_z, T_z, t, w) = (Number of times being at a location L_z, undertaking an activity A_z, being in distance class D_z away from home, having an interaction type I_z, being in an emotional condition E_z, and exchanging information about a topic T_z, at time t and day type w) / (Total number of times being in that state or in a different state at time t for day type w)

The transition probability is computed by counting the occurrences of the specific transition between states over the total number of observations:

P_{v,t}(q → q') = (Number of times switching from state q to q' at time t and day type w) / (Number of times switching from state q to q' or to any other possible state).

After determining user state probabilities and individual user transition state probabilities based on the analyzed user information, i.e. determining an individual probabilistic state model, the results of the data processing in step S3 are used to group users into the homogeneous groups.

After producing the individual's probabilistic state model the results of the data analysis are used to group users into homogenous groups. Users who are assigned into groups have also general characteristics which are derived from social media data. Such characteristics are age, sex, nationality, car ownership, employment, financial status and more. Those characteristics are used to connect users with the population within a study area since we know the characteristics of the population from census data. The outcome is the assignment of a PSM to each individual within a study area by placing him in a homogenous group. After that, the mobility demand in particular within a study area is predicted by aggregating the results of each individual's probabilistic state model.

The population of the study area, N, and the characteristics of users may be extracted from census data. A subset of the population that accurately reflects the characteristics of the members of the entire population is selected for the data analysis and the creation of homogenous groups. After that, for all users within the study area, a probabilistic state model is assigned according to their stratification into groups with the use of census data. Nevertheless, the assigned probabilistic state models to those users do not contain exact location information. The location values for each user within the study area are estimated later on for each probabilistic state model by combining the estimated activities of the user and location-related information from census (i.e., residential areas, industrial areas, retail areas, or more general land-use characteristics).

Alternatively, the location class is also defined with an aggregated geometrical area. This area does not represent the exact location of the user but rather the aggregated area.

Probabilistic state model classification

For stratifying users into homogenous groups different distances can be used. Users whose model instances are closer according to the given distance will be most likely grouped in the same probabilistic state model class. In a preferred embodiment, the classification of users in different groups is performed by using the state probability of the user's PSM models:

where k is the user, i is the state, t is the time and w is the model class and w_w ∈ {0,1} is the weight of the class.

Classification Distance

For the classification, the distance between the state probabilities of different users may be used as indicators. The following three probability-distance methods are used as alternatives without limiting the integration between them:

• probability distance between different PSMs based on the Kullback-Leibler (KL) divergence

• min-shifted KL divergence

• geometric distance

Distances can be weighted with weights defined externally, and if P_1 and P_2 are two probabilities their distance is:

$d_{12} = d(P_1, P_2) = \sum_{t=1}^{T} w_t \, d(P_1(t), P_2(t))$

, where T is the time sequence, w_t ∈ {0,1} is a weight that depends on the time index, and P_1(t), P_2(t) are the probabilities at a certain time index. The min-shift distance is defined as:

$d_{12}^{H} = \min_{|\tau| \le H} \sum_{t=1}^{T} w_t \, d(P_1(t), P_2(t + \tau))$

, where H is the max distance in time and could be set to 2. The PSM distance based on the KL divergence is defined as

, where w_j ∈ {0,1} is another weight which is set for the specific state.

The geometric distance is defined as:

$d^{geo}(t) = d(P_1(t), P_2(t)) = \sum_i w_i \, |P_1(i,t) - P_2(i,t)|^2$
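The distances can be sketched in code under a number of stated assumptions: each PSM is reduced to a list over time slots of state distributions, the per-state weights are taken as 1, the geometric distance uses squared differences, the KL divergence is the standard D(p || q) with a small smoothing constant, and time shifts wrap around the day cyclically. None of these choices is confirmed by the description, and all function names are hypothetical.

```python
from math import log


def geometric(p, q):
    """Sum of squared differences between two state distributions (assumed exponent 2)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kl_divergence(p, q, eps=1e-9):
    """Standard Kullback-Leibler divergence D(p || q); eps avoids log(0)."""
    return sum(a * log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def psm_distance(P1, P2, weights=None, point_dist=geometric, shift=0):
    """Time-weighted distance: sum over t of w_t * d(P1(t), P2(t + shift))."""
    T = len(P1)
    weights = weights or [1] * T
    return sum(
        weights[t] * point_dist(P1[t], P2[(t + shift) % T])  # cyclic day: an assumption
        for t in range(T)
    )


def min_shift_distance(P1, P2, H=2, **kwargs):
    """Smallest time-weighted distance over all shifts from -H to H."""
    return min(psm_distance(P1, P2, shift=s, **kwargs) for s in range(-H, H + 1))


# Two hypothetical users with states (home, work) over four time slots;
# user_b follows the same pattern as user_a, shifted by one slot.
user_a = [[1, 0], [1, 0], [0, 1], [0, 1]]
user_b = [[1, 0], [0, 1], [0, 1], [1, 0]]
plain = psm_distance(user_a, user_b)
shifted = min_shift_distance(user_a, user_b, H=2)
```

The example shows why the min-shift variant is useful for grouping: two users with the same routine but slightly different schedules are far apart under the plain distance and identical once a small time shift is allowed. Passing `point_dist=kl_divergence` switches the per-slot distance to the KL divergence.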

After that, a clustering procedure is applied and the representatives of each group are defined. Each probabilistic state model is then assigned to the group to which it belongs.

In Fig. 4 a flowchart for a method for predicting mobility demand of a user is shown. In a first step V1 general user information is determined, including user location information and user time information, from provided user data of a plurality of users. In a second step V2 detailed user information is determined by analyzing user interaction of the users from provided user data. In a third step V3 the determined general user information and the detailed user information are analyzed. In a fourth step V4 the user state probabilities and individual user transition state probabilities are determined based on the analyzed user information. In a fifth step V5 users of the plurality of users are assigned into one of homogenous groups based on determined user states and user transition state probabilities, wherein a homogenous group comprises representatives representing general user criteria. In a sixth step V6 the mobility demand of a user is predicted based on a representative user of one or more homogenous groups the user being assigned to and based on the user state probability and individual user transition state probability of the representative user.

In summary the present invention enables associating activities to representative locations by using interaction types, topics and a user's mood derived from a data analysis and correlates location/activity and social media interaction together. The present invention further enables associating the user's mood, discussion topics and/or type of interactions with travel decisions. Further the present invention provides a probabilistic model able to combine information from social media with mobile networks like cell/user identification as well as other sensors/input data. The present invention further enables assigning individuals to homogeneous groups with probabilistic similarity-based models.

Even further the present invention enables profile information from social media to be used to selectively apply census data to mobility group models and forecast the mobility demand. The predicted mobility demand data may be used for deciding about development of new roads and/or public transport infrastructure, traffic light cycle optimization, road traffic control and implementation of new policies such as congestion charging, switching of working hours, dynamic tolling, dynamic public transport fares and dynamic lane changing, and in general for road transport planning optimization.

Even further the present invention enables or provides

a. An apparatus that, starting from social networking data and census data, generates the mobility demand of a selected area

b. Method and rule set for classification of user location

c. Method and rule set for classification of user interaction (tweeting/chatting) activity

d. Method and rule set for state probability computation

e. Method and rule set for state transition probability computation

f. Method for classification of user model

g. Method for assignment of user model group on census data profile.

The present invention enables therefore an estimation of the mobility behavior of users, based on more detailed data provided from social media. Further the present invention enables mobility demand prediction by provisioning data for activity-based models after clustering of users and an estimation of user location and user state transition probabilities for homogeneous groups. The present invention enables further a continuous, inexpensive mobility demand prediction in particular by considering also the mobility behavior of users.

The present invention has inter alia the following advantages: The present invention uses inexpensive, continuous public data extracted from social media and other available sensing mechanisms to provide mobility demand prediction. Even further the present invention is flexible and interoperable and is not bound to a particular city or region compared with conventional server-based methods and systems. Even further the present invention enables continuous short term mobility demand prediction since provided information from social media or other sensing mechanisms is continuously updated. Even further the present invention exploits special characteristics of social media data enabling a more detailed understanding of the mobility behavior of users including a development of enhanced user profiles which can be used also for non-transport related topics.

Many modifications and other embodiments of the invention set forth herein will come to mind to the one skilled in the art to which the invention pertains having the benefit of the teachings presented in the foregoing description and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

1. A method for predicting mobility demand of a user,

characterized by the steps of

a) Determining general user information (V1) including user location information and user time information from provided user data of a plurality of users,

b) Determining detailed user information (V2) by analyzing user interaction of the users from provided user data,

c) Analyzing the determined general user information and the detailed user information (V3),

d) Determining user state probabilities and individual user transition state probabilities (V4) based on the analyzed user information,

e) Assigning users of the plurality of users (V5) into one of homogenous groups based on determined user states and user transition state probabilities, wherein a homogenous group comprises representatives representing general user criteria,

f) Predicting the mobility demand of a user (V6) based on a representative user of one or more homogenous groups the user being assigned to and based on the user state probability and individual user transition state probability of the representative user.

2. The method according to claim 1, characterized in that interaction of the user is determined by analyzing user related social media information, user related information of city-released apps, user road traffic information and/or user network traces.

3. The method according to one of the claims 1-2, characterized in that general user characteristics are derived from census data and wherein these general user characteristics are used to relate the user for which mobility demand is to be predicted to a representative user of a homogenous group.

4. The method according to one of the claims 1-3, characterized in that in case of missing general user information and/or detailed user information user state probabilities and individual user transition state probabilities are estimated.

5. The method according to one of the claims 1-4, characterized in that a user state is determined based on user location, user activity type, distance from a user's home, user interaction type, user emotion, topic of user interaction, time of user interaction and/or day type.

6. The method according to one of the claims 1-5, characterized in that the user transition state probability is based on the number of times switching from a first user state into a second user state at a certain time and a certain day type and the number of times switching from the first user state to any other user state.

7. The method according to one of the claims 1-6, characterized in that user state changes are limited to discrete times.

8. The method according to one of the claims 5-7, characterized in that users from the plurality of users lying within a certain distance of a predetermined point are assigned to the same homogeneous group.

9. The method according to one of the claims 5-8, characterized in that for determining the user interaction duration and/or data exchange of the social media interactions of the user are determined.

10. The method according to claim 9, characterized in that for determining the user interaction the user interaction is classified into at least four interaction categories including photo upload, continuous interaction, chatting and reply.

11. The method according to one of the claims 5-10, characterized in that for determining the user emotion content of messages and/or of interactions of the user, preferably by counting positive emotion keywords against negative emotion keywords in the content, is analyzed.

12. The method according to one of the claims 5-11, characterized in that for determining the topic of a user interaction, the context of a user message and/or of the user interaction is analyzed, preferably by searching for specific keywords associated with predefined topics.

13. The method according to one of the claims 5-12, characterized in that for determining the topic of a user interaction a user message and/or of the user interaction is analyzed with regard to publicity/privacy of the user message and/or of the user interaction, preferably by analyzing the source of the user message and/or of the user interaction.

14. The method according to one of the claims 5-13, characterized in that for determining the activities of a user, activities are associated with predetermined and representative locations of the user, wherein for each representative location one activity is assigned to.

15. The method according to claim 14, characterized in that the representative locations include at least four different types of representative locations including home location, work location, location of a fixed activity and location of a flexible activity.

16. The method according to one of the claims 14-15, characterized in that for determining a distance between two representative locations the haversine- function is used.

17. The method according to one of the claims 4-16, characterized in that for estimating the user state probabilities and individual user transition state probabilities a user switching time (SWITCH) being the travel time of a user between two different locations is determined based on the user activity type and/or the time of user interaction at the start location and/or the end location.

18. The method according to one of the claims 1-17, characterized in that assigning a user into a homogeneous group is based on a probability distance between the user state probabilities of different users, preferably wherein the probability distance includes a time weighing parameter for the probability distance at different times.

19. The method according to claim 18, characterized in that for the probability distance is determined according to the Kullback-Leibler divergence, a minimum-shifted Kullback-Leibler divergence and/or a geometric distance.

20. The method according to one of the claims 18-19, characterized in that for determining the representatives for the homogeneous group a clustering procedure is performed on the determined probability distances.

21. The method according to one of the claims 1-21, characterized by the further step of g) predicting the mobility demand within a study area by aggregating the estimated mobility patterns of users within the study area.

22. A system for predicting mobility demand of a user, preferably for performing with a method according to one of the claims 1-21,

characterized by

general user means operable to determine general user information including user location information and user time information from provided user data of a plurality of users,

assigning means operable to assign users of the plurality of users into one of homogenous groups based on determined user states and user transition state probabilities, wherein a homogenous group comprises representatives representing general user criteria, and

predicting means operable to predict the mobility demand of the user based on a representative user of one or more of the homogenous groups the user being assigned to and the user state probability and individual user transition state probability of the representative user.