WO2018151672A1 - Trajectory analysis through fusion of multiple data sources - Google Patents


Info

Publication number
WO2018151672A1
Authority
WO
WIPO (PCT)
Prior art keywords
location
determining
error
data
fixed antenna
Application number
PCT/SG2018/050006
Other languages
French (fr)
Inventor
Ying Li
Shixin LUO
The Anh Dang
Original Assignee
Dataspark Pte. Ltd.
Priority to PCT/IB2017/050891 (WO2018150227A1)
Priority to PCT/SG2017/050484 (WO2018151669A1)
Priority to PCT/SG2017/050485 (WO2018151670A1)
Application filed by Dataspark Pte. Ltd.
Priority claimed from AU2018222821A (external priority; AU2018222821A1)
Priority claimed from PCT/SG2018/050070 (external priority; WO2018151677A1)
Publication of WO2018151672A1


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/02 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using radio waves
    • G01S5/0205 Details
    • G01S5/021 Calibration, monitoring or correction
    • G01S5/0257 Hybrid positioning solutions
    • G01S5/0263 Hybrid positioning solutions employing positioning solutions derived from one of several separate positioning systems
    • G01S19/00 Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S19/01 Satellite radio beacon positioning systems transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S19/38 Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S19/39 Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S19/42 Determining position
    • G01S19/48 Determining position by combining or switching between position solutions derived from the satellite radio beacon positioning system and position solutions derived from a further system
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/029 Location-based management or tracking services
    • H04W64/00 Locating users or terminals or network equipment for network management purposes, e.g. mobility management

Abstract

Estimating the location of a device at a particular point in time may incorporate one, two, or more different location data points. The location data points may be derived from communications networks, which may determine location through different mechanisms. As part of the location estimation, each cell in a cellular network may, for example, have its own error range. The error range for each cell may be generated by collecting precise location data from the Global Positioning System (GPS) or another high-accuracy mechanism and comparing that data to location data gathered from other sources. A database of error ranges for each cell and each location mechanism may be built and used to estimate the actual location of a device for a given time period.

Description

Trajectory Analysis Through Fusion of Multiple Data Sources

Cross Reference to Related Applications

[0001] This application claims benefit of and priority to PCT/IB2017/050891 filed 17 Feb 2017 by DataSpark Pte. Ltd. entitled "Mobility Gene for Trajectory Data", PCT/SG2017/050485 filed 27 Sep 2017 by DataSpark Pte. Ltd. entitled "Trajectory Analysis With Mode Of Transport Analysis", and PCT/SG2017/050484 filed 27 Sep 2017 by DataSpark Pte. Ltd. entitled "Map Matching and Trajectory Analysis", the entire contents of which are hereby expressly incorporated by reference for all they teach and disclose.

Background

[0002] Mobility data is being gathered on a tremendous scale. Every cellular telephone connection to every mobile device generates some data about a user's location. These observations are being generated at an astonishing rate, but the sheer volume of the observations makes the data difficult to analyze.

[0003] Mobility data can be generated by merely observing the location of a device connected to a wireless network. The wireless network may be a cellular network, but may also be any other network from which a device may be observed. For example, a WiFi router or Bluetooth device may passively observe nearby devices and may note each device's electronic identifiers or other signatures. In many cases, a device may establish a communications session with various network access points, which may indicate the device's location.

[0004] Many interesting uses come from analyzing mobility data. As merely one example, traffic congestion may be observed from aggregating mobility observations from cellular telephones.

[0005] As more and more uses for mobility data are developed, the complexities of analyzing and managing these large data sets are exploding. One issue is that the sources of the data, such as the telecommunications companies, may have obligations of privacy and anonymity, but there may be a large number of consumers of the data. The consumers may be a wide range of companies which may use the data in countless ways.

Summary

[0006] Estimating the location of a device at a particular point in time may incorporate one, two, or more different location data points. The location data points may be derived from communications networks, which may determine location through different mechanisms. As part of the location estimation, each cell in a cellular network may, for example, have its own error range. The error range for each cell may be generated by collecting precise location data from the Global Positioning System (GPS) or another high-accuracy mechanism and comparing that data to location data gathered from other sources. A database of error ranges for each cell and each location mechanism may be built and used to estimate the actual location of a device for a given time period.

[0007] Machine learning techniques may be applied to determine a mode of transportation for a trajectory made up of a sequence of user locations. The mode of transportation, such as walking, bicycling, riding in a car or bus, riding in a train, or other mode, may be determined by creating a training set of data, then using classification mechanisms to classify trajectories by mode of transport. The training set may be generated by tracking, then verifying, a user's transportation mode. In some cases, a user may manually input or verify their transportation mode, while in other cases, a user's transportation mode may be determined through other data sources.

[0008] A trajectory may be derived from noisy location data by mapping candidate locations for a user, then finding a match between successive locations. Location data may come from various sources, including telecommunications networks. Telecommunications networks may give location data based on observations of users in a network, and such data may have many inaccuracies. The observations may be mapped to physical constraints, such as roads, pathways, train lines, and the like, and physical rules such as speed analysis may be applied to smooth the data and identify outlier data points. A trajectory may be resampled or interpolated to generate a detailed set of trajectory points from a sparse and otherwise ambiguous dataset.
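The speed analysis mentioned above can be sketched as a greedy filter. This is a minimal illustration, not the method of the specification: it assumes observations arrive as hypothetical `(timestamp, latitude, longitude)` tuples sorted by time, trusts the first point, and uses an arbitrary 50 m/s ceiling (`max_speed_mps`) that would need tuning for each region and transport network.

```python
import math
from typing import List, Tuple

# Hypothetical observation layout: (timestamp in seconds, latitude, longitude) in degrees.
Point = Tuple[float, float, float]

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def drop_speed_outliers(points: List[Point], max_speed_mps: float = 50.0) -> List[Point]:
    """Keep a point only if the speed needed to reach it from the last kept point is plausible."""
    if not points:
        return []
    kept = [points[0]]  # assumption: the first observation is trusted
    for t, lat, lon in points[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        if haversine_m(lat0, lon0, lat, lon) / dt <= max_speed_mps:
            kept.append((t, lat, lon))
    return kept
```

Because each point is compared with the last kept point rather than its raw predecessor, one implausible observation does not cause the valid observation after it to be rejected as well.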

[0009] Mobility observations may be analyzed to create so-called mobility genes, which may be intermediate data forms from which various analyses may be performed. The mobility genes may include a trajectory gene, which may describe a trajectory along which a user may have travelled. The trajectory gene may be derived from raw location observations and processed into a form that may be more easily managed. The trajectory genes may be made available to third parties for analysis, and may represent a large number of location observations that may have been condensed, smoothed, and anonymized. By analyzing only trajectories, a third party may avoid having to analyze huge numbers of individual observations, and may still have valuable data from which to make decisions.

[0010] A visit mobility gene may be generated from analyzing raw location observations and may be made available for further analysis. The visit mobility gene may include summarized statistics about a certain location or location type, and in some cases may include ingress and egress travel information for visitors. The visit mobility gene may be made available to third parties for further analysis, and may represent a concise, rich, and standardized dataset that may be generated from several sources of mobility data.

[0011] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Brief Description of the Drawings

[0012] In the drawings,

[0013] FIGURE 1 is a diagram illustration of an example embodiment showing an ecosystem with mobility genes.

[0014] FIGURE 2 is a diagram illustration of an embodiment showing a network environment with systems for generating mobility genes.

[0015] FIGURE 3 is a flowchart illustration of an embodiment showing a method for collecting data by a telecommunications network.

[0016] FIGURE 4 is a flowchart illustration of an embodiment showing a method for requesting and responding to a customized mobility gene order.

[0017] FIGURE 5 is a flowchart illustration of an embodiment showing a method for generating and responding to a standardized mobility gene order.

[0018] FIGURE 6 is a flowchart illustration of an embodiment showing a method for generating a trajectory mobility gene.

[0019] FIGURE 7 is a flowchart illustration of an embodiment showing a method for preparing trajectory mobility genes for transmittal.

[0020] FIGURE 8 is a flowchart illustration of an embodiment showing a method for processing trajectories into visit mobility genes.

[0021] FIGURE 9 is a flowchart illustration of an embodiment showing a method for processing raw location observations into visit mobility genes.

[0022] FIGURE 10 is a diagram illustration of an embodiment showing a classification engine for transportation mode determination.

[0023] FIGURE 11 is a diagram illustration of an embodiment showing a network architecture for systems that classify trajectory segments by transportation mode.

[0024] FIGURE 12 is a flowchart illustration of an embodiment showing a method for acquiring users and building training data.

[0025] FIGURE 13 is a flowchart illustration of an embodiment showing a method for classifying location data to determine transportation mode.

[0026] FIGURE 14 is a flowchart illustration of an embodiment showing a method for generating training data from users.

[0027] FIGURE 15 is a diagram illustration of an embodiment showing a sequence of trajectory steps and calculations for them.

[0028] FIGURE 16 is a diagram illustration of a pair of embodiments showing cell sites and actual GPS location measurements taken within the site boundaries.

[0029] FIGURE 17 is a diagram illustration of an embodiment showing a network architecture for analyzing errors and calculated trajectories.

[0030] FIGURE 18 is a flowchart illustration of an embodiment showing a method for raw data collection prior to error analysis.

[0031] FIGURE 19 is a flowchart illustration of an embodiment showing a method for analyzing errors for cell site location coordinates.

[0032] FIGURE 20 is a flowchart illustration of an embodiment showing a method for computing a trajectory.

[0033] FIGURE 21 is a flowchart illustration of an embodiment showing a method for computing an estimated location for a given step in a trajectory.

Detailed Description

[0034] Trajectory Analysis Through Fusion of Multiple Data Sources

[0035] A user's trajectory may be computed from multiple data sources, each of which may have different accuracies. In many cases, an accuracy may vary from one location to another. For example, location data derived from cellular telephony networks may have different accuracies or errors for each cell site or base station, and often from one antenna or cell attached to a base station to another antenna or cell attached to the same base station.

[0036] The differences in errors or accuracies may come from the design of the cellular networks, such as when one cell may be laid out to cover a much larger area than another cell. The smaller cell may be designed to cover an area that may be more densely populated than the area covered by the larger cell. Consequently, a location data point gathered from the larger cell may be less accurate, or have a higher error, than location data gathered from the smaller cell. In some cases, the accuracies or errors may vary based on geography, such as interference or limitations due to high-rise buildings, hills, or other obstructions.

[0037] To compute a user's location at a particular point in time, a more accurate estimate of the location may be determined by combining multiple data sources. For example, location data may come from the location of a cell tower or antenna with which a device communicates, a triangulated location from two, three, or more antennas, a Global Positioning System (GPS) location, WiFi data, and other sources. When multiple location data sources are available at a particular point in time, the user's estimated location may be the intersection of the regions defined by each data source and its estimated error.

[0038] Some data sources may have small errors, which correspond to high accuracy. An example may be GPS location data, which may have an error range of single-digit meters or feet. Other data sources, such as triangulated cellular locations, may have accuracies in the tens or hundreds of meters or feet. By overlapping the locations, each drawn with a radius equal to the error range of its data point, a more accurate location estimate may be obtained.
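One common way to make "overlapping" estimates concrete, offered here as an assumption rather than as the method stated in the specification, is to treat each source as an isotropic Gaussian centered on its reported position with a standard deviation equal to its error radius. Multiplying independent Gaussians yields an inverse-variance weighted mean, whose error is smaller than that of any single input. The sketch assumes positions have already been projected into a local planar frame in meters; `fuse_estimates` is a hypothetical name.

```python
import math
from typing import List, Tuple

# (x_m, y_m, error_m): a position in a local planar frame (meters) plus its error radius,
# treated here as the standard deviation of an isotropic Gaussian (an assumption).
Estimate = Tuple[float, float, float]


def fuse_estimates(estimates: List[Estimate]) -> Estimate:
    """Combine independent location estimates by inverse-variance weighting."""
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(err <= 0 for _, _, err in estimates):
        raise ValueError("error radii must be positive")
    weights = [1.0 / (err ** 2) for _, _, err in estimates]
    total = sum(weights)
    x = sum(w * ex for w, (ex, _, _) in zip(weights, estimates)) / total
    y = sum(w * ey for w, (_, ey, _) in zip(weights, estimates)) / total
    fused_error = math.sqrt(1.0 / total)  # always <= the smallest input error
    return x, y, fused_error
```

With a 5 m GPS fix at (100, 0) and a 200 m triangulated fix at (0, 0), the fused position lands roughly 6 cm from the GPS fix and the fused error is just under 5 m, matching the intuition that the accurate source dominates.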

[0039] When calculating a trajectory, each step in the trajectory may have different available data sources. A user's mobile device may use GPS only in certain situations, such as when the user is using a navigation app. At other times, GPS may be unavailable. Similarly, as a user traverses a cellular network, some cells may provide location data based on triangulation between multiple cells, while other cells may provide location data that is only the location of the cell. In the latter case, the cell may be many hundreds of meters or even kilometers or miles wide, meaning that the location data may have a potential error on the order of kilometers or miles.

[0040] One method for calculating a trajectory may be Bayesian tracking or a Kalman filter. Both mechanisms use an error term to represent or estimate confidence in the data being analyzed. Data with small error terms may be more reliable or more accurate than data with large error terms. Because these analyses weight each data point according to its error term, accurate estimates of the error terms may improve overall accuracy.
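A Kalman filter can consume per-observation error terms directly. The sketch below is a deliberately simple variant chosen for illustration: a random-walk (constant-position) model in a local planar frame with a single isotropic variance, so the Kalman gain reduces to a scalar. The process-noise value `process_var_per_s` and all names are hypothetical; a production tracker would more likely model velocity as well.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (timestamp_s, x_m, y_m, error_m) in a local planar frame; error_m is the per-observation
# error term, e.g. looked up for the serving cell or reported by a GPS receiver.
Observation = Tuple[float, float, float, float]


@dataclass
class TrackState:
    t: float
    x: float
    y: float
    var: float  # isotropic position variance in m^2


def kalman_track(obs: List[Observation], process_var_per_s: float = 4.0) -> List[TrackState]:
    """Random-walk Kalman filter in which each observation is weighted by its own error term."""
    if not obs:
        return []
    t0, x0, y0, e0 = obs[0]
    state = TrackState(t0, x0, y0, e0 ** 2)
    track = [state]
    for t, zx, zy, err in obs[1:]:
        # Predict: the position is unchanged, but its uncertainty grows with elapsed time.
        prior_var = state.var + process_var_per_s * max(t - state.t, 0.0)
        # Update: the gain is close to 1 when the observation error is small relative to the prior.
        gain = prior_var / (prior_var + err ** 2)
        state = TrackState(
            t,
            state.x + gain * (zx - state.x),
            state.y + gain * (zy - state.y),
            (1.0 - gain) * prior_var,
        )
        track.append(state)
    return track
```

Feeding it a 5 m GPS fix, then a 500 m cell-level observation 1 km away, then another 5 m GPS fix shows the intended behavior: the coarse observation moves the estimate by well under a meter, while the next precise fix pulls it most of the way to its reported position.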

[0041] A database of error terms may be built by gathering high-accuracy location data, such as GPS locations, observed by mobile devices in a cell. For each GPS location, the corresponding cellular location or other, less accurate location data may be obtained. By comparing the highly accurate GPS location with the observed cellular location data, an estimate of the error in the cellular location data may be computed. This database may be accumulated over time to yield a specific error factor for each cell.

[0042] For many trajectory calculations, highly accurate GPS location data may not be available, but less accurate cellular location data may be. This may be because GPS receivers consume battery power on a mobile device, so GPS may not be used in all cases. However, the fixed network architecture may continuously gather location data using triangulation or other location mechanisms.
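A per-cell error table of the kind described in [0041] could be built as follows. The sketch assumes hypothetical input records that pair each GPS fix with the network-reported location of the same device at the same moment, and it summarizes each cell by the root-mean-square distance between the two, a radius comparable to the standard-deviation-style error described in [0043]. The record layout and the `min_samples` threshold are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

EARTH_RADIUS_M = 6_371_000.0

# Hypothetical record: (cell_id, gps_lat, gps_lon, cell_lat, cell_lon), pairing a GPS fix with
# the location the network reported for the same device at the same time.
Pair = Tuple[str, float, float, float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def build_cell_error_table(pairs: Iterable[Pair], min_samples: int = 1) -> Dict[str, float]:
    """Per-cell error term: RMS distance between GPS fixes and network-reported locations."""
    sq_sum: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for cell_id, glat, glon, clat, clon in pairs:
        d = haversine_m(glat, glon, clat, clon)
        sq_sum[cell_id] += d * d
        count[cell_id] += 1
    # Cells with too few paired samples are left out rather than given an unreliable estimate.
    return {
        cell: math.sqrt(sq_sum[cell] / count[cell])
        for cell in count
        if count[cell] >= min_samples
    }
```

Raising `min_samples` trades coverage for reliability: cells excluded from the table would need a fallback error term, for example one derived from the cell's nominal coverage area.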

[0043] Throughout this specification and claims, the terms "error" and "accuracy" are used to denote the amount of variance of a data point. Such terms may also denote the trustworthiness or reliability of a data point. For example, in one embodiment, the term "error" may be represented as a radius or variance equivalent to an estimated standard deviation of observations. A larger error term may represent a lower accuracy, and vice versa.

[0044] Transportation Mode Determination Through Machine Learning Classification

[0045] A mode of transport for a user's trajectory may be analyzed using machine learning from a set of training data. Trajectory data may be sequential location data that contains a timestamp and location information, which may typically be a latitude and longitude.
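Trajectory data as described here can be represented with a small record type, from which per-segment speeds (one of the factors named in [0049]) follow directly. The field names and the use of the haversine distance formula are choices made for this sketch, not taken from the specification.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_M = 6_371_000.0


@dataclass(frozen=True)
class TrajectoryPoint:
    timestamp: float  # seconds since the Unix epoch
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def segment_speeds(points: List[TrajectoryPoint]) -> List[float]:
    """Speed in m/s for each consecutive pair of points; zero-duration pairs are skipped."""
    speeds = []
    for a, b in zip(points, points[1:]):
        dt = b.timestamp - a.timestamp
        if dt > 0:
            speeds.append(haversine_m(a.lat, a.lon, b.lat, b.lon) / dt)
    return speeds
```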

[0046] A mode of transport may be very useful in classifying trajectories within location data. Once classified, further analysis may be performed on individual classes. For example, trajectories that may represent pedestrians may be very useful for retail store owners in high traffic areas, whereas nearby trajectories for passengers of a train system may not be useful. A retail store owner may be able to analyze the demographics of the pedestrians, as well as other traits or behaviors to adapt their retail offerings to match the pedestrians that walk past the store.

[0047] A journey may have several modes of transport. For example, a commuter in a big city may begin by walking to a bus station, taking a bus to a train station, then riding a train to a city center. The commuter may then walk from the train station to their place of employment. In another example, a person in a more rural area may drive a car to a local shopping district, park the car, and walk around the shopping area. The person may continue by driving from one store to another before returning home.

[0048] Trajectory data may be classified into modes of transport by building a set of training data, then applying machine learning and classification techniques to analyze data. The set of training data may include a set of location data containing latitude and longitude, a time stamp, and a mode of transport. The set of training data may be collected over a representative sample population, and then used to classify a set of unanalyzed location data.

[0049] The training set may be generated for a given area, such as a city. Factors that may be highly correlated with a given mode of transport may include the geography of the area and the speed of a person's movement. Densely populated pedestrian thoroughfares may generally have pedestrians and possibly bicyclists, but rarely train riders. Proximity to train stations and train tracks may indicate that a user may be traveling by train.

[0050] The training set may be generated by having a set of users move through a city or other area using their normal transportation modes, then capturing the transportation modes for each journey. In some cases, a set of users may manually input their transportation modes, such as indicating when they may be walking or riding a bus.
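The specification does not name a particular classifier. As a stand-in, the sketch below uses a nearest-centroid classifier: it averages the feature vectors of each labeled transport mode in the training set, then assigns an unlabeled trajectory segment to the mode with the closest average. The features (for example mean and maximum segment speed) and all names are hypothetical; a real deployment would likely add geographic features such as proximity to rail lines and use a stronger model.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A labeled training example: (feature vector, transport mode). The features assumed here are
# (mean speed in m/s, max speed in m/s) for a trajectory segment; this choice is hypothetical.
Example = Tuple[Sequence[float], str]


def train_centroids(examples: List[Example]) -> Dict[str, List[float]]:
    """Average the feature vectors of each transport mode in the training set."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, mode in examples:
        if mode not in sums:
            sums[mode] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[mode][i] += value
        counts[mode] += 1
    return {mode: [s / counts[mode] for s in vec] for mode, vec in sums.items()}


def classify(features: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the transport mode whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda mode: math.dist(features, centroids[mode]))
```

A nearest-centroid model is easy to inspect, which may help when checking a new training set, but it assumes each mode forms a single compact cluster in feature space.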

[0051] Mobility Genes as Representations of Location Observations

[0052] Mobility genes may condense large numbers of location observations into a compact, meaningful, and easily digestible dataset for subsequent analysis. The mobility genes may be one way for telecommunications service providers to aggregate and process their location observations into various formats that may be sold to and consumed by other companies to provide meaningful and useful analyses.

[0053] The mobility genes may be a second tier of data derived from raw location data. Raw location data may come in enormous quantities, the volume of which may be overwhelming. By condensing the raw location data into different mobility genes, subsequent analyses may be much more achievable, while also maintaining the anonymity of users whose observations may be protected by convention or law.

[0054] Raw location data may be produced in enormous volumes. In modern society, virtually every person has at least one cellular telephone or other connected device. The devices continually ping a cellular access point or tower, and each ping may be considered a location observation. In a single day in a medium-sized city, billions of location observations may be collected.

[0055] Making meaningful judgments from these enormous datasets can be computationally expensive. In many cases, small samples of the larger dataset may be used to estimate various factors from the data.

[0056] By pre-processing the raw location observations into a set of mobility genes, a data provider may make these enormous datasets available for further analysis without the huge computational complexities. In many cases, the mobility genes may be anonymized, smoothed, augmented with additional data, and may be succinct enough and rich enough to make meaningful analyses without violating a telecommunications network's obligation of privacy to their customers. Further, the pre-processing of the data into mobility genes may transfer much of the computational cost to the data provider, which may unburden the data consumers from expensive data handling.

[0057] Mobility Gene for Trajectory Data

[0058] Location observations may be condensed into trajectory data that may be made available for various secondary analyses. Location observations may come from many different sources, including location observations made by telecommunications companies, such as cellular telephony providers, wireless access providers, and other communications providers.

[0059] The trajectory data may be useful for many different analyses, such as traffic patterns, behavioral studies, customer profiling, commercial real estate analyses, anomaly detection, and others. The trajectory mobility gene may condense millions or billions of location observations into a form that may be easily digested into meaningful analyses and decisions.

[0060] The mobility gene may represent a mechanism by which a data supplier may digest large numbers of observations into a dense, useful, and anonymous format that may be consumed by a third party. The third party may be a separate company that may further process the mobility gene into a decision-making tool for various applications.

[0061] By using a mobility gene, a data provider, such as a telecommunications service provider, may be able to pre-process large volumes of data into an intermediate format for further analysis. The mobility gene may be a format for making data available through an application programming interface (API) or some other mechanism.

[0062] The trajectory mobility gene condenses many location observations into a series of points or trajectories where a device was observed. This pre-processing may increase the value of the trajectory data, as well as make the trajectory data easier to analyze and digest. In many cases, the pre-processing may also attach various demographic information about the users associated with the trajectories.

[0063] The trajectories may be smoothed, which may be useful in cases where the observations may have location or time variations or tolerances. For example, many location observations may be made using an access point location or some form of triangulation between multiple access points. Such location observations may have an inherent level of tolerance or uncertainty, which may lead to physically impossible trajectories, where the implied speed between successive points may be unattainable using conventional transportation mechanisms.
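Smoothing can be as simple as a centered moving average over neighboring observations. This is one illustrative choice among many (Kalman smoothing and map matching are alternatives); the window size `half_window` is an arbitrary assumption, and averaging raw latitude and longitude is only reasonable over small areas away from the poles and the antimeridian.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # hypothetical layout: (timestamp_s, lat, lon)


def smooth_trajectory(points: List[Point], half_window: int = 2) -> List[Point]:
    """Centered moving average of latitude and longitude; timestamps are kept as-is."""
    smoothed = []
    n = len(points)
    for i, (t, _, _) in enumerate(points):
        # The window shrinks near the ends of the trajectory instead of padding.
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        window = points[lo:hi]
        lat = sum(p[1] for p in window) / len(window)
        lon = sum(p[2] for p in window) / len(window)
        smoothed.append((t, lat, lon))
    return smoothed
```

A single noisy spike is spread across its neighbors rather than removed, so this kind of smoothing pairs naturally with an outlier filter applied beforehand.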

[0064] Demographic information about the users may be added to the trajectory data. In many cases, a data provider may have secondary information about a user, such as the user's gender, actual or approximate age, home and work locations, actual or approximate income, family demographics, and other information. Such demographics may be associated with each trajectory, and may be used for supplying subsets of trajectories for third party analysis.

[0065] Trajectories may be anonymized in some cases. A user's trajectory may reveal certain personally identifiable information (PII) about a user. For example, a user's commuting trajectory may identify the user's home and work locations. With such information, a specific user may be identified. Anonymization of this data may be performed in several different ways.

[0066] One way to anonymize a trajectory may be to truncate the trajectory to omit an origin, destination, or both, while keeping the portion of the trajectory of interest. For example, a set of trajectories may be truncated to show only movement through a specific portion of a road or train station. Such truncations may omit the user's origin and destination, but may still give a third-party traffic analysis service meaningful and useful trajectories from which the service may show local traffic patterns.

[0067] Another way to anonymize a trajectory may be to generalize or randomize its origin or destination. In many cases, a trajectory may have location observations with a certain accuracy range or tolerance, and such accuracy may identify a person's home or other destination very specifically. One way to anonymize the trajectory may be to identify an origin or destination with a general area, such as a centroid of a housing district. All trajectories beginning or ending in the housing district may have that origin or destination assigned to the centroid of the housing district, so that an individual trajectory cannot be used to identify a specific resident of the housing district.
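The two anonymization strategies in [0066] and [0067] can be sketched as simple transformations on a list of hypothetical `(timestamp, latitude, longitude)` points. The bounding-box area of interest and the list of district centroids are assumed inputs, and the nearest-centroid lookup uses plain degree-space distance, an approximation acceptable only over small areas.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # hypothetical layout: (timestamp_s, lat, lon)
BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def truncate_to_area(points: List[Point], bbox: BBox) -> List[Point]:
    """Keep only the points inside the area of interest, dropping origin and destination."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return [p for p in points if min_lat <= p[1] <= max_lat and min_lon <= p[2] <= max_lon]


def generalize_endpoints(points: List[Point], centroids: List[Tuple[float, float]]) -> List[Point]:
    """Replace the first and last points' coordinates with the nearest district centroid."""
    if not points:
        return []

    def nearest(lat: float, lon: float) -> Tuple[float, float]:
        # Degree-space distance: a small-area approximation chosen for brevity.
        return min(centroids, key=lambda c: math.hypot(c[0] - lat, c[1] - lon))

    out = list(points)
    for i in {0, len(out) - 1}:  # a set, so a one-point trajectory is handled once
        t, lat, lon = out[i]
        clat, clon = nearest(lat, lon)
        out[i] = (t, clat, clon)
    return out
```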

[0068] Mobility Gene for Visit Data

[0069] A mobility gene for visits may be one mechanism to aggregate and condense location observations into an intermediate form for further analysis. A visit gene may represent summarized location data that reflect user behavior with respect to a certain location or location type.

[0070] The visit mobility gene may be derived from telecommunications observations and other sources, and may be an intermediate form of processed data that may be made available to third parties for analysis. In many cases, the visit mobility gene, as well as other mobility genes, may be made available for sale or consumption by third parties, and may be a revenue source for telecommunications companies and other companies that may gather location observations.

[0071] A visit mobility gene may represent a rich set of data that may be derived from location observations. In many cases, a visit mobility gene may represent movements relating to a specific location, such as a train station, store, recreational location, or some other specific location. In some cases, a visit mobility gene may represent an aggregation of visits to a specific type of location, such as a user's home, work, or recreational location.

[0072] A visit may be determined by a user's location observations remaining constant or within a certain radius for a period of time. In some cases, a visit may be derived by analyzing location observations to find all location observations within a specific area, then analyzing users' behavior to determine whether the users remained in the area for a period of time. In other cases, a visit may be derived by computing a user's trajectory and analyzing the trajectory for periods where the user's movements have stopped or remain within a small area. In such cases, a visit mobility gene may be a secondary analysis of a trajectory mobility gene.
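Visit detection of the kind described above is often implemented as stay-point detection. In this sketch, an assumed illustration rather than the specification's algorithm, a visit is reported whenever consecutive observations remain within `radius_m` of an anchor observation for at least `min_stay_s`; both thresholds are hypothetical defaults.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # hypothetical layout: (timestamp_s, lat, lon)
Visit = Tuple[float, float, float, float]  # (arrival_s, departure_s, lat, lon)

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def detect_visits(points: List[Point], radius_m: float = 200.0, min_stay_s: float = 600.0) -> List[Visit]:
    """Report a visit wherever the user stays within radius_m of an anchor point for min_stay_s."""
    visits: List[Visit] = []
    i, n = 0, len(points)
    while i < n:
        t_i, lat_i, lon_i = points[i]
        j = i + 1
        while j < n and haversine_m(lat_i, lon_i, points[j][1], points[j][2]) <= radius_m:
            j += 1
        # points[i:j] all lie within radius_m of the anchor points[i].
        if points[j - 1][0] - t_i >= min_stay_s:
            window = points[i:j]
            lat = sum(p[1] for p in window) / len(window)
            lon = sum(p[2] for p in window) / len(window)
            visits.append((t_i, points[j - 1][0], lat, lon))
            i = j  # continue after the visit
        else:
            i += 1  # no stay here; try the next anchor
    return visits
```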

[0073] A visit gene may include time of day, length of stay, and various other statistics. A visit gene may also include information before and after a person's visit. For example, a visit gene may include trajectories before and after a person's visit to a location. A visit gene may be supplemented with demographic information about visitors, such as actual or approximate age, gender, actual or approximate home and work locations, actual or approximate income, as well as hobbies, common other locations visited, and other information.
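Once visits have been detected, the summary statistics mentioned in [0073] reduce to simple aggregations. The sketch below computes a visit count, mean length of stay, and arrivals by hour of day for one location. It assumes visits are hypothetical `(arrival, departure)` pairs of Unix timestamps and buckets hours in UTC, whereas a real visit gene would presumably use local time.

```python
from collections import Counter
from datetime import datetime, timezone
from statistics import mean
from typing import Dict, List, Tuple


def summarize_visits(visits: List[Tuple[float, float]]) -> Dict[str, object]:
    """Summarize (arrival_s, departure_s) pairs for one location."""
    if not visits:
        return {"count": 0, "mean_stay_min": 0.0, "arrivals_by_hour": {}}
    stays_min = [(dep - arr) / 60.0 for arr, dep in visits]
    # Hour of day of each arrival, bucketed in UTC (an assumption of this sketch).
    hours = Counter(datetime.fromtimestamp(arr, tz=timezone.utc).hour for arr, _ in visits)
    return {
        "count": len(visits),
        "mean_stay_min": mean(stays_min),
        "arrivals_by_hour": dict(hours),
    }
```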

[0074] Throughout this specification, like reference numbers signify the same elements throughout the description of the figures.

[0075] In the specification and claims, references to "a processor" include multiple processors. In some cases, a process that may be performed by "a processor" may be actually performed by multiple processors on the same device or on different devices. For the purposes of this specification and claims, any reference to "a processor" shall include multiple processors, which may be on the same device or different devices, unless expressly specified otherwise.

[0076] When elements are referred to as being "connected" or "coupled," the elements can be directly connected or coupled together or one or more intervening elements may also be present. In contrast, when elements are referred to as being "directly connected" or "directly coupled," there are no intervening elements present.

[0077] The subject matter may be embodied as devices, systems, methods, and/or computer program products. Accordingly, some or all of the subject matter may be embodied in hardware and/or in software (including firmware, resident software, microcode, state machines, gate arrays, etc.). Furthermore, the subject matter may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

[0078] The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media.

[0079] Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an instruction execution system. Note that the computer-usable or computer-readable medium could be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

[0080] When the subject matter is embodied in the general context of computer-executable instructions, the embodiment may comprise program modules, executed by one or more systems, computers, or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.

[0081] Figure 1 is an illustration showing an example embodiment 100 of an ecosystem with mobility genes. A mobile device 102 may connect to various access points 104, which may be managed by a network operator 106. Each communication with the mobile device 102 may be stored as raw location data 108.

[0082] A location data processor 110 may analyze the raw location data 108 to generate a set of mobility genes 112. The mobility genes 112 may be transferred to various analyzers 114, 116, and 118 for subsequent analysis.

[0083] The location data processor 110 may process the raw location observations into mobility genes 112, which may be sold or transferred to third parties who may perform various analyses. The mobility genes 112 may be a condensed, succinct, and useful intermediate data format that may be consumed by third parties while keeping user anonymity. In many cases, the location data processor 110 may augment the raw location data with secondary data sources, as well as provide smoothing and other processing that may increase data usefulness and, in some cases, improve data accuracy.

[0084] The various mobility genes 112 may be a standardized mechanism by which third party data analyzers may access a very rich and very detailed set of location data 108. A location data processor 110 may analyze billions of raw location observations and distill the data into mobility genes 112 that may be easily consumed without the high data handling costs and high data processing costs of analyzing enormous numbers of location observations.

[0085] The mobility genes 112 may be an industrial standard format that may preserve user anonymity yet may increase the value of specific data that may be used by third party analyzers. The mobility genes 112 may come in many formats, including trajectories and visits.

[0086] The mobility genes 112 may come in historical and real time data formats. A historical data format may include mobility genes that may have been derived over a relatively long period of time, such as a week, month, or year. A real time format may present mobility genes that may be occurring currently, or over a relatively short period of time, such as over a minute, hour, or day. Each use case and each system may have a different definition for "historical" and "real time." For example, in some systems, real time may be mobility genes derived in the last several seconds, while another system may define real time as data collected in the last week.

[0087] Real time data formats may be useful for providing alerts, providing current data, or making real time decisions about people's mobility. One use for real time data may be to display traffic congestion on a road or to estimate travel time through a city. Another use of real time data may be to predict the number of travelers that may be at a taxi stand in the next several minutes or in the next hour.

[0088] Real time data formats may be used to compare current events to historical behaviors. Historical analysis may provide an estimate for events that may happen today or some period in the future, and by comparing historical estimates with real time data, an anomaly may be detected or an estimate for future traffic may be increased or decreased accordingly.
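One simple way to compare a real time observation against a historical estimate is a z-score test. The sketch below assumes the historical data are counts (for example, travelers at a taxi stand) for the same recurring time slot; the function name `compare_to_history` and the threshold of three standard deviations are illustrative assumptions.

```python
import math
from statistics import mean, stdev


def compare_to_history(historical_counts, current_count, z_threshold=3.0):
    """Compare a real time count with historical counts for the same time slot.

    Returns (expected, z_score, is_anomaly). Needs at least two historical values.
    """
    expected = mean(historical_counts)
    spread = stdev(historical_counts)  # sample standard deviation
    if spread == 0:
        # No historical variation: any difference is treated as unbounded.
        z = 0.0 if current_count == expected else math.copysign(math.inf, current_count - expected)
    else:
        z = (current_count - expected) / spread
    return expected, z, abs(z) >= z_threshold
```

The sign of the returned z-score indicates whether current activity is above or below the historical estimate, which could be used to raise or lower a forecast rather than only to flag an anomaly.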

[0089] Figure 2 is a diagram of an embodiment 200 showing components that may analyze raw location data and provide mobility genes for subsequent analyses. The example of embodiment 200 is merely one topology that may be used to analyze raw location data.

[0090] The diagram of Figure 2 illustrates functional components of a system. In some cases, the component may be a hardware component, a software component, or a combination of hardware and software. Some of the components may be application level software, while other components may be execution environment level components. In some cases, the connection of one component to another may be a close connection where two or more components are operating on a single hardware platform. In other cases, the connections may be made over network connections spanning long distances. Each embodiment may use different hardware, software, and interconnection architectures to achieve the functions described.

[0091] Embodiment 200 illustrates a device 202 that may have a hardware platform 204 and various software components. The device 202 as illustrated represents a conventional computing device, although other embodiments may have different configurations, architectures, or components.

[0092] In many embodiments, the device 202 may be a server computer. In some embodiments, the device 202 may also be a desktop computer, laptop computer, netbook computer, tablet or slate computer, wireless handset, cellular telephone, game console, or any other type of computing device. In some embodiments, the device 202 may be implemented on a cluster of computing devices, which may be a group of physical or virtual machines.

[0093] The hardware platform 204 may include a processor 208, random access memory 210, and nonvolatile storage 212. The hardware platform 204 may also include a user interface 214 and network interface 216.

[0094] The random access memory 210 may be storage that contains data objects and executable code that can be quickly accessed by the processor 208. In many embodiments, the random access memory 210 may have a high-speed bus connecting the memory 210 to the processor 208.

[0095] The nonvolatile storage 212 may be storage that persists after the device 202 is shut down. The nonvolatile storage 212 may be any type of storage device, including hard disk, solid state memory devices, magnetic tape, optical storage, or other type of storage. The nonvolatile storage 212 may be read only or read/write capable. In some embodiments, the nonvolatile storage 212 may be cloud based, network storage, or other storage that may be accessed over a network connection.

[0096] The user interface 214 may be any type of hardware capable of displaying output and receiving input from a user. In many cases, the output display may be a graphical display monitor, although output devices may include lights and other visual output, audio output, kinetic actuator output, as well as other output devices.

Conventional input devices may include keyboards and pointing devices such as a mouse, stylus, trackball, or other pointing device. Other input devices may include various sensors, including biometric input devices, audio and video input devices, and other sensors.

[0097] The network interface 216 may be any type of connection to another computer. In many embodiments, the network interface 216 may be a wired Ethernet connection. Other embodiments may include wired or wireless connections over various communication protocols.

[0098] The software components 206 may include an operating system 218 on which various software components and services may operate.

[0099] A raw location receiver 220 may receive raw location data from one or more networks 242 or other sources. The raw location receiver 220 may have a push or pull communication model with a raw location data source, and may receive real time or historical data for analysis. The raw location receiver 220 may store information in a raw location database 222.

[00100] A batch analysis engine 224 or a real time analysis engine 226 may route the raw location data 222 into various analyzers for processing. The analyzers may include a trajectory analyzer 228, a visit analyzer 230, and a statistics generator 232. The analysis may result in mobility genes 234, which may be served to various analyzers through a real time analysis portal 236 or a batch level analysis portal 238.

[00101] In the example of embodiment 200, a batch analysis engine 224 may analyze historical data to create historical mobility genes. The results of batch-level analysis may be available through a batch level analysis portal 238, where other analyzers may download and use mobility genes. Batch-level analyses may be those that do not have a real-time use case. For example, a commercial developer may wish to know the demographics of people who travel near a commercial shopping mall. Such an analysis may be performed in batch mode because the data may not be changing rapidly.

[00102] A real time analysis engine 226 may perform real-time analysis of location observations, and may be tuned to process data quickly. In many cases, the real time analysis engine 226 may generate comparison versions of a mobility gene. A comparison version may be a difference or comparison between a set of real time observations and a predefined, historical mobility gene. This difference may be useful for generating alerts, for example. In some cases, the difference information may be much more compact than having to access an entire set of mobility genes.

[00103] A trajectory analyzer 228 may create trajectories from raw location data 222. The trajectories may include sequences of locations traveled by a user, including timestamps for each of the observed locations. The trajectories may be processed into a usable form by scrubbing and smoothing the data, as well as removing duplicate or superfluous observations.

[00104] A visit analyzer 230 may identify visits for a given location. In some cases, the visits may be inferred or determined from subsequent analysis of trajectories. In other cases, visits may be identified by finding all location observations for a given location, then finding data associated with those visits.

[00105] A statistics generator 232 may generate various statistics for a given mobility gene. In some cases, the statistics generator 232 may access various static data sources 256 or real time or dynamic data sources 258 to augment a mobility gene.

[00106] The real time analysis portal 236 and batch level analysis portal 238 may be a computer or web interface through which data may be queried and received. In a typical use case, a third party analyzer may send a request to one of the portals 236 or 238 for a set of mobility genes. After verifying the requestor's credentials, the portal may cause the data to be generated if the mobility genes have not been calculated, then the mobility genes may be transmitted to the requestor.

[00107] The system 202 may be connected to various other devices and services through a network 240.

[00108] One or more telecommunications networks 242 may supply raw location data to the system 202. The telecommunications networks 242 may be cellular telephony networks, wireless data networks, networks of passive wireless sniffers, or any other network that may supply location information.

[00109] In a typical network, a wireless mobile device 244, which may have a Global Positioning System (GPS) receiver 246, may connect with a telecommunications network 248 through a series of access points. Various location data 250 may be generated from the mobile device interactions, including GPS location data that may be generated by the mobile device 244 and transmitted across the telecommunications network 242.

[00110] The location data 250 may be cleaned and scrubbed with a data scrubber 252 to provide raw location data 254 that may be processed by the system 202. In many cases, the location data 250 may include device identifiers and other potentially personally identifiable information. The data scrubber 252 may replace device identifiers with other, non-traceable identifiers and perform other pre-processing of the location data.

[00111] One form of telecommunications location data may include location data that may be gathered from monitoring a device location in a cellular telephony system. In some such systems, the location data may include the location coordinates of an access point, which may be close to but not exactly the location of the device. Some cellular networks may have cells that span large distances, such as multiple kilometers or miles, and the accuracy of the location information may be very poor. Other telecommunications systems may use triangulation between two, three, or more access points to determine location with a higher degree of accuracy.

[00112] In some cases, a GPS receiver in a mobile device may generate coordinates and may transmit the coordinates as part of a data message from the mobile device 244. Such GPS coordinates may be of much higher accuracy than other location mechanisms, but GPS coordinates may not be transmitted as often as other location mechanisms. In some systems, location observations may have different degrees of accuracy, such that some observations may be generated by GPS and other observations may be determined through triangulation or merely access point locations. Such accuracy differences may be used during mobility gene calculations.
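When near-simultaneous observations of the same device come from sources of differing accuracy, one standard way to use those differences is inverse-variance weighting. This sketch assumes each observation carries a known 1-sigma horizontal error in metres and that errors are independent; the `Fix` type, its `accuracy_m` field, and the `fuse` function are hypothetical, and the specification does not prescribe this particular method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fix:
    x: float           # metres east, local planar projection (assumed)
    y: float           # metres north
    accuracy_m: float  # assumed 1-sigma horizontal error for the source


def fuse(fixes):
    """Inverse-variance weighted mean of independent, near-simultaneous fixes."""
    if not fixes:
        raise ValueError("need at least one fix")
    weights = [1.0 / f.accuracy_m ** 2 for f in fixes]
    total = sum(weights)
    x = sum(w * f.x for w, f in zip(weights, fixes)) / total
    y = sum(w * f.y for w, f in zip(weights, fixes)) / total
    # The fused error is never larger than the best single input.
    return Fix(x, y, total ** -0.5)
```

Fusing a GPS-grade fix (10 m) with an access-point-grade fix (1000 m) moves the result only a fraction of a metre away from the GPS fix, which matches the intuition that a coarse access point location should barely shift a precise observation.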

[00113] Static data sources 256 and dynamic data sources 258 may represent any type of supplemental data sources that may be used to generate mobility genes. An example of a static data source 256 may be a map of highways, roads, train systems, bus systems, pedestrian paths, bicycle paths, and other transportation routes. Another example may be the name and location of various places of interest, such as shopping malls, parks, stores, train stations, bus stops, restaurants, housing districts, factories, offices, and other physical locations.

[00114] Another set of static data sources 256 may be demographic information about people. Such information may be known by a telecommunications network 242 because the network may have name, address, credit card, and other information about each of its subscribers. In some cases, a telecommunications network 242 may augment its raw location data 254 with demographic information.

[00115] Examples of dynamic data sources 258 may include current train, bus, airplane, or ferry schedules, the current number of taxis available, or any other data source.

[00116] The static and dynamic data sources 256 and 258 may augment a mobility gene. For example, a data analyzer may request mobility gene information for fast food restaurants in a specific city. The system 202 may identify each of the fast food restaurants from a secondary data source, then identify visits and trajectories that may relate to each of the fast food restaurants.

[00117] A set of data consumers 260 may be third party organizations that may consume the mobility gene data. The data consumers 260 may have a hardware platform 260 on which various analysis applications 262 may execute. In some cases, the data consumers 260 may be third party services that may consume the mobility genes and provide location-based services, such as traffic monitoring and a host of other services.

[00118] Figure 3 is a flowchart illustration of an embodiment 300 showing a method of generating location observations. Embodiment 300 is a simplified example for a sequence of generating location observations that may be performed by a telecommunications network.

[00119] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00120] Embodiment 300 illustrates two ways of determining a location observation, along with a way to scrub device-specific identifiers from the observations.

[00121] One way to create a location observation may be to detect a device on the network in block 302. A location for the device may be determined in block 304, along with a timestamp in block 306. The resultant location observation may be stored in block 308.

[00122] Each location may be determined by the network. In some cases, a network may establish an approximate location for the device, which may be sufficient for managing the traffic on the network. However, in many cases, such location coordinates may be inaccurate. For example, some networks may provide a location as the location of the access point, cell tower, or other fixed node on the network. Any device detected by that node may be located anywhere within the range of the access point, which may be several kilometers or miles. Such location information may have a large tolerance or variation from the actual location.

[00123] Some networks may provide a location estimate based on triangulation of a device with two, three, or more access points or other receivers. Such a location may be more accurate than the example of providing merely the access point physical location, but may not be as accurate as GPS location.
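The specification refers to triangulation between access points. A closely related and easy-to-sketch technique is range-based trilateration, shown below for exactly three access points. It assumes a distance estimate to each access point is available (for example, derived from signal timing) and that coordinates are planar metres; the function `trilaterate` is hypothetical, and a real network would more likely fit many receivers with least squares.

```python
def trilaterate(anchors, ranges):
    """Estimate (x, y) from three access-point positions and range estimates.

    Subtracting the first circle equation (x - xi)^2 + (y - yi)^2 = ri^2 from
    the other two gives a 2x2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = ranges
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-9:
        raise ValueError("access points are collinear; position is ambiguous")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y
```

With noisy ranges the three circles rarely meet at one point; the linearized system still returns a single best guess, which is one reason such estimates fall between access-point-only and GPS accuracy.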

[00124] In block 310, a network may detect that GPS location information is being transmitted over the network. Such information may be captured, a timestamp generated in block 312, and a location observation may be stored in block 314. Such an example may be one method by which GPS information may be captured and stored as location information.

[00125] In some systems, certain applications may execute on a device and may generate GPS location information. For example, navigation applications typically send a stream of GPS location data to a server, which may update directions for a user. Such applications may be detected, and the GPS locations may be used as highly accurate location observations.

[00126] A typical location observation may include a device identifier, a set of location coordinates, and a timestamp. The device identifier used in a wireless network may depend on the network. Typically, a device may have some type of electronic identification, such as a Media Access Control (MAC) address, Electronic Identification Number (EIN), or other device identifier. In many cases, such identifiers may be a mechanism by which other systems may also identify the device.

[00127] A device identifier may be one mechanism by which a mobility gene may be directly linked to a specific user. In general, the raw data for mobility genes may be collected by one group of actors who may have strict privacy regulations to which they have to adhere, but may sell mobility genes to a third party. A device identifier may be one way that a third party may connect specific mobility data to specific users.

[00128] In order to obfuscate identifiable information from the location observations, each observation may be analyzed in block 316, and a unique identifier for the device may be generated in block 318 and substituted for the actual device identifier in block 320. The location observation may be updated in block 322.

[00129] The unique identifier may be the same identifier for that device in the particular dataset being analyzed. In some cases, a lookup table may be created that may have the device identifier and its unique replacement. Such a system may use the same substituted device identifier for observations over a long period of time.
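The lookup-table substitution of blocks 318 to 322 might look like the following sketch. The class name `Pseudonymizer`, the dictionary-based observation layout, and the use of 128-bit random tokens from Python's `secrets` module are assumptions for illustration. Because the table itself links tokens back to real identifiers, it would need to remain with the party bound by the privacy obligations.

```python
import secrets


class Pseudonymizer:
    """Substitute device identifiers with random tokens via a lookup table.

    The same device always receives the same token for the lifetime of the
    table, so trajectories remain linkable without exposing the real identifier.
    """

    def __init__(self):
        self._table = {}      # real device id -> token
        self._issued = set()  # tokens already handed out

    def pseudonym(self, device_id):
        token = self._table.get(device_id)
        if token is None:
            token = secrets.token_hex(16)
            while token in self._issued:  # collisions are astronomically unlikely
                token = secrets.token_hex(16)
            self._table[device_id] = token
            self._issued.add(token)
        return token

    def scrub(self, observation):
        """Return a copy of an observation dict with its device_id replaced."""
        scrubbed = dict(observation)
        scrubbed["device_id"] = self.pseudonym(observation["device_id"])
        return scrubbed
```

Random tokens, rather than a hash of the identifier, avoid a third party recomputing the mapping from a known MAC address.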

[00130] After updating all of the observations, the updates may be sent to a mobility gene analyzer in block 324.

[00131] Figure 4 is a flowchart illustration of an embodiment 400 showing interactions between a mobility gene provider 402 and a data consumer 404. The operations of the mobility gene provider 402 are illustrated in the left hand column, while the operations of the data consumer 404 are illustrated in the right hand column.

[00132] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00133] Embodiment 400 is one method by which a mobility gene may be requested and provided. A mobility gene provider 402 may be a system that may process raw location observations into a set of mobility genes. The mobility genes may be consumed by the data consumer 404. In many situations, the mobility genes may be a compact form of location observations that may be ready for further processing by a data consumer 404.

[00134] The mobility genes may represent many thousands, millions, billions, or even trillions of individual observations that may be condensed into various mobility genes. By pre-processing the location observations into a set of mobility genes, the high cost and complexity of analyzing enormous numbers of observations may be avoided. Further, a set of mobility genes may be anonymized or summarized such that the data may be handled without worry of disclosing personally identifiable information. Such restrictions may be imposed by law or convention, and the cost of implementing the restrictions may be borne by the mobility gene provider 402 and may not be passed to the data consumer 404.

[00135] In the example of embodiment 400, a data consumer 404 may define a mobility gene in block 406, then transmit that definition in block 408 to the mobility gene provider 402.

[00136] The mobility gene provider 402 may receive the definition in block 410, analyze raw location data in block 412, create the mobility genes in block 414, and store the mobility genes in block 416.

[00137] In many cases, the mobility gene may be processed from historical data. Such mobility genes may be processed in a batch mode. Some requests may be for real time data, and such mobility genes may be continually processed and updated.

[00138] In the example of embodiment 400, a data consumer 404 may request data in block 418, which may be received by the mobility gene provider 402 in block 420. The mobility gene provider 402 may transmit the mobility genes in block 422, which may be received by the data consumer 404 in block 424. The mobility genes may be analyzed in block 426 to provide various location based services in block 428.

[00139] The example of embodiment 400 in blocks 418-428 may be one example of a pull-style communication protocol, where the data consumer 404 may initiate a request. Other systems may use a push-style communication protocol, where the mobility gene provider 402 may initiate a data transfer. Still other systems may use other types of communication protocols for transferring mobility genes from a mobility gene provider 402 to a data consumer 404.

[00140] Figure 5 is a flowchart illustration of an embodiment 500 showing interactions between a mobility gene provider 502 and a data consumer 504. The operations of the mobility gene provider 502 are illustrated in the left hand column, while the operations of the data consumer 504 are illustrated in the right hand column.

[00141] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00142] Embodiment 500 is an example of an interaction where a data consumer 504 may use a standard, pre-computed mobility gene. A mobility gene provider 502 may analyze raw location data in block 506, create a standardized set of mobility genes in block 508, and store the mobility genes in block 510. Such a process may loop over and over as new data may be received.

[00143] A standardized set of mobility genes may be pre-defined and may be ready to use. Such genes may be offered through a subscription service or a data marketplace, where many different data consumers 504 may purchase or consume a predefined set of mobility genes.

[00144] Such a system may be contrasted with the example of embodiment 400, where a data consumer may define various parameters about a requested mobility gene.

[00145] A data consumer 504 may determine a standard mobility gene for an application in block 512. In many cases, a mobility gene provider 502 may provide a catalog of mobility genes that may be useful for various applications. Such mobility genes may be standardized and may be offered on a subscription or other basis to one or more data consumers.

[00146] The data consumer 504 may request mobility genes in block 514, and the request may be received in block 516 by the mobility gene provider 502. The mobility genes may be transmitted in block 518 and received in block 520. A data consumer 504 may analyze the mobility genes in block 522 and provide a location based service in block 524.

[00147] Figure 6 is a flowchart illustration of an embodiment 600 showing a method for creating trajectory mobility genes. The method of embodiment 600 may be merely one example of how trajectories may be created from raw location observations.

[00148] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00149] Embodiment 600 is one example of how trajectory mobility genes may be generated. A trajectory gene may define a path that a user may have traveled. In many cases, a trajectory gene may include a transportation mode.

[00150] Trajectory genes may be smoothed. In many cases, location observations may not be very precise. For example, some raw location data may give a user's location as the location of an access point, which may be a large distance from the actual location. In some cases, such inaccuracies may be on the order of tens or hundreds of feet, or in some cases miles or kilometers.

[00151] A smoothing algorithm may adjust a trajectory such that the movement may make physical sense. Some such smoothing algorithms may increase a trajectory's accuracy.

[00152] Some smoothing or post processing algorithms may adjust a trajectory as part of an anonymizing process. Trajectories can contain information that may identify people specifically. For example, a trajectory from a person's home address to their work address may indicate exactly who the person may be. By obfuscating one or both of the origin or destination, the trajectory may be made anonymous, while preserving useful portions of the trajectory for analysis.
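A minimal way to obfuscate the origin and destination is to drop the observations near both ends of a trajectory. The sketch assumes time-ordered `(t, x, y)` points in planar metres and a fixed 500 m radius; the function `trim_endpoints` and the radius are illustrative assumptions. A fixed radius is easy to reverse-engineer, so a production system might randomize it per trajectory.

```python
from math import hypot


def trim_endpoints(points, radius_m=500.0):
    """Drop observations within radius_m of a trajectory's origin and destination.

    points: time-ordered list of (t, x, y) tuples in planar metres (assumed).
    """
    if not points:
        return []
    _, ox, oy = points[0]
    _, dx, dy = points[-1]
    start = 0
    # Skip leading points that are still close to the origin.
    while start < len(points) and hypot(points[start][1] - ox, points[start][2] - oy) <= radius_m:
        start += 1
    end = len(points)
    # Skip trailing points that are already close to the destination.
    while end > start and hypot(points[end - 1][1] - dx, points[end - 1][2] - dy) <= radius_m:
        end -= 1
    return points[start:end]
```

Trips shorter than the radius disappear entirely, which trades completeness for anonymity; the middle of longer trips is preserved for analysis.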

[00153] Many mobility genes may include demographic information about a user. The demographic information may be any type of descriptor or categorization of the user. Many systems may classify users by gender, age or age group, income, race, education, and so on. Some systems may include demographics that may be derived from location observation data, such as predominant mode of transport, recreational sites visited, types of restaurants visited, and the like.

[00154] Raw location observations may be received in block 602.

[00155] A timeframe of interest may be determined in block 604. In some analyses, a time frame may be defined by trajectories in the last hour, day, or week. In other analyses, a time frame may be defined by trajectories at a specific recurring time, such as between 9:15 and 9:30 am on Tuesdays that are not holidays. Location observations meeting the timeframe of interest may be gathered for the analysis.
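The recurring timeframe example in [00155] can be expressed as a predicate. This sketch assumes timestamps are `datetime` values in a consistent local time zone and that the caller supplies a set of holiday dates; `in_timeframe`, `filter_timeframe`, and their parameter names are hypothetical.

```python
from datetime import date, datetime, time


def in_timeframe(ts: datetime, weekday: int = 1, start: time = time(9, 15),
                 end: time = time(9, 30), holidays: frozenset[date] = frozenset()) -> bool:
    """True if ts falls on `weekday` (Monday=0, so 1 is Tuesday), within the
    half-open window [start, end), and not on a holiday."""
    return (
        ts.weekday() == weekday
        and ts.date() not in holidays
        and start <= ts.time() < end
    )


def filter_timeframe(observations, **window):
    """Keep (timestamp, ...) tuples whose timestamp lies inside the window."""
    return [o for o in observations if in_timeframe(o[0], **window)]
```

A half-open window is used so that adjacent windows (9:15 to 9:30, 9:30 to 9:45) never count the same observation twice.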

[00156] The observations may be sorted by device identification in block 606. For each device identification in block 608, a subset of observations may be retrieved in block 610 that have the device identification. The subset may be sorted by timestamp in block 612 and a raw trajectory may be created by the sequence of location observations in block 614.
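Blocks 606 to 614 amount to grouping observations by device and ordering each group by time. This sketch assumes observations arrive as `(device_id, timestamp, x, y)` tuples; it groups with a dictionary rather than an explicit sort by device identification, which yields the same per-device subsets. `build_raw_trajectories` is a hypothetical name.

```python
from collections import defaultdict


def build_raw_trajectories(observations):
    """Group (device_id, timestamp, x, y) observations into per-device
    trajectories of (timestamp, x, y), ordered by timestamp."""
    by_device = defaultdict(list)
    for device_id, ts, x, y in observations:
        by_device[device_id].append((ts, x, y))
    trajectories = {}
    for device_id, points in by_device.items():
        unique = list(dict.fromkeys(points))  # drop exact duplicate observations
        trajectories[device_id] = sorted(unique, key=lambda p: p[0])
    return trajectories
```

Hash-based grouping is linear in the number of observations; for data sets too large for memory, the same step would typically be a distributed group-by on the device identifier.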

[00157] For each sequence in block 616, the trajectory may be broken into segments based on the trajectory speed in block 618. In other words, a trajectory segment may be created by identifying locations where the trajectory may have paused for an extended time. An example may be a trajectory that may pause while a person is at work, at home, at a recreational event, or visiting some location.
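A simplified segmentation rule treats a long gap with almost no movement between consecutive observations as a pause, which suits sparse network data where a stationary device may produce few observations. The thresholds (0.5 m/s and 10 minutes), the planar `(t, x, y)` point format, and the name `split_at_pauses` are assumptions; a pause detector that examines runs of many observations could be substituted for denser data.

```python
from math import hypot


def split_at_pauses(traj, pause_speed=0.5, min_pause_s=600.0):
    """Split a time-ordered (t, x, y) trajectory wherever two consecutive
    observations are at least min_pause_s apart and imply a speed below
    pause_speed (m/s). Segments with fewer than two points are dropped."""
    segments, current = [], []
    for point in traj:
        if current:
            t0, x0, y0 = current[-1]
            t1, x1, y1 = point
            dt = t1 - t0
            speed = hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0
            if dt >= min_pause_s and speed < pause_speed:
                segments.append(current)  # the device paused here
                current = []
        current.append(point)
    if current:
        segments.append(current)
    return [s for s in segments if len(s) >= 2]
```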

[00158] For each segment in block 620, a transportation mode may be determined in block 622 and an average speed determined in block 624. The transportation mode may be inferred from the specifics of a trajectory. For example, a person who progresses slowly at a walking pace to a train station and then moves quickly at a train's speed may be assumed to have walked to the train station and ridden a train. Another person who lingers at a bus stop for a period of time, then travels at a common speed of vehicular traffic, may be assumed to be riding a bus. Yet another person who travels on a motorway but begins and ends a journey away from bus stops may be assumed to travel by car or taxi.

[00159] In some embodiments, a user's previous history may be used as an indicator of their preferred transportation mode. Some systems may look back at previous transportation analyses for hints or indicators as to whether a specific user often uses a car or train.

[00160] The following several steps may be one way to smooth the trajectory and, in some cases, increase its accuracy. Some location observations may have positional data that may be highly inaccurate. The inaccuracies may come from the method used to determine a user's location, which may include giving only the coordinates of an access point or cell tower, even though the user may be a long distance away from the access point or cell tower. In such cases, the trajectory information may give unrealistic movements, such as lingering for a period of time at one access point, then instantaneously moving a long distance to a second access point. Such movements are not physically possible, so by smoothing the trajectory, the trajectory may become more accurate and more useful for further analyses.

[00161] Once a transportation mode is determined in block 622, an average speed may be determined in block 624. The average speed may be calculated from the end points of a trajectory segment.
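Computing the average speed from the end points of a segment, as described for block 624, reduces to a great-circle distance divided by elapsed time. The sketch below assumes (t, lat, lon) tuples in seconds and decimal degrees and uses the haversine formula with a mean Earth radius; neither choice comes from the source.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_speed_mps(segment):
    """Average speed of a (t, lat, lon) segment using only its end points."""
    t0, lat0, lon0 = segment[0]
    t1, lat1, lon1 = segment[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("segment must span a positive time interval")
    return haversine_m(lat0, lon0, lat1, lon1) / dt
```

Because only the end points are used, detours within the segment are ignored, so this value is a lower bound on the speed actually traveled.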

[00162] A baseline speed range for the travel segment may be determined from historical data in block 626. The baseline speed may be used as a comparison to determine whether the observed speeds appear appropriate. For each observation in block 628, a speed comparison may be made in block 630. If the speed appears appropriate in block 630, no changes may be made. If the speed does not appear to be appropriate in block 630, the observed location may be adjusted in block 632 to meet the speed limits determined from the historical data.
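Blocks 628 through 632 can be illustrated with a simple rule: when the implied speed between an observation and its already adjusted predecessor exceeds an upper bound, the observation is pulled back along the straight line toward the predecessor until the bound is met. Here max_speed_mps stands in for the upper end of the historical baseline range from block 626, and the straight-line correction is an assumption; the source does not specify how the location is adjusted.

```python
import math


def clamp_to_speed_limit(points, max_speed_mps):
    """Return a copy of time-ordered (t, x, y) points with implausible jumps shortened."""
    if not points:
        return []
    adjusted = [points[0]]
    for t, x, y in points[1:]:
        pt, px, py = adjusted[-1]
        dist = math.hypot(x - px, y - py)
        max_dist = max_speed_mps * (t - pt)
        if dist > max_dist and dist > 0:
            # Move the point toward its predecessor so the implied speed equals the limit.
            scale = max_dist / dist
            x, y = px + (x - px) * scale, py + (y - py) * scale
        adjusted.append((t, x, y))
    return adjusted
```

Comparing against the adjusted predecessor, not the raw one, keeps a single bad fix from making every later step look too fast.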

[00163] After analyzing each segment in block 620, descriptors may be added to each segment in block 634. The descriptors may include transportation mode, average speed, and other metadata. Demographic information describing the user may be added in block 636.

[00164] After analyzing each sequence in block 616, the trajectories may be stored in block 638.

[00165] Figure 7 is a flowchart illustration of an embodiment 700 showing a method for preparing trajectory mobility genes for transmittal. The method of embodiment 700 may be merely one example of how trajectories may be prepared for use.

[00166] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00167] Embodiment 700 may illustrate one method by which a request for trajectory mobility genes may be fulfilled. The fulfillment method may ensure that there may be a sufficient number of trajectories such that individual trajectories may not be separately identifiable. In some cases, the trajectories may also be obfuscated.

[00168] A request for trajectory genes may be received in block 702.

[00169] The request may define a physical area of interest in block 704. The physical area of interest may be a specific physical location, such as people traveling along a highway or people traveling towards a sporting event. In some cases, the physical area of interest may be a category, such as people going out to eat, where the category may define the destination as any restaurant.

[00170] A time frame of interest may be defined in block 704. The number of available trajectories that meet the physical location and time frame criteria may be determined in block 706. If the number is below a predefined minimum number of trajectories in block 708, the search parameters may be adjusted in block 710 to include additional trajectories.
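A minimal sketch of blocks 706 through 710, assuming the adjustment in block 710 widens the time window symmetrically. The dictionary schema with a start_time key, the in_area predicate, and the step sizes are hypothetical. If the minimum cannot be reached, the function returns nothing rather than releasing an undersized set.

```python
def select_with_minimum(trajectories, in_area, start, end, min_count,
                        widen_step_s=900.0, max_widen_steps=8):
    """Return (selected, (lo, hi)) once at least min_count trajectories match.

    trajectories: iterable of dicts with a 'start_time' key in seconds (assumed).
    in_area: predicate for the physical area of interest (block 704).
    """
    candidates = [t for t in trajectories if in_area(t)]
    for step in range(max_widen_steps + 1):
        # Widen the time window symmetrically on each pass (block 710).
        lo = start - step * widen_step_s
        hi = end + step * widen_step_s
        selected = [t for t in candidates if lo <= t["start_time"] < hi]
        if len(selected) >= min_count:  # block 708
            return selected, (lo, hi)
    # Never release an undersized set of trajectories.
    return None, None
```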

[00171] The minimum number of trajectories may be selected for any of many reasons. In some cases, a minimum number of trajectories may allow a mobility gene to anonymize the data such that a single trajectory may not be individually identified. In many cases, a summarized demographic profile may be provided with the trajectories, and when a low number of trajectories may be provided, it may be possible to single out a trajectory as possibly belonging to an outlier in the demographic profile.

[00172] Another reason for using a minimum number of trajectories may be to ensure relatively accurate subsequent analyses. A small set of trajectories may give highly skewed results in some cases, and by having larger datasets, more meaningful results may be calculated with greater confidence.

[00173] The trajectories meeting the criteria may be retrieved in block 714. For each trajectory in block 716, the trajectory origins or destinations may be obfuscated in block 718, and demographic data may be collected in block 720.

[00174] The obfuscation of the trajectory may be accomplished in several different ways. One way to obfuscate a trajectory may be by truncating it. One use case may be to use trajectories to determine the density of riders on a subway system. The density may be derived from the number of trajectories from one train station to the next, but the analysis does not need to include origin and destination. By truncating the trajectories to just the portion from one train station to the next, anonymity may be preserved.

[00175] One way to obfuscate a trajectory may be to summarize an origin or destination. A person may be personally identified when that person begins or ends their journey from their home address. In such cases, a trajectory may be anonymized by using a centralized location as a substitute for a home address. For example, a centralized location in a housing district may be substituted for a user's home address in their trajectory. Such a substitution may be made with a work address or some other origin or destination.
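Substituting a centralized location for an origin or destination might look like the following. The district_of lookup and the centroids table are hypothetical placeholders for whatever zoning data a system holds. Only the first and last points are replaced, so the interior of the trajectory is preserved for analysis.

```python
def generalize_endpoints(trajectory, district_of, centroids):
    """Replace the first and last (t, x, y) points with their district centroid.

    district_of(x, y) -> district id or None  (hypothetical lookup)
    centroids: {district id: (x, y)}          (hypothetical table)
    """
    if not trajectory:
        return []
    out = list(trajectory)
    # A set avoids processing the same index twice for a one-point trajectory.
    for idx in {0, len(out) - 1}:
        t, x, y = out[idx]
        district = district_of(x, y)
        if district is not None and district in centroids:
            cx, cy = centroids[district]
            out[idx] = (t, cx, cy)
    return out
```

Timestamps are kept unchanged here; a stricter scheme might also coarsen them, since a precise departure time can be identifying on its own.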

[00176] Another way to obfuscate a trajectory may be to truncate a trajectory at a common location near the origin or destination. For example, a person who may travel by subway to their home may have their trajectory truncated at the train station where they alight.
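Truncating at a common location can be sketched by keeping only the portion of a trajectory between the first and last points that fall near a known station. The station list and the 150 m radius are illustrative assumptions. Returning an empty list when no station is nearby is a conservative design choice: a trajectory that cannot be cut at a shared location is withheld rather than released with its true endpoints.

```python
import math


def truncate_at_stations(trajectory, stations, radius_m=150.0):
    """Keep the (t, x, y) points between the first and last points near any station."""
    near = [i for i, (_, x, y) in enumerate(trajectory)
            if any(math.dist((x, y), s) <= radius_m for s in stations)]
    if not near:
        return []  # no common location to cut at, so withhold the trajectory
    return trajectory[near[0]:near[-1] + 1]
```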

[00177] After analyzing all of the trajectories in block 716, the demographic data may be summarized for the group of trajectories in block 722. The mobility genes may be transmitted in block 724.

[00178] Figure 8 is a flowchart illustration of an embodiment 800 showing a method for creating visit mobility genes from trajectory genes. The method of embodiment 800 may be merely one example of how visit genes may be created.

[00179] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00180] Embodiment 800 may be one example of how to create a visit mobility gene. A visit mobility gene may give various information and statistics about people's visits to certain locations. In some cases, a data consumer may wish to find information about people's visits to a specific location, such as a shopping mall, recreational venue, a specific coffee shop, or other location.

[00181] In other cases, a data consumer may wish to find information about people's visits to certain classes of locations, such as fast food restaurants, grocery stores, or some other category.

[00182] Embodiment 800 may be one way to identify visits from trajectories. In this method, places where a person's trajectory pauses or remains within a certain area may be considered visits. Once a visit may be identified, the visit may be matched to a known physical location, then the visit may be classified, and demographics may be added.

[00183] The operations of embodiment 800 may be an example of an analysis that may be performed any time a trajectory may be generated. In some systems, trajectory mobility genes may be constantly generated from recently generated data. As each trajectory may be created, a visit analysis such as embodiment 800 may be performed to identify, classify, and store visits in a database.

[00184] Trajectories may be received in block 802. For each trajectory in block 804, a period of little movement may be identified in block 806. The period of little movement may be analyzed in block 808 to determine a length of visit. If the visit does not exceed a minimum threshold in block 810, the visit may be ignored in block 812.

[00185] When the visit exceeds a threshold in block 810, an attempt may be made to identify home or work location in block 814. The home or work location of a person may be visited very frequently, typically every day.
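Blocks 806 through 812 can be sketched as a stay-point scan that reports each run of low movement lasting at least a minimum duration, together with its centroid for the later matching steps. The 200 m radius, 15 minute minimum, and planar (t, x, y) input are assumptions, not values from the source.

```python
import math


def detect_visits(points, radius_m=200.0, min_duration_s=900.0):
    """Return visits found in time-ordered (t, x, y) points as dicts."""
    visits = []
    i, n = 0, len(points)
    while i < n:
        # Grow the run while points stay within radius of the run's first point.
        j = i
        while j + 1 < n and math.dist(points[i][1:], points[j + 1][1:]) <= radius_m:
            j += 1
        duration = points[j][0] - points[i][0]  # block 808
        if j > i and duration >= min_duration_s:  # block 810
            run = points[i:j + 1]
            visits.append({
                "start": points[i][0],
                "end": points[j][0],
                "x": sum(p[1] for p in run) / len(run),  # centroid of the run
                "y": sum(p[2] for p in run) / len(run),
            })
            i = j + 1
        else:
            i += 1  # too short, so ignored (block 812)
    return visits
```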

[00186] The home and work location of a person may be a special category of locations for several reasons. For example, many movement studies may involve people's movements to and from work or home. As another example, home and work locations may be a way to identify a trajectory as belonging to a specific person.

[00187] If a match for home or work is made in block 816, the visit may be marked as home or work in block 818. When the visit is not to home or work, an attempt may be made in block 820 to match the visit to a known location. If there is a match in block 822, the visit may be marked with the location in block 824.

[00188] The matching in block 820 may be to attempt to match a visit to a business, organization, physical feature such as a park, or some other metadata about a location. Such metadata may enrich the data stored for a visit. For example, a visit near a grocery store that takes 20 minutes or so may be classified as a visit to the grocery store. Such grocery store visits may be searched and aggregated into a visit mobility gene for further analysis.
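Matching a visit to a known location (blocks 820 through 824) could be as simple as choosing the nearest catalogued place within a distance cut-off. The places catalogue and the 75 m threshold are hypothetical; a production system might also weigh visit duration, as in the grocery store example, or opening hours.

```python
import math


def match_visit(visit_xy, places, max_distance_m=75.0):
    """Return (name, category) of the nearest place within range, else None.

    places: list of (name, category, x, y) tuples (hypothetical catalogue).
    """
    best, best_d = None, max_distance_m
    for name, category, px, py in places:
        d = math.dist(visit_xy, (px, py))
        if d <= best_d:
            best, best_d = (name, category), d
    return best
```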

[00189] The visit type and duration may be classified in block 826 and demographic information may be added in block 828. The visit mobility gene information may be stored in block 830.

[00190] Figure 9 is a flowchart illustration of an embodiment 900 showing a second method for creating visit mobility genes. The method of embodiment 900 may be merely one example of how visit mobility genes may be created from raw location observations.

[00191] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00192] Embodiment 900 may be another way of identifying and classifying visits as part of a visit mobility gene. In this method, a set of locations is given, and the raw observation data may be searched to find occasions where the location was visited. From these data points, various aspects of a visit mobility gene may be derived.

[00193] Raw location observations may be received in block 902, as well as a set of locations of interest in block 904.

[00194] For each location of interest in block 906, raw location observations meeting the location criteria may be found in block 908. The user identifications for those observations may be found in block 910.

[00195] For each user identification in block 912, a length of stay may be determined in block 914. If the stay does not exceed a minimum value in block 916, the visit may be ignored in block 918.
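The per-location search of embodiment 900 (blocks 906 through 916) might be sketched as follows. The (user_id, t, x, y) tuple format, the circular area of interest, and the thresholds are assumptions. The length of stay is estimated crudely as the span between a user's first and last observation inside the area, so two separate visits by the same user would be merged; splitting on time gaps would be a natural refinement.

```python
import math
from collections import defaultdict


def stays_at_location(observations, center, radius_m, min_stay_s):
    """Return {user_id: stay_seconds} for users who stayed long enough."""
    times = defaultdict(list)
    for user, t, x, y in observations:
        if math.dist((x, y), center) <= radius_m:  # block 908
            times[user].append(t)                  # block 910
    stays = {}
    for user, ts in times.items():                 # block 912
        stay = max(ts) - min(ts)                   # block 914, crude span estimate
        if stay >= min_stay_s:                     # block 916
            stays[user] = stay
    return stays
```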

[00196] When the visit does exceed the minimum value in block 916, the demographic information about the user may be gathered in block 918.

[00197] An inbound trajectory may be calculated in block 920 and an outbound trajectory may be determined in block 922. The inbound and outbound trajectories may be useful to help understand visitors' movements before and after the visit.

[00198] In some cases, the visit information may be anonymized. For example, inbound and outbound trajectories may be truncated or otherwise obfuscated. The visit data may be stored in block 928.

[00199] Figure 10 is a diagram illustration of an embodiment 1000 showing a module that may classify trajectory segments by transportation mode. A group of users 1002 may generate location data 1006 that may be collected from various cell towers 1004 as the users 1002 travel by taxi 1008, bus 1010, bicycle 1012, train 1014, ferry 1016, walking 1018, or any other transportation mode.

[00200] The location data 1006 may be processed into a training set 1020. The training set 1020 may contain trajectory segments with known or verified transportation modes associated with the segments. The training set 1020 may be used by a classifier 1022 to analyze unknown trajectory segments and location data 1024 generated by unknown users 1026 to determine their transportation mode 1028.

[00201] The transportation mode 1028 may be stored along with a trajectory gene for analysis. The transportation mode 1028 may be stored as metadata or some other type of data.

[00202] A verifier 1030 may analyze some or all of the transportation modes 1028 to determine if the classification was correct. When the classification may be verified, the training set 1020 may be updated.

[00203] Figure 11 is a diagram of an embodiment 1100 showing components that may classify trajectory data by transportation mode. The example of embodiment 1100 is merely one topology that may be used to analyze location data.

[00204] The diagram of Figure 11 illustrates functional components of a system. In some cases, the component may be a hardware component, a software component, or a combination of hardware and software. Some of the components may be application level software, while other components may be execution environment level components. In some cases, the connection of one component to another may be a close connection where two or more components are operating on a single hardware platform. In other cases, the connections may be made over network connections spanning long distances. Each embodiment may use different hardware, software, and interconnection architectures to achieve the functions described.

[00205] Embodiment 1100 illustrates a device 1102 that may have a hardware platform 1104 and various software components. The device 1102 as illustrated represents a conventional computing device, although other embodiments may have different configurations, architectures, or components.

[00206] In many embodiments, the device 1102 may be a server computer. In some embodiments, the device 1102 may also be a desktop computer, laptop computer, netbook computer, tablet or slate computer, wireless handset, cellular telephone, game console or any other type of computing device. In some embodiments, the device 1102 may be implemented on a cluster of computing devices, which may be a group of physical or virtual machines.

[00207] The hardware platform 1104 may include a processor 1108, random access memory 1110, and nonvolatile storage 1112. The hardware platform 1104 may also include a user interface 1114 and network interface 1116.

[00208] The random access memory 1110 may be storage that contains data objects and executable code that can be quickly accessed by the processors 1108. In many embodiments, the random access memory 1110 may have a high-speed bus connecting the memory 1110 to the processors 1108.

[00209] The nonvolatile storage 1112 may be storage that persists after the device 1102 is shut down. The nonvolatile storage 1112 may be any type of storage device, including hard disk, solid state memory devices, magnetic tape, optical storage, or other type of storage. The nonvolatile storage 1112 may be read only or read/write capable. In some embodiments, the nonvolatile storage 1112 may be cloud based, network storage, or other storage that may be accessed over a network connection.

[00210] The user interface 1114 may be any type of hardware capable of displaying output and receiving input from a user. In many cases, the output display may be a graphical display monitor, although output devices may include lights and other visual output, audio output, kinetic actuator output, as well as other output devices. Conventional input devices may include keyboards and pointing devices such as a mouse, stylus, trackball, or other pointing device. Other input devices may include various sensors, including biometric input devices, audio and video input devices, and other sensors.

[00211] The network interface 1116 may be any type of connection to another computer. In many embodiments, the network interface 1116 may be a wired Ethernet connection. Other embodiments may include wired or wireless connections over various communication protocols.

[00212] The software components 1106 may include an operating system 1118 on which various software components and services may operate.

[00213] A training set 1120 may contain verified transportation modes for trajectory segments. The verified transportation modes may come from a data collector 1122, which may interact with users to collect transportation mode data. A transportation mode analyzer 1124 may also create verified transportation mode information by using secondary data sources to verify transportation mode.

[00214] A data collector 1122 may operate with a sign up portal 1126 and a user database 1128 to manage an application that may collect transportation modes for the users. An application may operate on a user device 1148, which may have a hardware platform 1150 on which a location tracking application 1152 may operate with a user interface 1154.

[00215] The location tracking application 1152 may collect a user's location information, then ask the user to confirm their transportation mode. The location tracking application 1152 may obtain permission to track the user's location through a sign up portal 1126. As location information may be collected, a user may enter their transportation mode through the user interface 1154.

[00216] For example, a user may sign up to participate through the sign up portal 1126 and may be entered into a user database 1128. The user may download and install the location tracking application 1152 onto their user device 1148. As the user begins traveling, such as commuting to work, going to a recreational activity, going shopping, or traveling to another location, the user's motion may be captured in a series of location data. Typically, the location data may be location coordinates along with timestamps of each coordinate.

[00217] The user's location coordinates may be analyzed to identify trajectories. In many cases, a user's trajectory may be further analyzed to identify trajectory segments where each segment may represent a different mode of transportation. For example, a commute to work may include riding a bicycle to a train station, taking a train into a central business district, then walking to their final destination.

[00218] The data collector 1122 may collect the user's trajectory, then the transport mode analyzer 1124 may identify and separate the trajectory into separate segments. The user may be queried through the user interface 1154 to verify whether they were walking, riding a bicycle, riding a train, or some other transportation mode.

[00219] In some cases, the transport mode analyzer 1124 may make a guess or assumption about the transportation mode, then ask the user to verify it. In our example, the user's speed while riding a bicycle may indicate that the user was traveling faster than a pedestrian but slower than a car or taxi. The assumption may be that the user rode a bicycle during the segment; the user may then be presented with a map showing their route and asked to confirm that they were riding a bicycle.

[00220] As the user verifies a trajectory segment, that segment and the classified transportation mode may be stored in the training data set 1120. Over time, the training data set 1120 may be populated with many hundreds, thousands, or even millions of classified trajectory segments. As the training data set 1120 is populated and updated, it may be published for use by a classifier system 1134.

[00221] A classifier system 1134 may be connected over a network 1132 to the device 1102. The classifier system 1134 may operate on a hardware platform 1136 and may analyze trajectory data 1138 using a classifier 1140 to compare to the training data set 1142. The classifier 1140 may compare a given trajectory segment to classify the segment with a transportation mode. In many cases, a classifier 1140 may determine a classification with a probability or closeness to a match.

[00222] A classifier system 1134 may operate in real time by classifying trajectories as those trajectories are captured from location data 1146 provided by a telecom network 1144. In other uses, a classifier system 1134 may operate in batch mode by analyzing historical trajectory segments that may be identified for analysis.

[00223] Some systems may have a data verifier 1130 which may compare the machine-classified trajectory segments with an alternate data source. For example, a user's trajectory may be classified as traveling by car or taxi. The user may be contacted afterwards to verify that the segment was indeed taken by car. If the user corrects the trajectory, such as by identifying the segment as by bus, the training data set may be updated accordingly.

[00224] Some systems may verify classification by accessing auxiliary or third party data. For example, a user may use a mass transit pass to travel by bus or train. Such transits may be cross referenced with the user's trajectory segments and the segments may be classified or verified using the auxiliary data.

[00225] Figure 12 is a flowchart illustration of an embodiment 1200 showing a method for acquiring users and building training data. The method of embodiment 1200 may be merely one example of how training data may be collected.

[00226] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00227] Embodiment 1200 may illustrate one method for collecting location data along with transportation mode data to create a training data set.

[00228] Users may be identified in block 1202 that may be interested in participating in a data collection operation. In many cases, users may be recruited and offered a discount, free items, or other incentives to participate.

[00229] For each user in block 1204, the user may be contacted and made an offer in block 1206. If the user does not elect to participate in block 1208, the user may be removed from the program in block 1210. Those that may elect to participate may also agree to have their locations tracked and may agree to answer questions about the transportation mode.

[00230] When a user opts in to participate in block 1208, an application may be downloaded in block 1212 and installed on the user's device. Location data may begin to be collected in block 1214 and the user may verify the transportation mode in block 1216.

[00231] As the users are contacted and begin data collection and verification in block 1204, a training data set may be assembled in block 1208 and published in block 1210.

[00232] Figure 13 is a flowchart illustration of an embodiment 1300 showing a method for classifying location data to determine transportation mode.

[00233] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00234] A training data set may be received in block 1302 along with raw location data in block 1304. In many cases, the raw location data may have a device identifier associated with the data.

[00235] For each device in block 1306, trajectories may be identified in block 1308. A trajectory may be a sequence of location coordinates with timestamps that shows movement of a device through a network, such as a telecom network.

[00236] Each trajectory may be analyzed in block 1310 and trajectory segments may be identified in block 1312. A trajectory segment may be a portion of a trajectory that may indicate a separate mode of transportation. For example, a trip may include driving to a location and walking from a parking structure to a final destination, or another trip may include walking to a bus stop, riding a bus for a period of time, and walking the remaining portion of a journey. Each trajectory segment may be analyzed in block 1314 to determine the transportation mode in block 1316.

[00237] The analysis of block 1316 may use machine classification techniques to compare a training data set to an unknown or new trajectory. The classification analysis may find a closest match between the training data set and the unknown trajectory segment, resulting in an estimated transportation mode. In many cases, such classification engines may return an estimated classification match along with a probability or confidence indication.
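As one example of a machine classification technique that returns both a closest match and a confidence indication, a k-nearest-neighbour classifier can be sketched in a few lines. The source does not name a specific algorithm; k-NN, the two-element feature vector (for instance mean and maximum speed in m/s), and using the vote share as the confidence are assumptions. Features are left unscaled here, which a real system would likely normalize.

```python
import math
from collections import Counter


def knn_classify(features, training_set, k=3):
    """Return (mode_label, confidence) for a segment's feature tuple.

    training_set: list of (feature_tuple, mode_label) pairs.
    confidence: share of the k nearest neighbours that voted for the label.
    """
    if not training_set:
        raise ValueError("empty training set")
    neighbours = sorted(training_set, key=lambda row: math.dist(features, row[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    return label, count / len(neighbours)
```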

[00238] Figure 14 is a flowchart illustration of an embodiment 1400 showing a method for generating training data from users.

[00239] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00240] Embodiment 1400 may represent the operations that may occur with a user to collect trajectory segment data, then query the user to verify the transportation mode associated with the trajectory segment.

[00241] Location data may be collected in block 1402. Location data may be coordinates with a timestamp from which trajectory segments may be identified in block 1404.

[00242] For each segment in block 1406, an attempt may be made in block 1408 to automatically determine a transportation mode. The automated attempt may be to compare a transportation segment to existing segments in a training data set to classify the segment.

[00243] In other cases, an automated attempt may use heuristics or other mechanisms to attempt to determine a transportation mode. One example of a heuristic may involve determining a maximum speed observed for the segment, then determining a subset of transportation modes. For example, a fast and sustained movement may eliminate walking or bicycle riding as candidate modes. In another example, some locations may indicate that the user may be traveling by subway, such as when a user's device may be detected inside a subway tunnel.
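The maximum-speed heuristic can be sketched as a filter over candidate modes. The per-mode speed ceilings below are hypothetical values chosen for illustration, and the in_subway_tunnel flag stands in for whatever signal indicates that a device is underground.

```python
import math

# Hypothetical per-mode speed ceilings in m/s, chosen for illustration only.
MODE_MAX_SPEED = {"walk": 3.0, "bicycle": 12.0, "bus": 25.0,
                  "car": 45.0, "train": 90.0}


def max_observed_speed(points):
    """Largest speed between consecutive (t, x, y) points, in m/s."""
    speeds = [math.dist(a[1:], b[1:]) / (b[0] - a[0])
              for a, b in zip(points, points[1:]) if b[0] > a[0]]
    return max(speeds, default=0.0)


def candidate_modes(max_speed_mps, in_subway_tunnel=False):
    """Modes whose ceiling is not exceeded by the observed maximum speed."""
    if in_subway_tunnel:
        return ["train"]  # location alone implies a subway ride
    return [mode for mode, ceiling in MODE_MAX_SPEED.items()
            if max_speed_mps <= ceiling]
```

A single noisy jump can inflate the maximum speed, so this filter works best after the kind of smoothing described for blocks 628 through 632.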

[00244] The user may be presented with options for the transportation mode in block 1410. When an automated determination may be made for the transportation mode, such information may be presented to the user to confirm or correct the transportation mode. When a suggested mode may be presented, the user may only need to confirm in most cases.

[00245] If the transportation mode is not correct in block 1412, the user may respond with the correct mode in block 1414. When the transportation mode is correct in block 1412, or when the user responds with the correct mode in block 1414, the transportation segment and mode may be added to a training set in block 1416.

[00246] The training set may be published for use by classification engines in block 1418.

[00247] Figure 15 is a diagram illustration of an example embodiment 1500 showing location estimation. With each location in a trajectory, multiple data sources may be combined to yield a predicted location. The predicted location may be a more accurate estimation of a device's location than if a single data source had been used.

[00248] Location data gathered for mobile devices may come from multiple sources, including location coordinates gathered from wireless networks. Each data source may have different characteristics, such as accuracy or error ranges. Some sources may be more accurate than others, but some of the sources may not be available for each step of a trajectory.

[00249] For example, many wireless networks may track the movement of a device by recording the cell, antenna, or other connection point. Such a data point may indicate that the device was within the range of the connection point and therefore such a data point may represent one set of coordinates from which a location may be estimated. In many cases, a wireless network may store the coordinates of the antenna, tower, or other device as an approximate location for the device. Such coordinates may be relatively inaccurate because the device may be anywhere inside the coverage area of the connection point.

[00250] Some networks may be able to triangulate the location of a device through two, three, four, or more connection points. Such coordinates may be significantly more accurate than using the raw location of the connection point.
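Where a network can supply an estimated distance from a device to each connection point, a position can be computed by least-squares trilateration. This is a swapped-in, standard technique shown for illustration; the source does not say how its triangulation works or which measurements it uses. The sketch assumes planar coordinates in meters and at least three non-collinear antennas.

```python
def trilaterate(anchors, ranges):
    """Linear least-squares position from antenna (x, y) positions and distances."""
    if len(anchors) < 3 or len(anchors) != len(ranges):
        raise ValueError("need at least three anchors with matching ranges")
    (x0, y0), r0 = anchors[0], ranges[0]
    # Subtracting the first circle equation from each other one gives a*x + b*y = c.
    rows = []
    for (xi, yi), ri in zip(anchors[1:], ranges[1:]):
        a, b = 2 * (xi - x0), 2 * (yi - y0)
        c = r0 ** 2 - ri ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2
        rows.append((a, b, c))
    # Normal equations (A^T A) p = A^T c, solved as a 2x2 system by Cramer's rule.
    saa = sum(a * a for a, _, _ in rows)
    sab = sum(a * b for a, b, _ in rows)
    sbb = sum(b * b for _, b, _ in rows)
    sac = sum(a * c for a, _, c in rows)
    sbc = sum(b * c for _, b, c in rows)
    det = saa * sbb - sab * sab
    if abs(det) < 1e-9:
        raise ValueError("anchors are collinear")
    x = (sac * sbb - sab * sbc) / det
    y = (saa * sbc - sab * sac) / det
    return x, y
```

With more than three connection points the system is overdetermined, and the least-squares solution averages out some of the range error.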

[00251] The accuracy or error range of a given device may be affected by geography, weather, and other factors. For example, wireless signals may reflect off of buildings or may be obscured in some manner. Cells may be designed with different sizes and geometries based on the anticipated traffic as another example.

[00252] Global Positioning System (GPS) receivers may be found in many mobile devices. However, GPS receivers may not be used at all times due to the power consumption of the receivers. In many cases, GPS receivers may be used only when specific applications are executing on a device, such as when a user is accessing a map application that provides directions. During other times, the GPS receiver may be turned off.

[00253] An example of such a calculation is illustrated in embodiment 1500. A location 1502 may be at time k. At time k+1, a set of coordinates x at 1504 may be illustrated. The x coordinates may be predicted coordinates based on the estimated speed and direction of the device. At the same time period, two other observations may be present, y at k+1 at 1506 and z at k+1 at 1508. Each of the observations may have different process noise 1510, 1512, and 1514. The process noise in this illustration is an example of an error range for the observations.

[00254] Each error range or process noise may give a relative measure of the reliability of the observation. Observations with larger error ranges or process noise may be less reliable than those with smaller error ranges or process noise.

[00255] Taking into account the relative positions of the observations along with their error ranges or process noise, a calculated predicted location 1516 may be shown.
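The calculation of embodiment 1500 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the disclosure: coordinates are taken to be in a local planar frame in metres, each error range (the "process noise" of Figure 15) is treated as an isotropic one-sigma radius, the predicted coordinates x come from simple dead reckoning, and the observations are combined by inverse-variance weighting, a common choice that the text does not itself specify. The names `Estimate`, `predict`, and `fuse` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    x: float      # easting in metres (local planar frame; an assumption)
    y: float      # northing in metres
    sigma: float  # one-sigma error range in metres, treated as isotropic


def predict(prev: Estimate, vx: float, vy: float, dt: float, q: float) -> Estimate:
    """Dead-reckon from the previous location using estimated speed and direction.

    q is a hypothetical per-step growth in uncertainty (process noise, metres).
    """
    return Estimate(prev.x + vx * dt, prev.y + vy * dt,
                    (prev.sigma**2 + q**2) ** 0.5)


def fuse(estimates: list[Estimate]) -> Estimate:
    """Inverse-variance weighted mean: smaller error ranges get larger weights."""
    weights = [1.0 / e.sigma**2 for e in estimates]
    total = sum(weights)
    x = sum(w * e.x for w, e in zip(weights, estimates)) / total
    y = sum(w * e.y for w, e in zip(weights, estimates)) / total
    # The fused error range is never larger than the smallest input error range.
    return Estimate(x, y, (1.0 / total) ** 0.5)
```

For example, `fuse([predict(prev, vx, vy, dt, q), y_obs, z_obs])` would yield a location analogous to 1516; an observation with a 10 m error range pulls the result far more strongly than one with a 500 m error range.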

[00256] In the next step, k+2, the x observation 1518 is shown, along with the z observation at k+2 at 1520 and an a observation at k+2 at 1522. The respective process noise is shown at 1524, 1526, and 1528. A calculated predicted location 1530 may be shown as well.

[00257] In a typical trajectory using cellular telephone data, one time period may have a cell tower location and a triangulated position available, while the next one may have GPS and a triangulated position available. Since GPS coordinates may be more accurate than many other location data sources, the accuracy or reliability of each trajectory location may vary from one time period to the next. In many cases, some time periods may be known with much more accuracy or reliability than others.

[00258] Figure 16 is a diagram illustration of an embodiment 1600 showing two illustrations 1602 and 1604 of cell sites.

[00259] The examples of embodiment 1600 are derived from actual observations of GPS coordinates taken while in contact with a cell site. In illustration 1602, the cell site 1606 is illustrated along with several GPS locations 1608. Similarly, illustration 1604 shows cell site 1610 and GPS observations 1612.

[00260] The illustrations show that, within a cell site, a device may connect from a wide range of actual locations. For the purposes of this discussion, the GPS data may be treated as accurate enough to represent the actual, physical location of devices connected to the respective cell sites.

[00261] The examples show that if a cell site location were used as the observed location of a device, the device could actually be at any point within the service area of the cell. The service area, for these examples, may be inferred from the locations of the GPS observations. This analysis may make visible the approximate error range, or accuracy, of using cell site locations as the location coordinates of a trajectory.
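One way to put a number on the spread seen in Figure 16 is sketched below, assuming coordinates have already been projected into a local planar frame in metres: take the distance from the cell site to each GPS fix and report the radius that contains a chosen share of the fixes. The function `error_radius` and the default 95% share are illustrative assumptions, not values from the disclosure.

```python
import math


def error_radius(site, gps_points, fraction=0.95):
    """Radius around the cell site that contains `fraction` of the GPS fixes.

    site and gps_points use a local planar frame in metres (an assumption;
    latitude/longitude would first need projecting or a great-circle distance).
    """
    if not gps_points:
        raise ValueError("no GPS observations for this cell site")
    dists = sorted(math.hypot(px - site[0], py - site[1]) for px, py in gps_points)
    # Nearest-rank percentile: the smallest distance covering the requested share.
    rank = max(1, math.ceil(fraction * len(dists)))
    return dists[rank - 1]
```

A percentile is less sensitive than a maximum to a few fixes recorded far from the antenna, for example through signal reflections.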

[00262] Figure 17 is a diagram of an embodiment 1700 showing components that may analyze raw location data and produce trajectories. The example of embodiment 1700 is merely one topology that may be used to analyze raw location data.

[00263] The diagram of Figure 17 illustrates functional components of a system. In some cases, the component may be a hardware component, a software component, or a combination of hardware and software. Some of the components may be application level software, while other components may be execution environment level components. In some cases, the connection of one component to another may be a close connection where two or more components are operating on a single hardware platform. In other cases, the connections may be made over network connections spanning long distances. Each embodiment may use different hardware, software, and interconnection architectures to achieve the functions described.

[00264] Embodiment 1700 illustrates a device 1702 that may have a hardware platform 1704 and various software components 1706. The device 1702 as illustrated represents a conventional computing device, although other embodiments may have different configurations, architectures, or components.

[00265] In many embodiments, the device 1702 may be a server computer. In some embodiments, the device 1702 may also be a desktop computer, laptop computer, netbook computer, tablet or slate computer, wireless handset, cellular telephone, game console or any other type of computing device. In some embodiments, the device 1702 may be implemented on a cluster of computing devices, which may be a group of physical or virtual machines.

[00266] The hardware platform 1704 may include a processor 1708, random access memory 1710, and nonvolatile storage 1712. The hardware platform 1704 may also include a user interface 1714 and network interface 1716.

[00267] The random access memory 1710 may be storage that contains data objects and executable code that can be quickly accessed by the processors 1708. In many embodiments, the random access memory 1710 may have a high-speed bus connecting the memory 1710 to the processors 1708.

[00268] The nonvolatile storage 1712 may be storage that persists after the device 1702 is shut down. The nonvolatile storage 1712 may be any type of storage device, including hard disk, solid state memory devices, magnetic tape, optical storage, or other type of storage. The nonvolatile storage 1712 may be read only or read/write capable. In some embodiments, the nonvolatile storage 1712 may be cloud based, network storage, or other storage that may be accessed over a network connection.

[00269] The user interface 1714 may be any type of hardware capable of displaying output and receiving input from a user. In many cases, the output display may be a graphical display monitor, although output devices may include lights and other visual output, audio output, kinetic actuator output, as well as other output devices.

Conventional input devices may include keyboards and pointing devices such as a mouse, stylus, trackball, or other pointing device. Other input devices may include various sensors, including biometric input devices, audio and video input devices, and other sensors.

[00270] The network interface 1716 may be any type of connection to another computer. In many embodiments, the network interface 1716 may be a wired Ethernet connection. Other embodiments may include wired or wireless connections over various communication protocols.

[00271] The software components 1706 may include an operating system 1718 on which various software components and services may operate.

[00272] A trajectory engine 1720 may process a series of location coordinates into a trajectory. The trajectory may be a sequence of coordinates that may represent the approximate path along which a device was observed to move. In many cases, a trajectory may be a time series of coordinates, although the interval between successive coordinates may or may not be fixed.

[00273] The trajectory engine 1720 may take data from multiple sources when calculating location coordinates for each time step. A set of default error values 1722 may be used as an approximation of the error values for locations where more specific error data does not exist. When such error data does exist, the trajectory engine 1720 may use error data for location based services 1724 or for triangulation 1726. Such error databases may be calculated for individual cell sites or other segments.

[00274] The trajectory engine 1720 may receive a trajectory request from a trajectory requestor 1728, and may store the results in a database of analyzed trajectories 1730.

[00275] An error analyzer 1732 may take raw observations and determine error estimations for location based services as well as for triangulated location coordinates. Location based services may refer to location data that gives the location of a cell site or antenna to which a device may connect. Location based services may be adequate for many location-consuming applications, but may not be as accurate as other data sources.

[00276] The error analyzer 1732 may compare GPS coordinates with the coordinates from a location based service or a triangulated location. In general, GPS coordinates may be received with an estimated error or tolerance, and may be significantly more accurate than coordinates from location based services or triangulation.

[00277] By comparing the GPS locations to coordinates received from location based services or triangulation, an error factor may be calculated for specific cell sites or areas within a geography. Some cell sites or other areas may have very large error factors, while other areas may have smaller ones. When a predicted location is calculated from higher-accuracy data, the confidence in the predicted location may be higher.

[00278] A network 1734 may be any type of communication network whereby device 1702 may communicate with a cellular network 1736 or other device 1752.

[00279] A cellular network 1736 may have a control infrastructure 1738 which may control several base station controllers 1740. Each base station controller 1740 may control several base stations 1742 and 1744. A mobile device 1746 is illustrated as communicating with base station 1744. The mobile device 1746 may have a GPS receiver 1748, which may generate relatively accurate location coordinates.

[00280] The network control infrastructure 1738 may collect raw location data 1750 for devices connected to the network. The error analyzer 1732 may analyze the raw location data 1750 to populate the error databases 1722, 1724, and 1726. The trajectory engine 1720 may use the raw location data 1750 to generate trajectories.

[00281] A device 1752 may illustrate any type of device operating on a hardware platform 1754 which may consume trajectories in any type of application 1756.

[00282] Figure 18 is a flowchart illustration of an embodiment 1800 showing a method for collecting raw data prior to error analysis. The method of embodiment 1800 may be merely one example of how raw data may be collected.

[00283] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00284] Prior to determining error factors for individual cell sites, a network operator may collect raw data. The raw data may be based on highly accurate though not comprehensive GPS location coordinates. GPS location data may be created by applications running on a mobile device, and may be identified by monitoring data traffic through the network. When GPS location data is identified, the GPS coordinates may be stored with the available location data provided by the network. These data sets may be correlated into error factors for each cell site, as discussed in a later process.

[00285] Traffic monitoring may occur in block 1802. Telecommunications networks may routinely monitor data traffic for various administrative functions. When the data include GPS coordinates in block 1804, the cell site location coordinates may be retrieved in block 1806. The cell site location coordinates may be part of a location based service or other mechanism by which a network operator may track the devices attached to the network. Additionally, if triangulated location coordinates are available, such triangulated coordinates may be determined in block 1808.

[00286] The GPS coordinates may be stored in block 1810 along with the cell site coordinates and the triangulated coordinates. Such raw data may be processed using a method illustrated in the following figure.
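The collection step of embodiment 1800 might look like the sketch below. The record layout (`TrafficRecord`, `RawSample`) and the `cell_sites` lookup are hypothetical stand-ins for whatever the network's traffic monitoring and location based service actually expose; comments map each step to the flowchart blocks.

```python
from dataclasses import dataclass
from typing import Optional

Point = tuple[float, float]


@dataclass
class TrafficRecord:
    """Hypothetical shape of one monitored traffic record."""
    device_id: str
    cell_id: str
    timestamp: float
    gps: Optional[Point] = None           # present only if an app sent GPS data
    triangulated: Optional[Point] = None  # present only if the network computed one


@dataclass
class RawSample:
    """One stored row of raw data for later error analysis (block 1810)."""
    cell_id: str
    timestamp: float
    gps: Point
    cell_site: Point
    triangulated: Optional[Point]


def collect_raw_samples(traffic, cell_sites):
    """Keep only records carrying GPS and attach the network-side coordinates.

    cell_sites maps cell_id -> (x, y) of the serving antenna.
    """
    samples = []
    for rec in traffic:                 # block 1802: monitor traffic
        if rec.gps is None:             # block 1804: GPS coordinates present?
            continue
        site = cell_sites[rec.cell_id]  # block 1806: cell site coordinates
        # Block 1808: triangulated coordinates, when available; block 1810: store.
        samples.append(RawSample(rec.cell_id, rec.timestamp, rec.gps,
                                 site, rec.triangulated))
    return samples
```

Records without GPS are skipped because, in this scheme, only GPS fixes serve as the reference against which the network-side coordinates are judged.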

[00287] Figure 19 is a flowchart illustration of an embodiment 1900 showing a method for processing raw GPS and other location data to determine error factors for individual cell sites and for triangulated location coordinates. The method of embodiment 1900 may be merely one example of how raw location data may be converted into error factors for individual cell sites and triangulated coordinates.

[00288] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00289] Embodiment 1900 is one method for determining the error factors associated with the location coordinates of individual cell sites. Each cell site may have different geometries, different geographies, different reflections or other physical obstacles, and other factors that may change the accuracy or reliability of location coordinates obtained when monitoring movement of devices in the area.

[00290] The method of embodiment 1900 generates error factors for individual cells in a cellular network. The method may operate under the assumption that GPS location coordinates are very accurate and represent the actual location of a device connected to the network inside a cell. By analyzing the GPS location coordinates within the cell, an error factor may be generated for each cell. The error factors may represent the accuracy, tolerance, or error that may be associated with location based services for that cell, or for triangulated location coordinates generated inside the cell.

[00291] Each cell site may be analyzed in block 1902, and for each cell site, each antenna or cell may be analyzed in block 1904. For each antenna or cell inside a cell site or tower, the GPS locations and the associated location based services and triangulated data may be analyzed. Such data may have been collected in the method of embodiment 1800.

[00292] A centroid of the GPS locations may be determined in block 1906, along with a standard deviation of those coordinates in block 1908. Using these data points, an error factor for the cell site may be generated in block 1910.
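The text leaves open how the centroid and standard deviation are combined into an error factor. The sketch below assumes one reasonable definition: the root-mean-square distance between the cell site and the GPS fixes, which decomposes exactly into the offset of the GPS centroid from the cell site (a bias) and the radial standard deviation of the fixes about that centroid (a spread). The function name, the return keys, and the planar metre coordinates are hypothetical.

```python
import math


def cell_error_factor(site, gps_points):
    """Error factor for using the cell-site location as a device location.

    Assumed definition: RMS distance from the cell site to the GPS fixes,
    computed as sqrt(bias**2 + spread**2).
    """
    n = len(gps_points)
    if n == 0:
        raise ValueError("no GPS observations for this cell")
    # Block 1906: centroid of the GPS locations.
    cx = sum(p[0] for p in gps_points) / n
    cy = sum(p[1] for p in gps_points) / n
    # Block 1908: radial standard deviation about the centroid (population form).
    spread = math.sqrt(sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in gps_points) / n)
    # Block 1910: combine the centroid offset (bias) and the spread.
    bias = math.hypot(cx - site[0], cy - site[1])
    return {"centroid": (cx, cy), "std": spread, "error_factor": math.hypot(bias, spread)}
```

Under this definition, a cell whose GPS fixes cluster tightly but far from the antenna still receives a large error factor, because the bias term dominates.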

[00293] For triangulated location coordinates in block 1912, the GPS coordinates may be compared to the coordinates generated by triangulation to generate error factors for triangulated coordinates.
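For block 1912, the comparison can be sketched as follows, assuming each stored raw sample pairs a GPS fix with the triangulated fix taken at the same moment, as collected in embodiment 1800. Reporting the RMS distance as the error factor, with the mean offset as an optional bias term, is an illustrative choice; the function name is hypothetical.

```python
import math


def triangulation_error(pairs):
    """Compare simultaneous GPS and triangulated fixes for one cell.

    pairs: list of ((gps_x, gps_y), (tri_x, tri_y)) in a local planar frame (m).
    Returns the RMS position error and the mean offset of triangulation from GPS.
    """
    if not pairs:
        raise ValueError("no paired observations for this cell")
    n = len(pairs)
    sq_err = sum((t[0] - g[0]) ** 2 + (t[1] - g[1]) ** 2 for g, t in pairs)
    mean_dx = sum(t[0] - g[0] for g, t in pairs) / n
    mean_dy = sum(t[1] - g[1] for g, t in pairs) / n
    return {"rms_error": math.sqrt(sq_err / n), "mean_offset": (mean_dx, mean_dy)}
```

A consistently non-zero mean offset would suggest a systematic bias in the triangulation for that cell, which could be corrected before the error range is applied.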

[00294] The error data may be stored in block 1914.

[00295] After all the cells for each cell site have been processed, the error factors may be aggregated in block 1916. A default error factor for an average cell site may be determined in block 1918, and a default error factor for an average set of triangulated coordinates may be determined in block 1920. The values may be stored in block 1922.

[00296] In some cases, there may not be enough data points to reliably calculate error factors for certain cells. One example may be a new cell that has recently been put into service and for which no GPS coordinates have yet been gathered. Another example may be a cell with little traffic, for which few GPS coordinates may have been gathered. In such cases, a set of default error factors may be used when calculating a predicted location.
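Blocks 1916 to 1922, together with the fallback for cells with too little data, might be sketched as below. The minimum sample count of 30 and the use of the plain mean over qualifying cells as the "average cell site" default are assumptions made for illustration; the same function would be run once for cell-site error factors and once for triangulation error factors. All names are hypothetical.

```python
from statistics import mean

MIN_SAMPLES = 30  # hypothetical threshold; the disclosure gives no number


def build_error_table(per_cell, min_samples=MIN_SAMPLES):
    """per_cell maps cell_id -> (error_factor, gps_sample_count).

    Keeps only cells with enough GPS samples and derives a default error
    factor as the mean over those cells (block 1918 or 1920).
    """
    table = {cell: factor for cell, (factor, count) in per_cell.items()
             if count >= min_samples}
    if not table:
        raise ValueError("no cell has enough samples to derive a default")
    return table, mean(list(table.values()))


def lookup_error(cell_id, table, default):
    """Cell-specific factor if one was computed, else the default.

    New or low-traffic cells fall through to the default.
    """
    return table.get(cell_id, default)
```

Excluding sparse cells from the average keeps a handful of poorly sampled cells from skewing the default.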

[00297] Figure 20 is a flowchart illustration of an embodiment 2000 showing a method for computing a trajectory using multiple data sources. The method of embodiment 2000 may be merely one example of how a trajectory may be computed.

[00298] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00299] Embodiment 2000 illustrates one method of computing a trajectory using multiple data sources. In a typical mobile network, the data sources may include location based services, triangulated coordinates, GPS coordinates, as well as other data sources.

[00300] A typical method for estimating motion may be a Kalman filter or another Bayesian tracking technique. Such methods use error factors that may indicate the reliability or accuracy of each data point.
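A Kalman filter for this setting can be sketched per coordinate axis, as below. The sketch assumes a constant-velocity motion model, independent x and y axes, coordinates in metres, and error ranges supplied as variances (the square of a one-sigma radius); none of these choices come from the disclosure. At each time step, `predict` advances the state, and `update` is called once per available observation (cell site, triangulated, GPS), so each source is weighted by its own error range. The class name and the initial velocity variance are hypothetical.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate axis.

    State is [position, velocity]; P is its 2x2 covariance. Running one
    instance per axis (x and y) is a simplifying assumption.
    """

    def __init__(self, pos, var, accel_var):
        self.x = [pos, 0.0]                  # position (m), velocity (m/s)
        self.P = [[var, 0.0], [0.0, 100.0]]  # hypothetical initial velocity variance
        self.q = accel_var                   # process noise: acceleration variance

    def predict(self, dt):
        p, v = self.x
        self.x = [p + v * dt, v]
        (a, b), (c, d) = self.P
        q = self.q
        # P = F P F^T + Q, with F = [[1, dt], [0, 1]] and white-acceleration Q.
        self.P = [
            [a + dt * (b + c) + dt * dt * d + q * dt**4 / 4, b + dt * d + q * dt**3 / 2],
            [c + dt * d + q * dt**3 / 2, d + q * dt**2],
        ]

    def update(self, z, r):
        """Fold in one position measurement z with variance r (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + r              # innovation variance
        k0, k1 = a / s, c / s  # Kalman gain
        y = z - self.x[0]      # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P = (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
```

For one time step, a caller might run `kf.predict(dt)`, then `kf.update(cell_x, cell_sigma**2)` and, when available, `kf.update(gps_x, gps_sigma**2)`. Because the measurement errors are assumed independent, applying the updates one after another gives the same result as a single joint update.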

[00301] Information relating to a desired trajectory may be received in block 2002, which may include the raw location data for a device's movements in block 2004. The time segments for the trajectory may be determined in block 2006, and each time segment may be analyzed in block 2008.

[00302] For each time segment, all available location data sources may be determined for the device during that segment in block 2010. For each location data source in block 2012, the location coordinates may be determined in block 2014 as well as the error range in block 2016. All the available coordinates and error factors may be aggregated in block 2018 to generate a predicted location, which may be stored in block 2020.

[00303] A more detailed method for such an analysis may be shown in a later figure.

[00304] After analyzing each time segment in block 2008, the trajectory may be stored in block 2022.
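The per-segment loop of embodiment 2000 can be sketched as follows. The fixed 60-second segment length, the tuple layout of the observations, and the function name are assumptions; the combination rule is passed in as `fuse`, so it could be an inverse-variance weighted mean, a Kalman filter update, or any other method that uses the per-source error ranges.

```python
from collections import defaultdict


def build_trajectory(observations, fuse, segment_s=60.0):
    """Sketch of embodiment 2000.

    observations: iterable of (t, x, y, sigma) from any source (cell site,
        triangulation, GPS), with t in seconds.
    fuse: callable taking a list of (x, y, sigma) and returning one (x, y, sigma).
    Returns a list of (segment_start, x, y, sigma), one per non-empty segment.
    """
    segments = defaultdict(list)
    for t, x, y, sigma in observations:   # blocks 2002-2006: raw data into segments
        segments[int(t // segment_s)].append((x, y, sigma))
    trajectory = []
    for seg in sorted(segments):          # block 2008: each time segment in order
        available = segments[seg]         # blocks 2010-2016: sources and error ranges
        x, y, sigma = fuse(available)     # block 2018: aggregate into one location
        trajectory.append((seg * segment_s, x, y, sigma))  # block 2020
    return trajectory                     # block 2022: the finished trajectory
```

Segments with no observations are simply skipped in this sketch; a fuller implementation might instead interpolate them.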

[00305] Figure 21 is a flowchart illustration of an embodiment 2100 showing a method for calculating estimated coordinates for a trajectory from multiple data sources. The method of embodiment 2100 may be merely one example of how estimated locations may be calculated.

[00306] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

[00307] The process of embodiment 2100 may represent a method that may use as many as three different data sources to determine a predicted location for a time period in a trajectory. Other embodiments may use four, five, or more different data sources, or may substitute different data sources for the ones listed. As illustrated, a predicted location may be derived from a location based service, which may provide merely the coordinates for a cell tower or antenna. The location may also be determined from triangulated coordinates, which may be derived from two, three, or more cell sites that may triangulate a position for a device on the network. Additionally, GPS coordinates may also be used.

[00308] The cell site to which a device may be connected during the time period of interest may be identified in block 2102 and the coordinates of the cell site may be determined in block 2104.

[00309] The error range for the specific cell site may be looked up in block 2106. If a customized error range is not present in block 2108, the process may use the default error range for all cell sites in block 2110. If the customized error range exists in block 2108, the calculated or customized error range may be used in block 2112.

[00310] A look up may occur in block 2114 for triangulated coordinates. If the triangulated coordinates do not exist for the device at the time period of interest in block 2116, the process may ignore triangulated coordinates in block 2118 and the process may proceed to block 2128.

[00311] When the triangulated coordinates exist in block 2116, a look up may be performed in block 2120 to determine whether calculated error ranges exist for the cell site for triangulated coordinates. If such an error range does not exist in block 2122, a default error range may be used in block 2124. If the calculated or customized error range does exist in block 2122, the calculated error range may be used in block 2126.

[00312] A look up may occur in block 2128 for GPS coordinates. If the GPS coordinates do not exist in block 2130, the GPS coordinates may be ignored in block 2132 and the process may proceed to block 2136.

[00313] If the GPS coordinates exist in block 2130, the GPS coordinates and error range may be used in block 2134. In many cases, GPS coordinates may be generated with a calculated error range for the specific GPS reading.

[00314] The estimated location may be calculated using all available data sources and either a customized or calculated error range for the specific data source and location, or using default error ranges in block 2136. The estimated location may be stored in block 2138 and if additional locations are available for processing in block 2140, the process may return to block 2102. When all locations have been processed in block 2140, the trajectory may be stored in block 2142.
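The flow of embodiment 2100 for a single time period might be sketched as follows. Each source falls back to a default error range when no customized range exists, mirroring blocks 2108 to 2112 and 2122 to 2126. Combining the sources by inverse-variance weighting in block 2136 is an assumption (the text only says that all available sources and error ranges are used), as are the function name, the planar metre coordinates, and the dictionary-based error tables.

```python
from typing import Optional

Point = tuple[float, float]


def estimate_location(cell_id: str,
                      cell_sites: dict[str, Point],
                      site_errors: dict[str, float],
                      tri_errors: dict[str, float],
                      default_site_error: float,
                      default_tri_error: float,
                      triangulated: Optional[Point] = None,
                      gps: Optional[tuple[float, float, float]] = None):
    """Return (x, y, sigma) for one time period.

    gps, when present, is (x, y, accuracy) with the receiver's own error range.
    """
    sources = []
    # Blocks 2102-2112: cell-site location with a customized or default error range.
    site = cell_sites[cell_id]
    sources.append((site[0], site[1], site_errors.get(cell_id, default_site_error)))
    # Blocks 2114-2126: triangulated coordinates, if any, with their error range.
    if triangulated is not None:
        sources.append((triangulated[0], triangulated[1],
                        tri_errors.get(cell_id, default_tri_error)))
    # Blocks 2128-2134: GPS with its per-reading error range, if any.
    if gps is not None:
        sources.append(gps)
    # Block 2136: combine by inverse-variance weighting (an assumption).
    w = [1.0 / s[2] ** 2 for s in sources]
    total = sum(w)
    x = sum(wi * s[0] for wi, s in zip(w, sources)) / total
    y = sum(wi * s[1] for wi, s in zip(w, sources)) / total
    return x, y, (1.0 / total) ** 0.5
```

Calling this once per time period and collecting the results in order gives the trajectory stored in block 2142.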

[00315] The foregoing description of the subject matter has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the subject matter to the precise form disclosed, and other modifications and variations may be possible in light of the above teachings. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention in various embodiments and various modifications as are suited to the particular use contemplated. It is intended that the appended claims be construed to include other alternative embodiments except insofar as limited by the prior art.

Claims

What is claimed is:
1. A method performed by at least one computer processor, said method comprising:
building an error table comprising:
identifying a first cell having a first fixed antenna;
for each of a plurality of mobile devices:
determining a precise location determined using one of said plurality of mobile devices in communication with said first fixed antenna;
determining a cellular location determined from a first base station attached to said first fixed antenna communicating with one of said plurality of mobile devices, said cellular location being determined with a first location mechanism;
determining an error by comparing said precise location to said cellular location; and
aggregating said error for each of said plurality of mobile devices into a first antenna-specific error associated with said first fixed antenna;
determining a first estimated location for a second mobile device comprising:
determining that said second mobile device communicated with said first fixed antenna at a first time;
determining a first cellular location determined from said first base station attached to said first fixed antenna communicated with said second mobile device;
determining a first error range using said first error associated with said first fixed antenna.
2. The method of claim 1 further comprising:
building an error table further comprising:
identifying a second cell having a second fixed antenna;
for each of a second plurality of mobile devices:
determining a precise location determined using one of said second plurality of mobile devices in communication with said second fixed antenna;
determining a cellular location determined from a second base station attached to said second fixed antenna communicating with one of said second plurality of mobile devices;
determining an error by comparing said precise location to said cellular location; and
aggregating said error for each of said plurality of mobile devices into a second antenna-specific error associated with said second fixed antenna.
3. The method of claim 2 further comprising:
determining a second estimated location for said second mobile device comprising:
determining that said second mobile device communicated with said second fixed antenna at a second time;
determining a second cellular location determined from said second base station attached to said second fixed antenna communicated with said second mobile device;
determining a second error range using said second error associated with said second fixed antenna.
4. The method of claim 3, said second error range being different from said first error range.
5. The method of claim 4 further comprising:
determining a third estimated location for said second mobile device comprising:
determining that said second mobile device communicated with said third fixed antenna at a third time;
determining a third cellular location determined from said third base station attached to said third fixed antenna communicated with said second mobile device;
determining that no antenna-specific error is associated with said third fixed antenna; and
determining a third error range using a default error factor.
6. The method of claim 5, said third error range being different from said second error range and different from said first error range.
7. The method of claim 1, said cellular location being a triangulated location calculated from a plurality of fixed antennae.
8. The method of claim 1, said cellular location being a location of said first fixed antenna.
9. The method of claim 1 further comprising:
building an error table by further comprising:
for each of a second plurality of mobile devices:
determining a second precise location determined using one of said second plurality of mobile devices in communication with said first fixed antenna;
determining a second cellular location determined from said first base station attached to said first fixed antenna communicating with one of said plurality of mobile devices, said second cellular location being determined with a second location mechanism, said second location mechanism being different from said first location mechanism;
determining a second error by comparing said second precise location to said second cellular location; and
aggregating said second error for each of said second plurality of mobile devices into a second antenna-specific error associated with said first fixed antenna and further associated with said second location mechanism;
determining a second estimated location for said second mobile device comprising:
determining that said second mobile device communicated with said first fixed antenna at a second time;
determining a third cellular location determined from said first base station attached to said first fixed antenna communicated with said second mobile device, said third cellular location being determined using said first location mechanism;
determining a third error range using said first error associated with said first fixed antenna;
determining a fourth cellular location determined from said first base station attached to said first fixed antenna communicated with said second mobile device, said fourth cellular location being determined using said second location mechanism;
determining a fourth error range using said second error associated with said first fixed antenna and said second location mechanism; and
determining said second estimated location using said third cellular location and said third error range and said fourth cellular location and said fourth error range.
10. A system comprising:
at least one computer processor, said computer processor being configured to perform a method of:
building an error table comprising:
identifying a first cell having a first fixed antenna;
for each of a plurality of mobile devices:
determining a precise location determined using one of said plurality of mobile devices in communication with said first fixed antenna;
determining a cellular location determined from a first base station attached to said first fixed antenna communicating with one of said plurality of mobile devices, said cellular location being determined with a first location mechanism;
determining an error by comparing said precise location to said cellular location; and
aggregating said error for each of said plurality of mobile devices into a first antenna-specific error associated with said first fixed antenna;
determining a first estimated location for a second mobile device comprising:
determining that said second mobile device communicated with said first fixed antenna at a first time;
determining a first cellular location determined from said first base station attached to said first fixed antenna communicated with said second mobile device;
determining a first error range using said first error associated with said first fixed antenna.
11. The system of claim 10, said method further comprising:
building an error table further comprising:
identifying a second cell having a second fixed antenna;
for each of a second plurality of mobile devices:
determining a precise location determined using one of said second plurality of mobile devices in communication with said second fixed antenna;
determining a cellular location determined from a second base station attached to said second fixed antenna communicating with one of said second plurality of mobile devices;
determining an error by comparing said precise location to said cellular location; and
aggregating said error for each of said plurality of mobile devices into a second antenna-specific error associated with said second fixed antenna.
12. The system of claim 11, said method further comprising:
determining a second estimated location for said second mobile device comprising:
determining that said second mobile device communicated with said second fixed antenna at a second time;
determining a second cellular location determined from said second base station attached to said second fixed antenna communicated with said second mobile device;
determining a second error range using said second error associated with said second fixed antenna.
13. The system of claim 12, said second error range being different from said first error range.
14. The system of claim 13, said method further comprising:
determining a third estimated location for said second mobile device comprising:
determining that said second mobile device communicated with said third fixed antenna at a third time;
determining a third cellular location determined from said third base station attached to said third fixed antenna communicated with said second mobile device;
determining that no antenna-specific error is associated with said third fixed antenna; and
determining a third error range using a default error factor.
15. The system of claim 14, said third error range being different from said second error range and different from said first error range.
16. The system of claim 15, said cellular location being a triangulated location calculated from a plurality of fixed antennae.
17. The system of claim 10, said cellular location being a location of said first fixed antenna.
18. The system of claim 10, said method further comprising:
building an error table by further comprising:
for each of a second plurality of mobile devices:
determining a second precise location determined using one of said second plurality of mobile devices in communication with said first fixed antenna;
determining a second cellular location determined from said first base station attached to said first fixed antenna communicating with one of said plurality of mobile devices, said second cellular location being determined with a second location mechanism, said second location mechanism being different from said first location mechanism;
determining a second error by comparing said second precise location to said second cellular location; and
aggregating said second error for each of said second plurality of mobile devices into a second antenna-specific error associated with said first fixed antenna and further associated with said second location mechanism;
determining a second estimated location for said second mobile device comprising:
determining that said second mobile device communicated with said first fixed antenna at a second time;
determining a third cellular location determined from said first base station attached to said first fixed antenna communicated with said second mobile device, said third cellular location being determined using said first location mechanism;
determining a third error range using said first error associated with said first fixed antenna;
determining a fourth cellular location determined from said first base station attached to said first fixed antenna communicated with said second mobile device, said fourth cellular location being determined using said second location mechanism;
determining a fourth error range using said second error associated with said first fixed antenna and said second location mechanism; and
determining said second estimated location using said third cellular location and said third error range and said fourth cellular location and said fourth error range.
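Claim 18 describes two stages: building an error table keyed by antenna and location mechanism from devices whose precise location is known, and then combining two cellular locations for another device using the error range of each mechanism. The sketch below shows one way this could be done; it is not the applicant's implementation. The claims do not specify the error metric, the aggregation or the combination rule, so the following are assumptions: locations are planar (x, y) coordinates in metres, the error is the Euclidean distance between precise and cellular locations, aggregation is root-mean-square, the aggregated error is treated as a standard deviation, and the estimates are combined by inverse-variance weighting. The function names, the antenna ID "A1" and the mechanism labels "cell_id" and "timing_advance" are hypothetical.

```python
import math
from collections import defaultdict


def build_error_table(observations):
    """Aggregate location errors into a table keyed by (antenna_id, mechanism).

    Each observation is (antenna_id, mechanism, precise_location, cellular_location),
    with locations as hypothetical planar (x, y) coordinates in metres.
    Root-mean-square aggregation is an assumption; the claims only say "aggregating".
    """
    squared_sums = defaultdict(float)
    counts = defaultdict(int)
    for antenna_id, mechanism, precise, cellular in observations:
        error = math.dist(precise, cellular)  # distance between precise and cellular fix
        key = (antenna_id, mechanism)
        squared_sums[key] += error ** 2
        counts[key] += 1
    return {key: math.sqrt(squared_sums[key] / counts[key]) for key in squared_sums}


def fuse_estimates(estimates, min_error=1e-6):
    """Combine (location, error_range) pairs by inverse-variance weighting (assumed rule).

    min_error guards against division by zero for a zero-error entry.
    Returns the combined (x, y) location and its combined error range.
    """
    weights = [1.0 / max(err, min_error) ** 2 for _, err in estimates]
    total = sum(weights)
    x = sum(w * loc[0] for w, (loc, _) in zip(weights, estimates)) / total
    y = sum(w * loc[1] for w, (loc, _) in zip(weights, estimates)) / total
    return (x, y), math.sqrt(1.0 / total)


def estimate_location(error_table, antenna_id, fixes):
    """Estimate a device location from several cellular fixes via one antenna.

    fixes maps each location mechanism to the cellular location it produced;
    each fix takes its error range from the table entry for (antenna, mechanism).
    """
    estimates = [(loc, error_table[(antenna_id, mech)]) for mech, loc in fixes.items()]
    return fuse_estimates(estimates)


# Usage with made-up numbers: antenna "A1", two hypothetical location mechanisms.
observations = [
    ("A1", "cell_id", (0.0, 0.0), (300.0, 400.0)),       # error 500 m
    ("A1", "cell_id", (0.0, 0.0), (0.0, 500.0)),         # error 500 m
    ("A1", "timing_advance", (0.0, 0.0), (60.0, 80.0)),  # error 100 m
]
table = build_error_table(observations)
location, error_range = estimate_location(
    table, "A1", {"cell_id": (1000.0, 0.0), "timing_advance": (0.0, 0.0)}
)
# The lower-error mechanism dominates: location is roughly (38.5, 0.0).
```

Inverse-variance weighting gives the mechanism with the smaller historical error proportionally more influence (here a 25:1 weight ratio), and the combined error range is smaller than either input. A mean, median or percentile aggregation, or a full probabilistic fusion, would fit the claim language equally well.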
PCT/SG2018/050006 2017-02-17 2018-01-05 Trajectory analysis through fusion of multiple data sources WO2018151672A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
PCT/IB2017/050891 WO2018150227A1 (en) 2017-02-17 2017-02-17 Mobility gene for trajectory data
IBPCT/IB2017/050891 2017-02-17
PCT/SG2017/050484 WO2018151669A1 (en) 2017-02-17 2017-09-27 Map matching and trajectory analysis
SGPCT/SG2017/050485 2017-09-27
PCT/SG2017/050485 WO2018151670A1 (en) 2017-02-17 2017-09-27 Trajectory analysis with mode of transport analysis
SGPCT/SG2017/050484 2017-09-27

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
AU2018222821A AU2018222821A1 (en) 2017-02-17 2018-01-05 Trajectory analysis through fusion of multiple data sources
PCT/SG2018/050070 WO2018151677A1 (en) 2017-02-17 2018-02-14 Real time trajectory identification from communications network
PCT/SG2018/050068 WO2018151676A1 (en) 2017-02-17 2018-02-14 Stay and trajectory identification from historical analysis of communications network observations
AU2018222826A AU2018222826A1 (en) 2017-02-17 2018-02-14 Real time trajectory identification from communications network
AU2018222825A AU2018222825A1 (en) 2017-02-17 2018-02-14 Stay and trajectory identification from historical analysis of communications network observations

Publications (1)

Publication Number Publication Date
WO2018151672A1 (en) 2018-08-23

Family

ID=63169820

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/SG2018/050006 WO2018151672A1 (en) 2017-02-17 2018-01-05 Trajectory analysis through fusion of multiple data sources
PCT/SG2018/050068 WO2018151676A1 (en) 2017-02-17 2018-02-14 Stay and trajectory identification from historical analysis of communications network observations

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/SG2018/050068 WO2018151676A1 (en) 2017-02-17 2018-02-14 Stay and trajectory identification from historical analysis of communications network observations

Country Status (1)

Country Link
WO (2) WO2018151672A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080188242A1 (en) * 2007-02-05 2008-08-07 Andrew Corporation System and method for optimizing location estimate of mobile unit
US20100007552A1 (en) * 2008-07-09 2010-01-14 Ntt Docomo, Inc. Positioning system, positioning method, and positioning program
US20110176523A1 (en) * 2010-01-15 2011-07-21 Huang Ronald K Managing a location database for network-based positioning system
CN101969692B (en) * 2010-10-14 2012-12-19 交通信息通信技术研究发展中心 Mobile phone locating method based on multiple shipborne mobile base stations
US20130170484A1 (en) * 2010-07-08 2013-07-04 Sk Telecom Co., Ltd. Method and device for discriminating positioning error using wireless lan signal
US20150065159A1 (en) * 2012-04-10 2015-03-05 Yaron Alpert Device, system and method of collaborative location error correction

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9037485B2 (en) * 2010-10-25 2015-05-19 Alohar Mobile Inc. Persistently determining and sharing user stays of a user of a mobile device
US9843895B2 (en) * 2014-05-30 2017-12-12 Apple Inc. Location-based services for calendar events
CN104751631B (en) * 2015-03-13 2017-03-01 同济大学 GPS positioning and transportation trip chain determination method based on fuzzy theory

Also Published As

Publication number Publication date
WO2018151676A1 (en) 2018-08-23

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18754437

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

ENP Entry into the national phase in:

Ref document number: 2018222821

Country of ref document: AU

Date of ref document: 20180105

Kind code of ref document: A