GB2570853A

GB2570853A - Identifying sites visited by a user device

Info

Publication number: GB2570853A
Application number: GB1717321.2A
Authority: GB
Inventors: Alsehly Firas; Sevak Zankar
Original assignee: Sensewhere Ltd
Current assignee: Sensewhere Ltd
Priority date: 2017-10-20
Filing date: 2017-10-20
Publication date: 2019-08-14
Also published as: GB201717321D0

Abstract

There is disclosed a method of facilitating the identification of sites visited by a user device within an indoor region. The indoor region includes a plurality of sites (e.g. shops, stores, offices) and transition regions (e.g. hallways or corridors), each transition region connecting two or more said sites. The indoor region is divided into sub-regions in dependence on aggregate sensor data received from sensors within the indoor region. The sensors may in particular include accelerometers, pedometers, GPS, Wi-Fi (RTM) positioning, cameras, microphones, magnetometers and RF receivers or transceivers. Each sub-region is classified as a site or a transition region, and for each classified site, identification data representing the identity of the site is generated. An embodiment describes use within a shopping centre (or mall), for determining shop locations.

Description

IDENTIFYING SITES VISITED BY A USER DEVICE

Field of the invention

The invention relates to a method of facilitating the identification of sites visited by a user device within an indoor region, a method of identifying such sites, a computer readable medium and a processing system.

Background to the invention

Increasingly it is desirable to determine when a user has visited one particular site (such as a shop) amongst several others within an indoor region (such as a shopping centre or mall). Store Visit Confirmation (SVC) is a particular type of data relating to site visits which can be desirable to generate. Within indoor regions, positioning systems either do not work in general (for absolute positioning systems such as GPS) or have significant limitations (for Wi-Fi positioning and similar). One approach is to measure position intersection with store geometry over a time window. This is commonly used by solutions conducting site surveys, to collect training data specific to one venue, while depending on indoor maps to identify store geometry. However, both maps and training data suffer from aging inaccuracies and, if not updated frequently, results are expected to degrade exponentially over time.

Other types of solutions overcome the requirement for geo-spatial reference of indoor maps by modelling signal descriptive profiles (also known as store fingerprints) collectively from inputs made by end users to identify a store, also known as ‘check ins’. These solutions rely on matching Wi-Fi fingerprints to a set of profiles with very limited dependency on location data compared to other solutions. However, due to being completely in signal space, location history cannot be used to improve accuracy, and this method is highly dependent on a device being in a store for a long time and without having store signal profiles overlapping. In practice, such profiles often do overlap, leading to difficulty resolving the ambiguity.

The present invention seeks to address deficiencies in the prior art and, in particular, to provide a system for facilitating the collection of geo-fence (boundary) data relating to points of interest.

Summary of the invention

A first aspect of the invention provides a (typically computer implemented) method of facilitating the identification of sites visited by a user device within an indoor region, the indoor region including a plurality of sites and at least one transition region, each transition region connecting two or more said sites, and the method comprising generating sub-region data representing a division of the indoor region into a plurality of sub-regions, preferably in dependence on aggregate sensor data received from sensors within the indoor region (or in dependence on other sensor data, or other types of data, obtained within the indoor region, or otherwise relating to the indoor region). The method may (optionally) further comprise classifying each sub-region as a site or a transition region. The method may yet further (optionally) comprise generating identification data representing the identity of sites corresponding to subregions that have been classified as sites. The method may optionally include outputting (or otherwise accessing or processing) the sub-region data and identification data.

The term ‘aggregate’ implies merely having a plurality of data items, although it may optionally connote data combined from any one or more of a plurality of sessions, paths, devices, users, and so on. References herein to a plurality of sites, transition regions, and so on may alternatively refer to one, or at least one, such entity, where appropriate and applicable. References to indoor may include a single indoor space, a combination of indoor spaces, a mix of indoor and outdoor space, and so on. Generating identification data may connote one or more of collating, processing and/or receiving identification data, for example from a plurality of sources of information, either stored locally or accessed or received remotely, but may also or alternatively involve creating the identity information and/or information data set. The term ‘generating’ in this context may in particular be replaced by ‘accessing’, ‘collating’, ‘processing’ or ‘preparing’.

By any or all of these method steps, data is provided which can facilitate the identification of sites visited by a user, in some cases with zero calibration and an unsupervised deployment, with no impediment to using the location history from sessions. In addition, store visits can be identified even if store point of interest (POI) information is not available. The classification of sub-regions into sites (being subregions of particular interest) or transition regions can allow computational and data gathering efforts to be focussed on areas of interest, and provides a more natural fit to real world data and behaviours, which also in general differ according to the above-mentioned classification. Furthermore, by use of the present method, sites can be identified with relatively high confidence levels in situations where insufficient mapping data is available to define the geographical extent of each site with a similarly high confidence (and, similarly, there may be insufficient signal source data to allow a confident disambiguation between different sites and transitional regions).

Said sub-region data may further comprise fingerprint data to facilitate identification of each sub-region in dependence on sensor data received within said sub-region. The fingerprint data may be defined only in relation to sub-regions classified as sites (or predominantly, in preference to transitional regions either in terms of relative quantity or quality of data, in terms of number of data types, or otherwise).

The fingerprint data for each sub-region may be generated in dependence on sensor data received within the respective sub-region (or merely relating to each sub-region, for example received from a third party, other external source, or observed remotely).

The fingerprint data may be derived from sensor data including at least one of signal source identifiers, such as MAC addresses and uuids, received signal strength, magnetic field strength and direction, visual references, and audio references, and so on. Other sensor data types are possible as appropriate.

The method may further comprise receiving an identity of a site, determining a sub-region associated with the site, and adding the identity and associated sub-region to the identification data. The identity may be inputted by a user who has visited the site, or otherwise (for example generated or allocated automatically, by a user device or otherwise, or otherwise inputted, for example as part of a survey, map service or commercial directory). The sub-region associated with the identity may be determined with reference to sensor data received by a user device associated with the user inputting the identity, for example cross-referencing with fingerprint data as mentioned elsewhere. The method may in particular include receiving data relating to a ‘check in’, made manually by a user or otherwise, and may include identifying a relevant sub-region and associating the check in data with that sub-region.
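
By way of illustration only, the association of check in data with sub-regions may be sketched as follows. The sketch assumes that each check in has already been resolved to a sub-region identifier (for example by fingerprint matching), and it adopts a simple majority vote with a vote-share confidence. The function name, the input format and the voting rule are assumptions made for this example and are not taken from the description.

```python
from collections import Counter, defaultdict

def build_identification_data(check_ins):
    """Map each sub-region to its most frequently reported site name.

    check_ins: iterable of (sub_region_id, site_name) pairs (hypothetical format).
    Returns {sub_region_id: (site_name, confidence)}, where confidence is the
    share of check ins in that sub-region that agree with the chosen name.
    """
    names_by_region = defaultdict(Counter)
    for sub_region_id, site_name in check_ins:
        # Normalise spelling variants so that they count as the same identity.
        names_by_region[sub_region_id][site_name.strip().lower()] += 1

    identification = {}
    for sub_region_id, counts in names_by_region.items():
        name, votes = counts.most_common(1)[0]
        identification[sub_region_id] = (name, votes / sum(counts.values()))
    return identification
```

A low confidence for a sub-region may indicate overlapping sites or mislocated check ins, and such entries could be held back until further data is received.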

The method may further comprise identifying boundaries between sub-regions in dependence on discontinuities in the aggregate sensor data corresponding to transitions between transition regions and sites. In particular, discontinuities may be determined in received signal strengths, for example from one or more Wi-Fi hubs or other signal transmitters (such as mobile telephone towers, Bluetooth(RTM) beacons, cordless telephone transmitters, and so on). A discontinuity may be determined at local maxima or minima of signal strength, which may be tracked in relation to more than one signal source (for example using signal identifiers to keep track of the sources). A confidence level may be associated with each detected discontinuity, allowing multiple data sources to be processed in combination to provide a more accurate estimate of transition points.
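
One possible way of detecting such discontinuities may be sketched as follows, under the assumption that received signal strengths from several sources have been sampled at the same sequence of positions along a path. The confidence measure used here (the fraction of sources whose signal strength has a local maximum or minimum at the same sample) is an assumption chosen for simplicity, as are the function names and the threshold.

```python
from collections import defaultdict

def local_extrema(rssi):
    """Return indices at which the series has a strict local maximum or minimum."""
    indices = []
    for i in range(1, len(rssi) - 1):
        left, mid, right = rssi[i - 1], rssi[i], rssi[i + 1]
        if (mid > left and mid > right) or (mid < left and mid < right):
            indices.append(i)
    return indices

def candidate_boundaries(series_by_source, min_confidence=0.5):
    """Combine extrema from several signal sources into boundary candidates.

    series_by_source: {source_id: [rssi_dbm, ...]}, keyed by a signal identifier
    such as a MAC address, with all series sampled at the same path positions.
    Returns {sample_index: confidence}, sorted by sample index.
    """
    votes = defaultdict(int)
    for rssi in series_by_source.values():
        for i in local_extrema(rssi):
            votes[i] += 1
    n_sources = len(series_by_source)
    kept = {i: v / n_sources for i, v in votes.items()
            if v / n_sources >= min_confidence}
    return dict(sorted(kept.items()))
```

In practice the series would typically be smoothed before the extrema are sought, since raw signal strength readings are noisy.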

In a preferred embodiment, the sites are shops and the indoor region is a shopping centre (a single building, a collection of buildings, or otherwise). Alternatively or additionally, each site may be more generally premises of some sort, including commercial and/or non-commercial premises, for example, including but not limited to municipal sites such as libraries, council offices, and so on. In this case, store visit confirmation, SVC, data may be generated for each site entered. The SVC data may be transmitted to a third party, stored for later processing, aggregation or transmittal, and so on.

At least one site may include a plurality of rooms. In the case of a store or shop, for example, various rooms may be interconnected to various degrees, but are more likely to share signal sources than unrelated rooms (for example if Wi-Fi boosters or the like are provided to ensure coverage throughout the site). The method may further comprise a step of determining which room or part of a room the user device is in, for example using visual, ladar or other appropriate methods.

The aggregate sensor data may include data from at least one of: a movement sensor, electromagnetic signal sensor, acoustic signal sensor and magnetic field sensor, and may in particular include data from at least one of: an accelerometer, pedometer dead reckoning system, global positioning system, Wi-Fi positioning system, camera, microphone, magnetometer, and radio frequency receiver or transceiver.

Classifying each sub-region as a site or a transition region may comprise processing aggregate sensor data to determine at least one common user behaviour in the subregion, and classifying the sub-region in dependence on the user behaviour. For example, it may be determined that users typically move rapidly through a sub-region without stopping, indicating a transition area, and users lingering in other areas, indicating sites. Other data, including the aforesaid identification data may be used to assist in classifying the sub-regions.
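
A minimal sketch of such a behaviour-based classification is given below. It assumes that, for each traversal of a sub-region, a mean speed and a dwell time have already been derived from the aggregate sensor data. The threshold values, the function name and the 'unknown' fallback are illustrative assumptions only.

```python
from statistics import median

def classify_sub_region(visits, speed_threshold=0.8, dwell_threshold=60.0):
    """Classify a sub-region from typical user behaviour within it.

    visits: list of (mean_speed_m_per_s, dwell_seconds), one per traversal.
    Thresholds are assumed values, not taken from the description.
    """
    if not visits:
        return "unknown"
    typical_speed = median(v[0] for v in visits)
    typical_dwell = median(v[1] for v in visits)
    # Users passing through quickly without stopping suggest a transition region.
    if typical_speed >= speed_threshold and typical_dwell < dwell_threshold:
        return "transition"
    # Users moving slowly and lingering suggest a site.
    if typical_speed < speed_threshold and typical_dwell >= dwell_threshold:
        return "site"
    return "unknown"
```

The median is used rather than the mean so that a small number of atypical traversals (for example a user pausing in a corridor) does not change the classification.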

The method may further comprise receiving path data representing a path taken by a user device through the indoor region, and the sub-region data may consequently be generated in dependence on the path data. The path data may represent a plurality of paths taken by at least one user device, and the method may consequently further comprise combining the paths to determine an aggregate path. In that case, at least one sub-region relating to a transition region may be defined in dependence on at least a portion of the aggregate path.
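
Purely as an example, a plurality of paths may be combined by resampling each path to the same number of points, evenly spaced along its length, and averaging corresponding points. The sketch assumes that every path has at least two points, that all paths run in the same direction between roughly the same end points, and that planar (x, y) coordinates in metres are used. None of these assumptions, nor the function names, come from the description.

```python
import math

def resample(path, n):
    """Resample a polyline [(x, y), ...] to n >= 2 points evenly spaced by arc length."""
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    out = []
    seg = 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target distance.
        while seg < len(path) - 2 and dists[seg + 1] < target:
            seg += 1
        span = dists[seg + 1] - dists[seg]
        t = 0.0 if span == 0 else (target - dists[seg]) / span
        (x0, y0), (x1, y1) = path[seg], path[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def aggregate_path(paths, n=5):
    """Average several paths point by point after resampling each to n points."""
    resampled = [resample(p, n) for p in paths]
    count = len(resampled)
    return [
        (sum(p[k][0] for p in resampled) / count,
         sum(p[k][1] for p in resampled) / count)
        for k in range(n)
    ]
```

A sub-region relating to a transition region could then be defined, for instance, as a band of fixed width around the aggregate path.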

The method may further comprise receiving additional sensor (or other) data, and updating the sub-region data in dependence on the additional sensor (or other) data.

At least one of the steps of generating sub-region data in respect of at least one subregion, classifying at least one sub-region, and generating identification data for at least one sub-region may be carried out on a user device (preferably a user device which has captured or is capable of capturing at least part of the aggregate sensor data). Alternatively or additionally, at least one of the steps of generating sub-region data, classifying the sub-regions, and generating the identification data may be carried out on a server in communication with a plurality of user devices (preferably such devices having sensors as aforesaid).

The method may further comprise transmitting sensor data from a user device to the server, and optionally transmitting data back from the server to the user device, for example to provide updated data of various types as aforesaid, and the method may include transmitting differences and updates rather than whole data sets.
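
The transmission of differences rather than whole data sets may be illustrated by the following sketch, in which the sub-region data is assumed, for the purposes of the example only, to be held as a flat mapping from sub-region identifier to a record. The patch format and the function names are hypothetical.

```python
def diff(old, new):
    """Describe the changes needed to turn mapping `old` into mapping `new`."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "delete": removed}

def apply_patch(data, patch):
    """Return a copy of `data` with the patch applied (the input is not modified)."""
    result = dict(data)
    result.update(patch["set"])
    for key in patch["delete"]:
        result.pop(key, None)
    return result
```

The server would send only the output of `diff`, and the user device would reconstruct the updated data set with `apply_patch`.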

In a related aspect of the invention, there is provided a method of identifying sites visited by a user device within an indoor region, the indoor region including a plurality of sites and at least one transition region, each transition region connecting two or more said sites, and the method comprising: receiving user sensor data from sensors in the user device; optionally accessing sub-region data representing a division of the indoor region into a plurality of sub-regions in dependence on aggregate sensor data received from sensors within the indoor region, each sub-region being optionally classified as a site or a transition region. The method may further include processing the user sensor data and the sub-region data to determine at least one candidate sub-region within which the user device may be located; and optionally (see below) accessing identification data, representing the identity of sites corresponding to sub-regions that have been classified as sites, to determine at least one candidate identity of the site associated with said at least one candidate sub-region. The method may further include outputting said at least one candidate identity. Alternatively, the method may comprise outputting said at least one candidate sub-region (instead of or as well as a candidate identity obtained by any means).

Accordingly, in a related aspect of the invention there is provided a method of identifying sites visited by a user device within an indoor region, the indoor region including a plurality of sites and at least one transition region, each transition region connecting two or more said sites, and the method comprising: receiving user sensor data from sensors in the user device; accessing sub-region data representing a division of the indoor region into a plurality of sub-regions in dependence on aggregate sensor data received from sensors within the indoor region, each subregion being classified as a site or a transition region; optionally processing the user sensor data and the sub-region data to determine at least one candidate sub-region within which the user device is located; and yet further optionally outputting said at least one candidate sub-region.

The method may further comprise determining said at least one candidate sub-region in dependence on a trajectory followed by the user device. The method may additionally or alternatively comprise determining said at least one candidate subregion by matching the user sensor data with fingerprint data associated with the subregion. The method may yet further comprise determining said at least one candidate sub-region in dependence on at least one of the bearing or orientation of the user device, the speed of the user device, the rate of change of the bearing, orientation or speed of the device, the distance travelled by the user device, the acceleration of the user device, and pattern of movement of the device, or any other appropriate property of the user device or surrounding environment.

The method may further comprise outputting a probability that the user device is within a said candidate sub-region. The method may yet further comprise outputting a location history of the user device prior to entering said at least one candidate subregion in any appropriate form.
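
As a non-limiting sketch, candidate sub-regions and associated probabilities may be obtained by comparing a received signal strength scan with stored fingerprint data. The example assumes that each fingerprint is a mapping from signal source identifier to a typical signal strength in dBm. The distance measure, the floor value for unheard sources, the exponential weighting and the `scale` parameter are assumptions of this example rather than features of the described method.

```python
import math

MISSING_DBM = -100.0  # assumed floor value for a source that was not heard

def rssi_distance(scan, fingerprint):
    """Euclidean distance in signal space over all sources seen in either input."""
    sources = set(scan) | set(fingerprint)
    return math.sqrt(sum(
        (scan.get(s, MISSING_DBM) - fingerprint.get(s, MISSING_DBM)) ** 2
        for s in sources
    ))

def candidate_sub_regions(scan, fingerprints, scale=10.0):
    """Rank sub-regions by similarity to the scan.

    scan: {source_id: rssi_dbm}
    fingerprints: {sub_region_id: {source_id: rssi_dbm}}
    Returns [(sub_region_id, probability), ...], most probable first, with the
    probabilities normalised to sum to 1 over the supplied sub-regions.
    """
    weights = {rid: math.exp(-rssi_distance(scan, fp) / scale)
               for rid, fp in fingerprints.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(rid, w / total) for rid, w in ranked]
```

The probabilities are relative to the sub-regions supplied, so a device located outside all of them would still be assigned to the nearest match. A minimum-similarity cut-off could be added to guard against this.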

The sensor data may be collected over a period of time, which may correspond to some or all of the time the user device is located within the indoor region, or additional time (for example prior to entering the indoor region and/or after the device leaves the region). The sensor data may be collected on a regular or irregular basis, and the user device may aggregate the sensor data locally in the meantime or otherwise. The period of time may be in excess of the average time taken to traverse transition regions and/or complete transactions at the sites, and may thus or otherwise be selected to be in excess of the average time taken to traverse subregions.

The sensor data may be collected while the user device follows a path through a plurality of locations. The path may traverse a plurality of the sub-regions. The sensor data may be collected in a plurality of sessions, or in respect of a plurality of paths or in respect of a plurality of discrete time periods, and/or in respect of a plurality of user devices.

The method may further comprise processing the user sensor data to detect transition points in the path, such as a transition between a transition region and a site. This feature may be provided independently of the path feature.
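
One simple way of detecting transition points, given here as an assumption-laden sketch, is to compare the sets of signal sources visible in consecutive scans and to flag the points at which the sets differ markedly. The use of the Jaccard dissimilarity, the threshold value and the function names are choices made for this example only.

```python
def jaccard_dissimilarity(a, b):
    """Return 1 - |a & b| / |a | b|, or 0.0 when both collections are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def transition_points(scans, threshold=0.5):
    """Return indices i at which scan i differs markedly from scan i - 1.

    scans: list of collections of source identifiers seen at successive times.
    """
    return [i for i in range(1, len(scans))
            if jaccard_dissimilarity(scans[i - 1], scans[i]) > threshold]
```

A sharp change in the visible sources may, for example, accompany movement from a corridor into a store, although other causes (such as a source being switched off) are equally possible, so a detected point is a candidate only.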

Other aspects of the invention (in combination with methods as aforesaid or otherwise) may include features including:

• The steps of: (1) determining a probability of a store visit; (2) outlier detection through store feature verification; and (3) generation of a final SVC result.

• Sensor data preferably includes data that may be used to determine a position of a user or other device and, accordingly, it is envisaged that the method may be used with a set of location data preferably including a history of locations where a user or other device was determined or estimated to be (using absolute or relative positioning systems, for example). Such sensor data may be collected other than at the device being positioned, and may be collected by mobile phone towers, WiFi hubs, and so on, and used in triangulation or other remote positioning systems. To simplify the above considerations, it is envisaged that the invention may specifically relate to processing at least one of a location history data, plurality of locations of a user device, plurality of points on a path taken by a user (or other device), and so on, instead of processing sensor data as mentioned above (either aggregate sensor data and/or user sensor data), and appropriate changes may be made to any or all of the statements of invention above. This feature may be provided independently in additional forms, in any combination with any of the aforesaid optional features.

• All of the methods above may alternatively or additionally operate on historic data, that is to say references to determining where a user device is, or in which sub-region it may be located, can be replaced or supplemented as necessary or appropriate by references to where a user device was or has been, and similarly for any other entities which may be positioned using sensor or other data.

• The present invention also extends to apparatus features, such as a user device and/or a server having at least one processor and associated memory storing computer program code for causing the processor to carry out a method as aforesaid, or a portion of a method as aforesaid (for example with the computer program code distributed across a server and a user device such that the two together provide all steps of a method as aforesaid). The computer program code and/or any associated data can be transmitted to the user device, for example in response to a push or a pull request, and can be modified during use, for example, to take into account changes in processing steps or changes in the sub-region data or identification data, and similar.

In another aspect of the invention there is provided a non-transitory computer readable medium tangibly embodying computer program code which, when executed by one or more computer processors, causes the computer to carry out a method as aforesaid.

In a further aspect of the invention there is provided a computer processing system comprising one or more computer processors (typically hardware processors, such as microprocessors or microcontrollers) and associated memory, the memory tangibly embodying computer program code which, when executed, causes the computer to carry out a method as aforesaid.

Although the embodiments of the invention described herein with reference to the drawings may comprise computer-related methods or apparatus, the invention may also extend to program instructions, particularly program instructions on or in a carrier, adapted for carrying out the processes of the invention or for causing a computer to perform as the computer apparatus of the invention. Programs may be in the form of source code, object code, a code intermediate source, such as in partially compiled form, or any other form suitable for use in the implementation of the processes according to the invention. The carrier may be any entity or device capable of carrying the program instructions.

For example, the carrier may comprise a storage medium, such as a ROM, for example a CD ROM or a semiconductor ROM, or a magnetic recording medium, for example a floppy disc, hard disc, or flash memory, optical memory, and so on. Further, the carrier may be a transmissible carrier such as an electrical or optical signal which may be conveyed via electrical or optical cable or by radio or other means. When a program is embodied in a signal which may be conveyed directly by cable, the carrier may be constituted by such cable or other device or means.

Although various aspects and embodiments of the present invention have been described separately above, any of the aspects and features of the present invention can be used in conjunction with any other aspect, embodiment or feature where appropriate. For example apparatus features may where appropriate be interchanged with method features. References to single entities should, where appropriate, be considered generally applicable to multiple entities and vice versa. Unless otherwise stated herein, no feature described herein should be considered to be incompatible with any other, unless such a combination is clearly and inherently incompatible. Accordingly, it should generally be envisaged that each and every separate feature disclosed in the introduction, description and drawings is combinable in any appropriate way with any other unless (as noted above) explicitly or clearly incompatible.

Description of the Drawings

An example embodiment of the present invention will now be illustrated with reference to the following figures in which:

Figure 1 is a schematic block diagram of a mobile telecommunications device in communication with a server computer;

Figure 2 is an illustration of a basic geo-fencing scheme;

Figure 3 is an illustration of the use of elevation in a basic geo-fencing scheme;

Figure 4 is a schematic of a system for training point of interest (POI) classification models for use in a first embodiment;

Figure 5 is a schematic of a system for using the classification models of Figure 4 to identify POI boundaries;

Figure 6 is a schematic of a system for validating the POI boundaries identified in Figure 5;

Figure 7a is an illustration of typical POI scores in the system of Figures 4-6;

Figure 7b is a further illustration of POI scores in the system of Figures 4-6;

Figure 7c is an illustration of an identified POI boundary in the system of Figures 4-6;

Figure 7d is an illustration of a refined POI boundary in the system of Figures 4-6;

Figure 8 is a flowchart illustrating the process of the system of Figures 4-6 in overview;

Figure 9 is a flow chart illustrating a method of generating and adjusting the layout data of Figure 3;

Figure 10 is a flow chart illustrating step 70 from Figure 9 in more detail;

Figure 11 is a block diagram illustrating a fuzzy logic system for determining a probability of which notional boundary contains the locations of a location set;

Figure 12 is an illustration of clustering and classification of spatial features to crowdsource anonymous store RF profiles;

Figure 13 is an illustration of the modular design of a store visit confirmation driven by event detection and a map-less solution;

Figure 14 is an illustration of identifying changing points based on WiFi dissimilarity on time series; and

Figure 15 is an illustration of Outdoor/Indoor detected events (particles) fused into entrance profiles based on position, bearing and WiFi signal.

Detailed Description of an Example Embodiment

In overview, the present embodiment provides a method for facilitating the identification of sites visited by a user device within an indoor region.

The method involves generating sub-region data representing a division of an indoor region into sub-regions in dependence on sensor readings within the indoor region. In a first phase, each sub-region is classified either as ‘in store’ (that is, within a particular site) or as a transition region (that is, a corridor, walkway, concourse, and so on). In a second phase, an identity is determined for each in store area determined in the first phase, for example using data from ‘check ins’ made by users in or near those sites. The data thus generated can then be used to locate users and then to produce store visiting confirmations for sites where the users are determined to be (optionally in combination with a confidence level associated with the determined identifier/site).

The following discussion relates to stores and shopping malls, but can equally be applied to other situations and types of sites (it can relate for example to housing units within a block of flats, municipal and non-commercial premises or sites, and other types of indoor or mixed indoor and outdoor areas, and so on).

Part of the process of dividing the indoor region into sub-regions makes use of (or is at least able to make use of) a novel system for automatically recognising the boundaries of where points of interest (POI) are located and generating geo-fencing information along with a confidence percentage (or other measure). Such information can then be made available to be consumed by any service provider who wishes to choose geo-fences with a confidence over a specific threshold depending on the application. This information can be used in particular in relation to the sub-region division/definition process mentioned above. The POI boundary detection system and a typical user device (as may be used in essentially all of the methods described herein) will now be described, before further discussion of the present embodiment mentioned above.

The presently described process attempts to learn as many POI details as possible with geo-fence (boundaries) from location data collected via a fully unsupervised positioning system, without the need to acquire geo-reference information of the operation territory from businesses or public sectors. However, some publicly available data, such as community maps and satellite images, can be utilised to improve the shape/quality of the geo-fences.

Figure 1 is a block diagram of a mobile telecommunications device 1 (such as a mobile smartphone, phablet, tablet, laptop, personal data assistant or wearable device such as a smartwatch) comprising a (typically hardware) computer processor 2 (which is typically a general purpose computer processor such as a microprocessor or microcontroller), a memory 4, an accelerometer 5, data communication antennas 6 (one or more or each of which may be directional antennas) and an orientation sensor (such as a gyroscope or magnetometer) 7. The data communication antennas 6 typically comprise a cellular telecommunications antenna, a Wi-Fi antenna and a Bluetooth antenna (not shown) configured to allow the mobile communications device 1 to communicate by cellular telecommunications, Wi-Fi and Bluetooth.

The mobile device 1 further comprises a location sensor 9. The location sensor 9 comprises a Global Navigation Satellite System (GNSS) antenna (e.g. Global Positioning System (GPS) antenna) configured to detect signals from GNSS satellites and a GNSS processor configured to process the signals received from the satellites to estimate the location of the mobile device 1 (alternatively processor 2 of the mobile device processes the signals from the GNSS satellites). The location sensor 9 may additionally or alternatively comprise a Wireless Positioning System (WPS) computer program application (typically comprising stored computer program instructions) configured to cause the processor 2 of the mobile device 1 (or an additional dedicated processor) to estimate a location of the mobile device 1 by processing electromagnetic signals detected by one or more of the antennas 6 from terrestrial electromagnetic signal sources of known (e.g. two dimensional) location (e.g. the two dimensional locations of electromagnetic signal sources may be stored in memory 4) and processing the received signals together with the known two dimensional locations of the electromagnetic signal sources to estimate the location of the mobile device 1, for example using received signal strengths together with the locations of the electromagnetic signal sources in a triangulation algorithm or with stored (e.g. in memory 4) fingerprint data specifying the expected variation of received signal strength from the said electromagnetic signal sources with distance from the signal source. The location sensor 9 may additionally or alternatively comprise a pedestrian dead reckoning application (typically comprising computer program instructions) configured to cause the processor 2 of the mobile device 1 to determine the location of the mobile device from accelerometer and orientation data from the accelerometer and orientation sensor 5, 7 respectively.
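
The pedestrian dead reckoning application mentioned above may, in a much simplified form, be sketched as follows. The sketch assumes that steps have already been detected from the accelerometer data and that a heading (in degrees clockwise from north) is available from the orientation sensor for each step. The fixed step length and the function name are assumptions of the example.

```python
import math

def dead_reckon(start, step_headings_deg, step_length=0.7):
    """Accumulate a track of (east, north) positions in metres from step headings.

    start: (east, north) starting position.
    step_headings_deg: one heading per detected step, in degrees clockwise from north.
    step_length: assumed constant stride length in metres.
    """
    east, north = start
    track = [(east, north)]
    for heading_deg in step_headings_deg:
        heading = math.radians(heading_deg)
        east += step_length * math.sin(heading)
        north += step_length * math.cos(heading)
        track.append((east, north))
    return track
```

Because errors in step length and heading accumulate with every step, an estimate of this kind would ordinarily be corrected from time to time using an absolute position, for example from the GNSS or WPS components of the location sensor 9.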

The mobile device 1 can communicate with a server computer 8 by way of a wireless telecommunications link 11 (which typically connects to a base station which propagates data to and from the server 8 across a telecommunications network) using one or more of the data communication antennas 6. The server computer 8 comprises a processor 10 (which is again typically a general purpose computer processor such as a microprocessor or microcontroller) and a memory 12, the processor 10 being configured to execute computer program instructions stored in the memory 12.

The mobile device 1 typically determines location data as it changes location, and sends the location data to the server 8 using one or more of its data communication antennas 6. The location data may comprise estimates of the location of the mobile device 1 (e.g. determined by the location sensor 9) or it may comprise positioning data from which locations of the mobile device 1 can be estimated (e.g. signal source data detected from a plurality of electromagnetic signal sources, or accelerometer and orientation sensor data from which a location of the device 1 can be estimated). In the latter case, the processor 10 of the server 8 may be configured to execute computer program instructions stored on server memory 12 to cause the server to estimate the said locations of the mobile devices from the positioning data. In either case, the server 8 can track the location of the mobile device 1. Heading data (typically measured by the orientation sensor 7 of the mobile device) is also typically transmitted by the mobile device 1 to the server 8.

Additional location data, relating to features of the location or activities carried out in the location, may also be provided.

A plurality of such devices 1 is provided which report location data to the server 8.

WO2016/066987, which is incorporated in full herein by reference, discloses a method of obtaining and updating a database of spatial features associated with a region. The method comprises: receiving positioning data that has been collected at a plurality of locations within the region; processing the collected positioning data to identify at least one candidate spatial feature associated with the region; identifying at least one other spatial feature corresponding to said at least one candidate spatial feature, said at least one other spatial feature and said at least one candidate spatial feature as a whole constituting matching spatial features; processing said matching spatial features; and updating the database of spatial features in dependence on the processing of said matching spatial features. This allows spatial features of an indoor region to be identified from unstructured, crowdsourced positioning data from multiple mobile devices moving in the indoor region. The identified spatial features may be areas (or spaces) within the indoor region (such as rooms or corridors), linear features (e.g. walls), gaps in features (e.g. portals such as doors), floor-change features such as elevators, escalators or staircases, turning points, ends of corridors and so on. Generally, the spatial features are (in some cases discrete) areas or spaces within the region in which people change location and/or which people enter and exit. The specific types of the spatial features identified may be stored in the database, together with the locations of the spatial features.

This process finds particular application in the identification of spatial features relating to the aforementioned transitional regions such as corridors, walkways, concourses in shopping malls, and so on.

Thus, the existence of spatial features, their types and their locations can be identified from unstructured location data crowdsourced from a plurality of mobile devices 1.

Figure 2 is an illustration of a basic geo-fencing scheme. This scheme assumes the existence of a geospatial database which hosts area-descriptive boundaries (geofences), typically in the form of arbitrary polygons, to describe a hierarchy of global Geoindex details, such as country, city, region, block, venue, and so on. An example of such a Geoindex is the Grid Spatial Index, described, for example, at the ‘wiki/Grid_(spatial_index)’ page of the ‘en.wikipedia.org’ website, the contents of which are incorporated by reference. Alternatively, and as will be used herein for the purpose of simplicity of presentation, a simple geo-referencing is also feasible, to allow anonymous area identification as GeoIndex in a global reference as per Figure 2. Other schemes are of course possible.

In Figure 2, the gridlines 202 forming a grid of individual locations are shown. An intersection 204 of the gridlines is shown, in this case labelled with longitude and latitude, but other addressing schemes (such as local coordinates rather than global) and coordinates are of course possible. A geo-fence 206 is shown, in this case a polygon which defines the extent of a point of interest. A grid space 208 is indicated, and shown with a corresponding GeoIndex. A subdivision 210 of another grid space is also indicated, with a corresponding GeoIndex. Depending on the scheme, further subdivisions may be possible, and typically these are essentially unlimited technically but may be restricted for purposes related to processing data according to the present scheme.
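Purely by way of illustration, a simple geo-referencing of this kind might be sketched as follows. The cell size of 0.001 degrees, the halving of the cell size at each subdivision level and the `level:row:column` string format are assumptions made for this sketch and are not taken from the scheme of Figure 2.

```python
import math


def geo_index(lat: float, lon: float, cell_deg: float = 0.001, subdivisions: int = 0) -> str:
    """Return a hypothetical GeoIndex for the grid space containing (lat, lon).

    cell_deg is an assumed base grid spacing in degrees; each subdivision
    level halves the spacing, so a grid space splits into four sub-spaces.
    """
    size = cell_deg / (2 ** subdivisions)
    # Shift the origin so that rows and columns are never negative.
    row = math.floor((lat + 90.0) / size)
    col = math.floor((lon + 180.0) / size)
    return f"{subdivisions}:{row}:{col}"


if __name__ == "__main__":
    print(geo_index(55.9533, -3.1883))
    print(geo_index(55.9533, -3.1883, subdivisions=1))
```

Two locations falling within the same grid space receive the same GeoIndex, which allows location data to be aggregated per grid space without reference to any named place.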

Figure 3 is an illustration of the use of elevation in a basic geo-fencing scheme. In this case, the base location 302 (assigned an elevation of 0) referenced by the GeoIndex is shown, as well as two other elevations 304, 306 with elevation numbers of 1 and 2. Typically the elevations refer to storeys but other schemes are possible. GeoIndices in this system are essentially two-dimensional, and elevations are necessary to capture POIs and similar on different floors of a building. In this case, an elevation array is added to each GeoIndex. This means that on the same GeoIndex there could, for example, be a 95% score for a restaurant with elevation 0 and a 90% score for a cinema with elevation 10 (say).
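As a non-limiting sketch, the elevation array could be held as a nested mapping from GeoIndex to elevation to classification score. The function names and the nested-dictionary layout below are assumptions made for illustration only.

```python
from typing import Dict, Tuple

# store[geo_index][elevation][classification] -> score between 0 and 1
ScoreStore = Dict[str, Dict[int, Dict[str, float]]]


def set_score(store: ScoreStore, geo_index: str, elevation: int,
              classification: str, score: float) -> None:
    """Record a POI score for one classification at one elevation of a GeoIndex."""
    store.setdefault(geo_index, {}).setdefault(elevation, {})[classification] = score


def best_per_elevation(store: ScoreStore, geo_index: str) -> Dict[int, Tuple[str, float]]:
    """Return the highest-scoring classification for each elevation of a GeoIndex."""
    return {
        elevation: max(by_class.items(), key=lambda item: item[1])
        for elevation, by_class in store.get(geo_index, {}).items()
    }


if __name__ == "__main__":
    store: ScoreStore = {}
    set_score(store, "cell-1", 0, "restaurant", 0.95)
    set_score(store, "cell-1", 10, "cinema", 0.90)
    print(best_per_elevation(store, "cell-1"))
```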

Figure 4 is a schematic of a system for training point of interest (POI) classification models. In essence, known POI data 402 is fed into a GeoIndex classifier/aggregator 408 in conjunction with parameters extracted from location data 404 by parameter extraction module 406. The aggregated GeoIndex and location data is stored as training data 410 which is classified by a POI classifier 412 and fed into training algorithms 414, and then stored in the POI propagation models database 416.

The location data includes location ‘tokens’ which are able to describe the movement of a smart device user on the basis of discrete, short sessions and long sessions. Each of the location tokens is typically expected (though not required) to provide at least a few parameters from the following list:

• Location, Elevation, Quality indicators, Travel Speed, Entering time, Exiting time, Activity recognition mode (stationary, walking, running, in car, and so on), Number of loops, Number of turns, Phone position, Illumination level, Surrounding noise level, at least one category likelihood (for example based on any data mining that could happen on the device), and so on.
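By way of example only, a location token carrying a subset of the parameters listed above might be represented as follows. The field names, units and the `dwell_seconds` helper are assumptions introduced for this sketch; every field other than the location is optional, reflecting that a token is expected, but not required, to provide the parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocationToken:
    """Hypothetical container for one location token."""
    latitude: float
    longitude: float
    elevation: Optional[int] = None
    travel_speed: Optional[float] = None       # assumed to be metres per second
    entering_time: Optional[float] = None      # assumed to be seconds since epoch
    exiting_time: Optional[float] = None       # assumed to be seconds since epoch
    activity_mode: Optional[str] = None        # e.g. "stationary", "walking"
    number_of_turns: Optional[int] = None
    illumination_level: Optional[float] = None
    noise_level: Optional[float] = None

    def dwell_seconds(self) -> Optional[float]:
        """Dwell time derived from the entering and exiting times, if both are known."""
        if self.entering_time is None or self.exiting_time is None:
            return None
        return self.exiting_time - self.entering_time


if __name__ == "__main__":
    token = LocationToken(55.9533, -3.1883, entering_time=1000.0, exiting_time=1420.0)
    print(token.dwell_seconds())
```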

The training phase illustrated in Figure 4 typically utilises all data available from external sources. This can include in-house engineer surveys, quality public information or third party maps data. Once a few areas have been identified that one can be confident enough to allocate POI data to, they are added to the training list. The training stage process is described in more detail below.

Figure 5 is a schematic of a system for using the classification models of Figure 4 to identify POI boundaries. This is known as the propagation phase.

The system processes unknown POI data 502, which is again fed into a GeoIndex classifier/aggregator 508 in conjunction with location data 504 which has parameters extracted via extractor 506. The aggregated data is processed by a model evaluation and scoring module 512 in conjunction with the POI propagation models 510 trained by the system of Figure 4, and the resulting scores (likelihood estimations) are stored in the POI GeoIndex scores database 514.

Figure 6 is a schematic of a system for validating the POI boundaries identified in Figure 5. This is known as the validation phase.

The POI GeoIndex scores 602, as computed in the system of Figure 5, are fed into a POI detector module 604, which then feeds into a line fitting module 606 which applies other sources of geospatial data 608 and then stores the adjusted boundaries (geo-fences) in the geo-fence database 610. The validation phase (which is optional) is described in more detail below.

Figure 7a is an illustration of typical POI scores in the system of Figures 4-6. Various classifications may have non-zero POI scores (even potentially higher scores) but in this case the POI scores for the ‘restaurant’ classification are shown. The central location has a sufficiently high POI score (95%) that it is typically considered to be a ‘ground truth’. These scores correspond to the data stored in the POI GeoIndex scores database 514 of Figure 5.

Figure 7b is a further illustration of POI scores in the system of Figures 4-6. This illustrates visually the range of POI scores, with darker squares indicating higher certainty levels. The overall appearance of the shape of the restaurant POI can be made out approximately.

Figure 7c is an illustration of an identified POI boundary in the system of Figures 4-6. The boundary is determined on the basis of threshold confidence levels for the POI classification (in this case, a simple threshold of around 60% has been used, but other schemes are possible). This represents the output of the POI detector 604 of Figure 6.
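The simple thresholding described above might, for example, be sketched as follows. The representation of grid spaces as (row, column) pairs and the example scores are assumptions made for illustration and do not reproduce the values of Figure 7a.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # (row, column) of a grid space, assumed for this sketch


def detect_poi_cells(scores: Dict[Cell, float], threshold: float = 0.6) -> Set[Cell]:
    """Return the grid spaces whose POI score meets the confidence threshold."""
    return {cell for cell, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    restaurant_scores = {
        (0, 0): 0.10, (0, 1): 0.65, (0, 2): 0.20,
        (1, 0): 0.70, (1, 1): 0.95, (1, 2): 0.62,
        (2, 0): 0.15, (2, 1): 0.55, (2, 2): 0.05,
    }
    print(sorted(detect_poi_cells(restaurant_scores)))
```

The outline of the returned set of grid spaces then corresponds to an unrefined boundary of the kind shown in Figure 7c.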

Figure 7d is an illustration of a refined POI boundary in the system of Figures 4-6, corresponding to the output of the line fitting module 606, after the boundary is adjusted to conform to adjacent boundaries (not shown) and/or other constraints.

Figure 8 is a flowchart summarising the abovementioned three phases. In step S800, the POI recognition algorithms are trained. In step S802, POI recognition algorithms are applied to extract geo-fences. In step S804 (which is optional), the extracted geofences are validated. These three phases will now briefly be described.

In one process, a list is read of geo-fenced areas with known POI details. For each GeoIndex with known POI details, a similarity score is extracted from a lookup table for each possible classification, such as cafe, education, sports, and so on. Location data received in respect of each GeoIndex is aggregated, and each location data session or time frame is normalised. The normalisation typically includes extracting the useful data for each reported visit, such as speed, dwell time, number of turns, number of seconds in pedestrian mode/stationary mode/driving mode, and so on, and then assigning an initial weighting to each parameter (which defaults to a lookup table based on data quality indicators). After all GeoIndices have been processed, all data from all areas is grouped by classification. A training algorithm is then applied to each classification to generate POI scores. Once the output of the model is close enough to the list of similarity scores, the model is stored to be used in the propagation phase.

Once the training is done at least once, the focus moves to finding unknown POIs and geo-fencing them. The more training data available, the more accurate the POI models in the database should become.

In a related process, areas in the geospatial database with data quantities above a threshold are located. For each area, the location data received for the area is aggregated per GeoIndex. Each location data session or time frame is normalised as per the equivalent step of the training phase. After this is completed, the data is run through each model (obtained from the training phase) to extract a score indicating similarity for the GeoIndex with each classification. An attempt is made to classify each GeoIndex to one or more models based on the highest similarity score. Polygons (or a spatial feature) are then generated for GeoIndices that fall under one (the same) model.
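As an illustrative sketch only, the step of classifying each GeoIndex by its highest similarity score and grouping GeoIndices that fall under the same model might be implemented as follows. The input layout (a mapping from GeoIndex to per-classification scores) is an assumption, and the generation of polygons from each group is not shown.

```python
from typing import Dict, List


def group_by_best_model(cell_scores: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Assign each GeoIndex to its highest-scoring classification and group them.

    cell_scores maps a GeoIndex to the similarity score returned by each model.
    """
    groups: Dict[str, List[str]] = {}
    for geo_index, by_class in cell_scores.items():
        if not by_class:
            continue  # no model produced a score for this GeoIndex
        best = max(by_class, key=by_class.get)
        groups.setdefault(best, []).append(geo_index)
    return groups


if __name__ == "__main__":
    scores = {
        "cell-1": {"cafe": 0.8, "sports": 0.1},
        "cell-2": {"cafe": 0.7, "sports": 0.2},
        "cell-3": {"cafe": 0.1, "sports": 0.9},
    }
    print(group_by_best_model(scores))
```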

The process iterates through each classification, and runs the location data through each model corresponding to the relevant classification and returns an appropriate number of similarity scores. It will be appreciated that other processes can be used to determine a classification appropriate to a point of interest (POI). Other types of models can be used and/or other types of training may be carried out. The models may comprise (in part, or fully) neural nets, fuzzy logic or deterministic/weighted algorithms, for example, with an appropriate set of classification data (determining the performance/execution of the models) produced in each case.

In a further embodiment, the classification data comprises a look-up table, and a determination of the likely classification of a POI is performed by looking up values in the look-up table relating to one or more inputted properties. The properties may be combined to form a single look-up value or index into the table, or the table may be multi-dimensional, referenced by a plurality of properties (with interpolation or any other appropriate technique to allow the data set to be condensed into a practicable size).
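A multi-dimensional look-up table of this kind might, for instance, be sketched as below. The two properties used (dwell time and travel speed), the bin edges and the table entries are all hypothetical values chosen for illustration, and binning is used here in place of interpolation.

```python
from bisect import bisect_right

# Hypothetical bin edges: dwell time in seconds and travel speed in metres per second.
DWELL_EDGES = [300, 1800]   # short / medium / long visit
SPEED_EDGES = [0.5]         # slow / fast movement

# Hypothetical table indexed by (dwell bin, speed bin).
TABLE = {
    (0, 0): "kiosk",      (0, 1): "corridor",
    (1, 0): "cafe",       (1, 1): "shop",
    (2, 0): "restaurant", (2, 1): "supermarket",
}


def classify(dwell_seconds: float, speed: float) -> str:
    """Look up the likely classification for the binned input properties."""
    key = (bisect_right(DWELL_EDGES, dwell_seconds), bisect_right(SPEED_EDGES, speed))
    return TABLE[key]


if __name__ == "__main__":
    print(classify(2400, 0.2))
```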

In a subsequent process, each identified/generated geo-fence is considered in turn. Additional sources of geospatial data relevant to each geo-fence are determined. Each geo-fence and related geospatial data are processed with a line-fitting algorithm to produce a revised geo-fence (in some but not necessarily all or indeed any cases). Revised geo-fences (and indeed non-revised fences) are stored in the geo-fence database. In essence, the detected geo-fence is fitted to (or fused with) the nearest boundaries in the map, so that each edge matches one of the two sources or lies between them, depending on the confidence/weighting of the edge in each source. The outcome is then saved as the best estimation of the POI boundaries. Additional or alternative validation steps are possible.

Boundaries of points of interest (POI) determined by any of the above-mentioned methods can be refined, for example during or after the validation phase described above. This is an essentially optional process to obtain increased accuracy in the boundaries determined above, and also to provide a way to extend and enhance the boundaries in future, after the conclusion of the initial boundary processing. Location data discussed in this section is typically separate to the location data discussed elsewhere, and more specifically focussed on physical movement and physical location measurement than elsewhere.

Figure 9 is a flowchart of an optional method of generating and adjusting layout data representing the layout of a (typically) indoor region, typically performed by the processor 10 of the server 8 executing computer program instructions stored in server memory 12.

In a first step 60, spatial feature data relating to locations (and preferably also the types) of spatial features of the indoor region is obtained. Typically (but not necessarily), step 60 is performed by the server 8 receiving location data collected and transmitted thereto by mobile devices 1 at a plurality of locations within the indoor region and processing the location data to generate spatial feature data relating to spatial features within the building using the techniques described in WO2016/066987. It will be understood that determining spatial feature data from crowdsourced location data relating to changing locations of a plurality of mobile devices within the indoor region is not an essential feature of the present invention, and that a database of spatial features may already exist (which may have been previously generated for example using the methods of WO2016/066987 or by any other means) in which case it may be that step 60 may involve obtaining the said spatial feature data from a memory. In a next step 62 (or optionally as part of step 60), a principal path through the indoor region is determined. In a next step 64, for each spatial feature identified in step 60, notional boundaries (typically in the form of polygons) are generated, taking into account the locations of the respective spatial features from the spatial feature data (and, where provided, the types of spatial features - e.g. different spatial features may be provided with different notional boundaries of different shapes and/or sizes). Typically the notional boundaries are initially relatively small in size (e.g. a rectangle of 1m by 2m), typically smaller than the spatial features they represent, such that adjusting a notional boundary typically comprises expanding the said notional boundary.

In a next step 66, layout data representing the layout of the indoor region is generated from the determined principal path and the notional boundaries. In this exemplary embodiment, the layout data comprises a straight line representing the principal path, notional boundaries representing rooms adjacent to the principal path, and a further notional boundary representing a turning point. The boundaries extend from the principal path. The boundaries are of predefined size and shape selected in dependence on the types of spatial feature to which they relate. For example, the boundaries relating to some of the rooms may be five sided polygons (each having the same size and shape, albeit some may have different orientations), while the boundary relating to the turning point may be a six sided L-shaped polygon. Alternatively, boundaries of the same predetermined size and shape may be generated for each determined spatial feature regardless of type (or in the absence of type information being provided). The boundaries do not at this stage accurately conform to the shapes or sizes of the spatial features to which they relate.

In a next step 68, location data relating to changes in location of a plurality of mobile devices 1 within the indoor region is obtained. This location data may comprise or consist of the same location data used to identify the spatial features; alternatively, this location data may comprise or consist of location data obtained subsequently (or even in some cases prior) to the location data used to identify the said spatial features.

As mentioned above, the location data may provide a plurality of locations determined by the respective devices or data from which a plurality of locations of the said respective devices can be determined. In the latter case, step 68 also comprises determining the said locations of the respective devices.

Step 68 typically involves deriving respective location sets from the location data and outputting each location set in turn to step 70, steps 70 to 76 being repeated so as to process each of the location sets. Each location set typically comprises a set of locations (e.g. ten consecutive locations) of a single mobile device 1, but some location sets may comprise composite location sets made up of locations of a plurality of the said devices in combination. In some embodiments, some or all of the location sets relate to a spatial feature path followed by a device, or by a plurality of devices in combination, extending from outside a said notional boundary to inside the said notional boundary through a portal and subsequently from inside the said notional boundary to outside the said notional boundary through the or another said portal. In the case where all of the location sets relate to such a spatial feature path, step 68 comprises processing the location data to identify such spatial feature paths which are processed in turn by repeated executions of steps 70 to 76.

It will be assumed for the description of step 70 of Figure 9 below that the location set output by step 68 to step 70 is a set of locations relating to the spatial feature path of a mobile device X which extends from outside one notional boundary to inside the boundary through a portal and subsequently from inside the said notional boundary to outside again through the same portal onto a main thoroughfare.

Referring back to Figure 9, in step 70 the location set relating to the device X output by step 68 is processed to determine the most probable spatial features of the indoor region to which the locations of the location set relate, typically using a fuzzy logic system, such as a Fuzzy Inference System (FIS). Figure 10 illustrates the fuzzy logic algorithm executed in step 70 in more detail. In step 70a, locations of the device X provided in the location set are correlated with the principal path to determine a (for example 5 or 10 metre long) segment of the principal path which is closest to the locations provided in the location set. The inputs required for this step in the algorithm include: the location set from step 68; and data relating to the location, extent and direction of the principal path determined in step 62. A fuzzy rule is then implemented to determine a point on the principal path which is closest to the path formed together by the locations of the location set. A segment of the principal path of predetermined length (for example 5 or 10m) centred on that point is then output by the rule to step 70b. In the exemplary embodiment, the output of step 70a is a (e.g. 5m or 10m long) segment of the principal path roughly extending in a straight line along a thoroughfare between two points.
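By way of a simplified sketch, the geometric part of step 70a might be approximated as follows. This is a crisp (non-fuzzy) stand-in, not the fuzzy rule itself: it assumes a straight principal path in local metric coordinates, and it takes the centroid of the location set as representative of the path formed by the locations. Both simplifications, and the function name, are assumptions of this sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def closest_segment(path_start: Point, path_end: Point,
                    locations: List[Point], length: float = 10.0) -> Tuple[Point, Point]:
    """Return a segment of the principal path centred on the point nearest the locations."""
    ax, ay = path_start
    bx, by = path_end
    path_len = math.hypot(bx - ax, by - ay)
    if path_len == 0.0 or not locations:
        raise ValueError("a non-degenerate path and at least one location are required")
    ux, uy = (bx - ax) / path_len, (by - ay) / path_len  # unit vector along the path

    # Centroid of the location set, used as a stand-in for the path it forms.
    cx = sum(x for x, _ in locations) / len(locations)
    cy = sum(y for _, y in locations) / len(locations)

    # Project the centroid onto the path and clamp to the extent of the path.
    t = min(max((cx - ax) * ux + (cy - ay) * uy, 0.0), path_len)
    t0 = max(t - length / 2, 0.0)
    t1 = min(t + length / 2, path_len)
    return (ax + ux * t0, ay + uy * t0), (ax + ux * t1, ay + uy * t1)


if __name__ == "__main__":
    print(closest_segment((0.0, 0.0), (100.0, 0.0), [(40.0, 3.0), (42.0, 5.0), (44.0, 4.0)]))
```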

Referring back to Figure 10, in step 70b, the fuzzy logic algorithm determines one or more notional boundaries near the identified segment of the principal path. The inputs required for this step are: data identifying the segment of the principal path identified in step 70a; and the layout data generated in step 66 relating to locations of the connections of the respective notional boundaries relative to the principal path. The fuzzy rule applied by the fuzzy logic system at this step determines which of the notional boundaries have a connection to the principal path which is less than a predetermined distance (for example 10 metres) from the segment of the principal path identified in step 70a. Firstly, the shortest distances from the respective connection points of the notional boundaries (that is, the points at which they connect to the principal path) to the segment of the principal path identified in step 70a are calculated. The rule then determines which of those distances are less than the said predetermined distance and identifies one or (typically) more of the said notional boundaries accordingly. In this exemplary embodiment, it will be assumed that notional boundaries are identified by step 70b.
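The distance test of step 70b might, purely as an example, be sketched as follows. A crisp comparison against the predetermined distance is used here in place of a fuzzy rule, and the boundary names and local metric coordinates are assumptions of this sketch.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Parameter of the projection of p onto the segment, clamped to [0, 1].
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = min(max(t, 0.0), 1.0)
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def nearby_boundaries(connections: Dict[str, Point], seg_a: Point, seg_b: Point,
                      max_distance: float = 10.0) -> Dict[str, float]:
    """Return the boundaries whose connection point lies within max_distance of the segment."""
    distances = {name: point_segment_distance(pt, seg_a, seg_b)
                 for name, pt in connections.items()}
    return {name: d for name, d in distances.items() if d < max_distance}


if __name__ == "__main__":
    connections = {"A": (40.0, 0.0), "B": (50.0, 0.0), "C": (70.0, 0.0)}
    print(nearby_boundaries(connections, (37.0, 0.0), (47.0, 0.0)))
```

The returned distances can be passed on as inputs to the probability estimation of the following step.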

In a next step 70c, the fuzzy logic algorithm estimates the probabilities that (functioning as relevance parameters indicative of whether) the locations of the location set are associated with the same spatial features as the respective notional boundaries identified in step 70b. The inputs required for this step are: the shortest distances of the connection points of the notional boundaries identified in step 70b to the segment of the principal path identified in step 70a; the directions of movement of the mobile device(s) along the spatial feature path comprising the locations of the location set; and the respective numbers (or proportions) of locations of the location set which are located within the respective notional boundaries identified in step 70b. The directions of movement of mobile devices 1 may be determined from heading orientation data received from the mobile devices 1, and/or from the locations of the location set. The number of locations of the location set within the respective boundaries can be determined by comparing the locations of the location set with the locations surrounded by the respective boundaries. The fuzzy logic system implements a fuzzy rule at this stage which calculates a probability value for each notional boundary identified in step 70b. The closer the connection point of a notional boundary to the principal path, the higher the number of locations from the location set within the notional boundary and the greater the correlation between the direction changes over the spatial feature path and the location of the notional boundary relative to the principal path, the higher the probability value calculated for the notional boundary (and the lower the probability value when the converse is true).
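The monotonic behaviour of this rule might be illustrated by the following crisp weighted combination, which is offered as a stand-in only and is not the fuzzy rule itself. The weights, the 10 metre normalising distance and the assumption that direction agreement has already been reduced to a value between 0 and 1 are all hypothetical.

```python
def relevance(distance_m: float, fraction_inside: float, direction_agreement: float,
              max_distance_m: float = 10.0,
              weights: tuple = (0.25, 0.5, 0.25)) -> float:
    """Combine the three inputs of step 70c into a probability-like value in [0, 1].

    distance_m: shortest distance from the connection point to the path segment.
    fraction_inside: proportion of locations of the location set inside the boundary.
    direction_agreement: assumed 0..1 measure of how well the direction changes
        correspond to the location of the boundary relative to the principal path.
    """
    closeness = max(0.0, 1.0 - distance_m / max_distance_m)
    w_close, w_inside, w_direction = weights
    score = (w_close * closeness
             + w_inside * fraction_inside
             + w_direction * direction_agreement)
    return min(max(score, 0.0), 1.0)


if __name__ == "__main__":
    print(relevance(2.0, 0.8, 0.9))  # close, mostly inside, directions agree
    print(relevance(8.0, 0.1, 0.2))  # nearby, but little supporting evidence
```

As in the rule described above, a closer connection point, a higher proportion of enclosed locations and a better directional correspondence each raise the value, and the converse lowers it.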

In the exemplary embodiment, a higher probability may be associated with one notional boundary than another because the first notional boundary has a connection point to the principal path closer to the segment of the principal path, because the direction changes of the device X in the location set output by step 68 correspond better to the location of the first notional boundary and because a higher number of locations from the location set are provided within that notional boundary than the other (which is provided with a low probability). Not all of the locations of the location set or direction changes of device X correspond to the first notional boundary, so the probability applied to that notional boundary will be less than 100%. A low, but non-zero probability value may be applied to the other boundary on the basis of it being located relatively close to the identified segment of the principal path, but neither the direction data nor the location data correspond to the latter boundary.

It may be determined that the path followed by mobile device X moves through the end of a notional boundary. Responsive to that determination, in step 72, the notional boundaries identified in step 70b which have been allocated a probability value greater than a predetermined threshold in step 70c are adjusted. In this exemplary embodiment, only one notional boundary has a probability value greater than the threshold and so only that notional boundary is adjusted in step 72. The notional boundary is adjusted to become a larger notional boundary which encloses the locations of the device X from the location set in the area of the room outside of the original notional boundary. Although it may be that the boundary is expanded (e.g. by the minimum expansion necessary, or by discrete changes in one or more dimensions of the boundary) so that the expanded boundary contains all of the locations of the device X from the location set in the area of the room outside of the original notional boundary, it will be understood that it is not necessary for the expanded boundary to enclose all of the locations of the location set outside of the original notional boundary, but typically the expanded boundary will contain more of the locations of the location set than it did before being expanded. For example, the boundary may be expanded (e.g. by the minimum expansion necessary, or by discrete changes in one or more dimensions of the boundary) so that the adjusted boundary contains a number of locations of the location set provided outside of the notional boundary prior to adjustment determined in dependence on the probability calculated for the notional boundary in step 70c. Adjusting the notional boundaries may include adding new edges to existing polygons. This is useful for example to avoid excessively extending a long side of a polygon if new location data is only available along one side of the polygon (for example).
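As a simplified illustration of the minimum expansion necessary, the following sketch expands an axis-aligned rectangular boundary so that it encloses a set of locations. Restricting the notional boundary to an axis-aligned rectangle, and enclosing all of the locations rather than a probability-dependent number of them, are assumptions made to keep the example short; the addition of new edges to a polygon is not shown.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


def expand_boundary(boundary: Rect, locations: Iterable[Point]) -> Rect:
    """Expand the boundary by the minimum amount needed to enclose every location."""
    x_min, y_min, x_max, y_max = boundary
    for x, y in locations:
        x_min, y_min = min(x_min, x), min(y_min, y)
        x_max, y_max = max(x_max, x), max(y_max, y)
    return (x_min, y_min, x_max, y_max)


if __name__ == "__main__":
    initial = (0.0, 0.0, 1.0, 2.0)  # e.g. a 1m by 2m initial notional boundary
    print(expand_boundary(initial, [(0.5, 1.0), (0.5, 3.5), (1.5, 3.0)]))
```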

In a next step 74, the adjusted notional boundaries are compared to location specific geographical descriptive data (typically stored in memory 12 of the server 8) to determine whether the adjusted boundaries conflict with any layout restrictions comprised in the location specific geographical descriptive data. In one example, the location specific geographical descriptive data comprises building boundary data containing one or more layout restrictions relating to the boundaries of the building comprising the indoor region. This may include or be derived from external mapping data, satellite images of the building, location data relating to the location of roads surrounding the building and so on. The adjusted notional boundary is compared to the building boundary to determine whether any conflicts exist between them, for example using any suitable shape conflict detection algorithm or tool such as a genetic algorithm for polygon fitting (e.g. see “On genetic algorithms for the packing of polygons”, Stefan Jakobs, European Journal of Operational Research 88 (1996) 165-181 which is incorporated in full herein by reference), or a suitable topological tool provided by a geographical information system such as QGIS or ArcGIS. In a next step 76, if any conflict is found in step 74, the notional boundary is adjusted to remove the conflict. In the present case, no conflict is found. The method then moves back to step 68 which outputs the next location set to step 70. Steps 70-76 are then repeated for that location set. With each iteration, the shapes and sizes of notional boundaries are better refined until ultimately an accurate map of the indoor region is provided (typically after 100-200 location sets for each notional boundary). The iterations may for example take place periodically, or at predetermined times, or when a predetermined amount of path data has been aggregated (e.g. for a region).
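For the simplest case of rectangular shapes, the conflict check of step 74 and the adjustment of step 76 might be sketched as follows. This clipping approach is a deliberately simple substitute for the genetic algorithm or GIS tools mentioned above, and it assumes that both the notional boundary and the building boundary are axis-aligned rectangles.

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def clip_to_building(boundary: Rect, building: Rect) -> Optional[Rect]:
    """Return the part of the boundary lying inside the building, or None if there is none."""
    x_min = max(boundary[0], building[0])
    y_min = max(boundary[1], building[1])
    x_max = min(boundary[2], building[2])
    y_max = min(boundary[3], building[3])
    if x_min >= x_max or y_min >= y_max:
        return None
    return (x_min, y_min, x_max, y_max)


def conflicts(boundary: Rect, building: Rect) -> bool:
    """A conflict exists if any part of the boundary falls outside the building."""
    return clip_to_building(boundary, building) != boundary


if __name__ == "__main__":
    building = (0.0, 0.0, 50.0, 30.0)
    adjusted = (45.0, 10.0, 55.0, 20.0)
    if conflicts(adjusted, building):
        adjusted = clip_to_building(adjusted, building)
    print(adjusted)
```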

When devices 1 (e.g. devices X, Y, O) are located within particular notional boundaries, they receive electromagnetic signals from terrestrial electromagnetic signal sources (which are themselves typically located within the indoor region), such as Wi-Fi access points, Bluetooth beacons and so on (not shown), using their data communications antennas 6. In this case, the device processors 2 are configured to execute computer program instructions which derive signal source data from electromagnetic signals they receive within the said notional boundaries and this signal source data is transmitted to the server 8. The signal source data is typically georeferenced to the location at which the signals from which it is derived were received. The signal source data may comprise received signal strengths and/or timing data (e.g. the times of flight of received signals) and/or angle of arrival data (e.g. the angles or directions of arrival of signals received by the said mobile devices) relating to electromagnetic signals received by the mobile devices within the said respective notional boundaries from respective electromagnetic signal sources. It may be that the signal source data comprises identifiers of electromagnetic signal sources detected by one or more said mobile devices within the said respective notional boundaries.

The server processor 10 typically executes computer program instructions stored on server memory 12 to derive respective signal source profiles for each of the notional boundaries from signal source data received from the said mobile devices 1 (e.g. devices X, Y, O) at locations within the said notional boundaries. Each said signal source profile is associated with a respective notional boundary. It may be that the signal source profile associated with each notional boundary comprises, for each of one or more electromagnetic signal sources detectable in that notional boundary, expected values (e.g. averages, e.g. accumulated means, of previously determined values by one or more said mobile devices) of one or more parameters relating to electromagnetic signals received from the said electromagnetic signal source by said mobile devices within the said notional boundary. For example, it may be that the signal source profile associated with each said notional boundary comprises, for each of one or more electromagnetic signal sources, expected signal strengths (e.g. averages, e.g. accumulated means, of previously determined signal strengths), expected timing data, such as times of flight, (e.g. averages, e.g. accumulated means, of previously determined times of flight) or expected angle of arrival data, such as angles or directions of arrival, (e.g. averages, e.g. accumulated means, of previously determined angles or directions of arrival) relating to electromagnetic signals received from the said electromagnetic signal source by said mobile devices within the said notional boundary.
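An accumulated mean of received signal strengths per signal source might, by way of example, be maintained as follows. The class name, the use of MAC addresses as identifiers and the dBm units are assumptions of this sketch; timing and angle of arrival parameters could be accumulated in the same way.

```python
from typing import Dict


class SignalProfile:
    """Hypothetical signal source profile for one notional boundary."""

    def __init__(self) -> None:
        self.expected_rssi: Dict[str, float] = {}  # identifier -> accumulated mean (dBm)
        self.samples: Dict[str, int] = {}          # identifier -> number of readings

    def update(self, readings: Dict[str, float]) -> None:
        """Fold one set of georeferenced readings into the accumulated means."""
        for source_id, rssi in readings.items():
            n = self.samples.get(source_id, 0) + 1
            mean = self.expected_rssi.get(source_id, 0.0)
            # Incremental mean: avoids storing every previous reading.
            self.expected_rssi[source_id] = mean + (rssi - mean) / n
            self.samples[source_id] = n


if __name__ == "__main__":
    profile = SignalProfile()
    profile.update({"aa:bb:cc:00:00:01": -60.0})
    profile.update({"aa:bb:cc:00:00:01": -70.0, "aa:bb:cc:00:00:02": -80.0})
    print(profile.expected_rssi)
```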

It may be that the signal source profile associated with each said notional boundary comprises data relating to expected rates of change with respect to location of one or more parameters (e.g. received signal strengths, timing parameters, angle of arrival parameters) of electromagnetic signals received by one or more said mobile devices within the said notional boundary from one or more said electromagnetic signal sources.

Signal source data relating to electromagnetic signals received by devices 1 (e.g. devices X, Y, O) in the indoor region can be compared to determined signal profiles associated with notional boundaries in order to validate estimated locations of the said devices provided in or estimated from the said location data. For example if it is estimated that a said device is located in a said notional boundary, this can be validated or invalidated by comparing signal source data collected at that location to the signal source profile associated with that notional boundary. Additionally or alternatively, signal source data collected at a location inside or outside of a notional boundary can be compared to the signal source profiles associated with one or more notional boundaries relating to respective spatial features to determine whether that location is associated with the same spatial feature as one of the said notional boundaries (e.g. if the signal source data conforms to the signal profile). If it is determined that the said location is associated with the same spatial feature as a said notional boundary, that may be an indication that the said notional boundary should be adjusted to include (e.g. enclose or substantially enclose) the said location (in which case the method may comprise determining that the said location is associated with the same spatial feature as a notional boundary, and adjusting the said notional boundary to include (e.g. enclose or substantially enclose) the said location). 
Alternatively if it is determined that the said location is not associated with the same spatial feature as a notional boundary, that may be an indication that the said notional boundary should be adjusted to exclude the said location or at least should not be adjusted to include said location (in which case the method may comprise determining that the said location is not associated with the same spatial feature as a notional boundary, and responsively adjusting the said notional boundary to exclude the said location or not adjusting the said notional boundary to include said location).

For example, before a notional boundary is amended in the method of Figure 9, the device X may move through the end of the notional boundary into an area of the room outside of the said notional boundary. By determining a similarity between signal source data detected by device X in the area of the room outside of the said notional boundary and a signal profile associated with the notional boundary, it can be better determined that the notional boundary should be amended to include the area outside of the said notional boundary. Conversely, if the signal source data detected by device X in that area was sufficiently different from the signal source profile associated with the notional boundary, it may be determined that the notional boundary should not be amended to include the said area.

In some embodiments, this additional check is implemented in the fuzzy logic system, typically by way of a second rule implemented by the fuzzy logic algorithm in step 70c. As illustrated in Figure 11 (which also illustrates the first rule performed in step 70c described above, together with its inputs), the second rule takes as inputs signal data derived from electromagnetic signals detected by the mobile devices 1 from terrestrial electromagnetic signal sources at the locations of the respective location set and signal profiles relating to the notional boundaries identified in step 70b. In the exemplary embodiment, the signal data includes signal data received from device X relating to electromagnetic signals it receives from terrestrial electromagnetic signal sources when it moves from point to point, and the signal profiles include signal profiles relating to various notional boundaries. The second rule, when applied in respect of a respective notional boundary, correlates the signal data with the signal profiles associated with the respective notional boundary and provides an output indicative of the probability that the signal data relates to the same spatial feature as the signal profiles associated with the notional boundaries (for example, the signal data relates to signals detected by the mobile device in the same room as that associated with the signal profile). The correlation may be performed by calculating a Euclidean distance between the signal data and the signal profile. For example, the Euclidean distance in each case may be the square root of the sum of the square of the differences between received signal strengths from particular electromagnetic signal sources (e.g. identified by MAC addresses) and expected received signal strengths from those electromagnetic signal sources as set out in the signal profile. The smaller the Euclidean distance between the signal data and a signal profile, the better the match between them.
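
The Euclidean distance comparison described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and restricting the comparison to signal sources present in both the scan and the profile is an assumption, since the text does not say how sources missing from one side are handled.

```python
import math


def euclidean_distance(scan, profile):
    """Distance between a scan and a profile, both given as {mac: rss_dbm}.

    Assumption: only sources present in both mappings are compared.
    """
    shared = scan.keys() & profile.keys()
    if not shared:
        return math.inf
    return math.sqrt(sum((scan[m] - profile[m]) ** 2 for m in shared))


def best_matching_boundary(scan, profiles):
    """Return the id of the notional boundary whose profile is closest to the scan."""
    return min(profiles, key=lambda b: euclidean_distance(scan, profiles[b]))


scan = {"aa:01": -60.0, "aa:02": -70.0}
profiles = {
    "room_A": {"aa:01": -63.0, "aa:02": -74.0},
    "room_B": {"aa:01": -85.0, "aa:02": -50.0},
}
closest = best_matching_boundary(scan, profiles)
```

Here the scan differs from the hypothetical profile of room_A by 3 dB and 4 dB, giving a distance of 5, which is far smaller than its distance to room_B.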

The outputs of the first and second rules are then combined to provide an output probability or resemblance value which is indicative of the likelihood that the location data relates to the same spatial feature as the respective notional boundaries. When both rules are implemented, this is the output from step 70c (rather than the output from the first rule mentioned above).

Referring back to Figure 9, in step 68 the computer program code being executed by the server processor 10 may for example determine from similarity in the signal source data received from devices X and Y at the end and beginning of their paths through the indoor region respectively that the end of the path followed by device X and the beginning of the path followed by device Y were within the same room. This determination may comprise a direct comparison of signal source data relating to electromagnetic signals detected by the device X at its last detected location to signal source data relating to electromagnetic signals detected by device Y at its first detected location. More typically, signal source data relating to electromagnetic signals detected by the device X at its last detected location and signal source data relating to electromagnetic signals detected by the device Y at its first detected location may each be compared to the signal source profile associated with a notional boundary and it may be responsively determined that the signal source data from devices X and Y relate to the same notional boundary. The server 8 then responsively concatenates the paths followed by devices X and Y to provide a composite spatial feature path which, for example, may extend from the main thoroughfare, into the room through a doorway, turn through 180° and extend back out of the room into the main thoroughfare through the said doorway.

The locations of this composite path then become the location set processed by steps 70-76 as above in which a high probability is calculated for a notional boundary which is expanded to enclose the locations of the composite spatial feature path inside the room but which were previously outside of the notional boundary.

In one iteration of steps 68-76 of Figure 9, a location set comprising consecutive locations of the device Y from one point to another is output from step 68 and processed by steps 70-76. In this case, a boundary is adjusted in step 72 to become an expanded boundary which encloses locations of the location set which were outside of the boundary. However, in step 74 it may for example be determined that the upper edge of the adjusted boundary conflicts with the building boundary. Accordingly, the expanded boundary is adjusted in step 76 to overcome the conflict (by moving the upper edge of the boundary to be within the building boundary), thereby providing a further amended boundary. This can for example provide a more accurate boundary around a turning point.

As shown by the loop comprising steps 68-76 in Figure 9, over time further location data is obtained and processed to refine the sizes and shapes of the notional boundaries. Refined notional boundaries are compared to location specific geographical descriptive data describing boundaries of the built environment (typically building) comprising the indoor region and further refined responsive to that comparison. As discussed above, eventually the notional boundaries should converge to the shapes and sizes of the spatial features to which they relate, thereby providing accurate mapping data which can be used by mobile devices to estimate their locations within the indoor region.

When processing the location set relating to changes in location of a device O from one point to another in an iteration of steps 70a, similar probabilities may be allocated to two different notional boundaries, and a conflict then arises as to whether to expand one notional boundary to include the locations in a passageway or to expand the other boundary to include those locations. When such a conflict is determined, an additional step may be performed in order to resolve the conflict. In this additional step, signal source data relating to electromagnetic signals received by the device O from one or more terrestrial electromagnetic signal sources when in the passageway is compared to signal source profiles associated with the notional boundaries as discussed above. If the signal data sufficiently (and better) matches the profile relating to the first boundary, that boundary is expanded to enclose the locations in the passageway, or if the signal data sufficiently (and better) matches the profile relating to the latter boundary, that boundary is expanded to enclose the locations in the passageway. However, responsive to a determination that the signal source data does not sufficiently match the signal source profiles of either notional boundary, the server processor 10 determines that the path through the passageway is not part of either notional boundary. Further responsive to a determination that the location of the device O changes from one notional boundary to the other through a region not within either notional boundary, the server processor 10 determines that the path through the passageway is in fact a transition path extending between notional boundaries (and not part of the spatial features, that is the rooms, to which the notional boundaries relate).
Recognising that transition paths exist between notional boundaries helps to prevent the server processor 10 from continuously processing conflicting expansions of boundaries as a result of devices moving between them.
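
The conflict-resolution step described above can be sketched as a small decision function. The function name, the returned labels and the `match_threshold` parameter are hypothetical; the text does not specify how a "sufficient" match is quantified, so a single cut-off on a dissimilarity value is assumed here.

```python
def resolve_conflict(dist_first, dist_second, match_threshold, moved_between_boundaries):
    """Decide which notional boundary, if any, should absorb the passageway locations.

    dist_first, dist_second: dissimilarity between the passageway signal data and
        each boundary's signal source profile (smaller means a better match).
    match_threshold: assumed cut-off below which a match counts as sufficient.
    moved_between_boundaries: True if the device passed from one boundary to the
        other through a region inside neither of them.
    """
    first_ok = dist_first <= match_threshold
    second_ok = dist_second <= match_threshold
    if first_ok and dist_first < dist_second:
        return "expand_first"
    if second_ok and dist_second < dist_first:
        return "expand_second"
    if not first_ok and not second_ok:
        # Not part of either boundary; a crossing between them marks a transition path.
        return "transition_path" if moved_between_boundaries else "neither"
    return "unresolved"  # both matches sufficient and exactly equal
```

For example, `resolve_conflict(8.0, 9.0, 5.0, True)` yields `"transition_path"` because neither profile matches well enough and the device crossed between the two boundaries.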

Returning now to points of interest and their identification:

Points of interest can be identified by comparing one or more locations of the indoor region (or locations of the notional boundaries) with one or more map sources (such as OpenStreetMap or Google(RTM) maps) to identify points of interest associated with those locations. In another example, points of interest can be identified by obtaining one or more images (which may be created by one or more mobile devices and transmitted to the server 8 using one or more said data communications antennas 6) georeferenced to a location within the indoor region (or within a respective notional boundary); and processing said images to identify one or more points of interest within the indoor region (or determined to be within a respective said notional boundary). For example, optical character recognition can be used to identify text in images, said text being indicative of one or more points of interest (e.g. by being associated with a known brand, or by being text indicative of a type of activity (e.g. gym, cinema) performed at a particular location within the indoor region). Additionally or alternatively, images may be compared to predetermined model images (e.g. stored in a database on the server 8 or on another server accessible to the server 8) to identify objects in said images indicative of one or more points of interest (e.g. gym equipment). LIDAR scans of the indoor region (or of an area within a respective notional boundary) may be processed in a similar way.

In another example, points of interest within the indoor region (or within notional boundaries) can be identified by obtaining social media data relating to an indoor (or outdoor) region or a notional boundary (e.g. submitted by a user from a mobile device 1 within the indoor region or notional boundary); and processing the said social media data to identify one or more points of interest within the indoor region (or within a respective said notional boundary). Said social media data may comprise any one or more of: text data; audio data; image data; video data (for example). Text, image or video data can be processed as set out above.

In another example, points of interest can be identified by obtaining audio data (which may be for example social media data (e.g. audio data uploaded to social media) or data recorded by the mobile device 1 and transmitted to the server 8) relating to the indoor region or notional boundary (e.g. detected by one or more mobile devices 1 in the indoor region or in a respective notional boundary); and processing the said audio data to identify one or more points of interest within the indoor region or notional boundary. Said audio data may be audio data georeferenced to a location within the indoor region or within a respective said notional boundary (e.g. by virtue of having been recorded at or streamed from a location within the indoor region or notional boundary). It may be that points of interest are determined by identifying one or more patterns in said audio data to thereby identify one or more points of interest within the indoor region or notional boundary, e.g. by comparing one or more patterns in said audio data to one or more models to thereby identify one or more points of interest within the indoor region or notional boundary. Alternatively, speech may be identified from audio data and processed to identify brands or categories of activity which may be performed in the indoor region or in respective notional boundaries. The audio data may be converted to text data, and the text data then processed as above to determine points of interest.

Other approaches to determining boundaries of points of interest are possible.

The system as it relates in particular to generating Store Visit Confirmation (SVC) data will now be described in more detail.

The aim of the project is to estimate the probability of a mobile device being in a store which is part of a complex indoor layout, such as a shopping mall. A trajectory-based solution is provided, in that both location data and Wi-Fi signals detected by the mobile device along a track are examined to estimate an “InStore” probability. The proposed solution can satisfy at least one of the following objectives:

• Developed for large scale smartphone applications with zero calibration and unsupervised deployment.

• Can fully utilize location history from sessions with or without Checkins.

• Capable of extracting spatial features and signal characteristics to prompt the session for SVC.

• Utilizes a series of events topologically linked together along one track.

• Utilizes skeleton indoor map information, with limited or no input from maps providers.

• Identifies store visits even if store POI information is not available.

• For data classification and fusion, deep learning or machine learning can be applied to optimize weighting of spatial data based on area model.

Compared to the common SVC solutions, discussed above, the proposed solution can have some restrictions:

• Map boundaries for medium to small size stores are often inaccurate and frequently changing.

• Each target store may or may not contain signal transmitters. A store without any signal transmitters (WiFi Access Points) will have a degraded percentage of SVC.

• Signal profiles (WiFi scans) in a specific store can be observed only for a short period of time and are not guaranteed to be constantly reported throughout the session.

• Typically there is very limited and uncertain absolute positioning (GPS/BLE/WiFi).

The proposed solution uses Simultaneous Localization and Mapping (SLAM) techniques to enable SVC features. Therefore, the solution can be divided into constructing a “Stores Signal Profiles” database from qualified tracks with valid Checkins, and utilizing the trained database for visit confirmation of anonymous track and/or location data in one session, period or time. The overall proposed algorithms are based on the classification of track segments as either ‘corridor-passing’ or store-visiting (“InStore”). Therefore, a new set of crowd-sourced SVC features are proposed.

As regards constructing the “Stores Signal Profiles” database, the following paragraphs describe how location history for confirmed visits to a store identified by the end user is utilised to populate a database of RF profiles describing signal characteristics in each store. While usual solutions focus only on the timeframe when the visit is confirmed, the present SLAM system analyses segments of a continuous track to assign a probability that the segment is within the confirmed store or section of corridor within the shopping centre. The track data come from crowd-sourced tracking, and segments that are calculated to be in the same position are fused to create multiple hypotheses of signal profiles for each store. These data items are stored in the Store RF Profiles Database which can then be updated as new tracks are analysed and used to improve the signal profiles within the database through a machine-learning process.

Some features the SLAM system utilises to calculate the signal profiles are:

• PDR (Pedestrian Dead Reckoning) weighting;

• Location information in the form of start point distance and bearing; and

• RF Signal Profile per segment.

To integrate Store Visit Confirmation, in particular InStore classification, the following features have been added:

• Mean PDR-based speed (Feature 1);

• Mean WiFi-distance-changing-rate/WiFi-based-speed (Feature 2);

• Mean heading-changing-rate per PDR-travel-distance (Feature 3);

• Length of track-part in between detected turns (Feature 4); and

• Corridor matching probability (Feature 5).

These features are used on the mobile-side to calculate the rough probability of a segment being within a corridor or in a store.

On the server-side, a sliding-window based clustering algorithm is used to carry out post-processing to create labelled tracks associated with either a corridor or store. Once enough tracks have been submitted, these can be used for segment clustering to identify a store temporal area assigned to an RF profile, as shown in Figure 12:

In the first stage (left hand side), tracks that end with common spatial features are classified for “STORE PROFILING”. In the second stage (middle), segments are then filtered based on bearing classifier, distance classifier and signal profile classifier. In the third stage (right hand side), multiple hypotheses of fused signal profile are created; the signal profile with highest weightage is allocated to the spatial temporal area.

The following paragraphs explain how SVC based on the crowd-sourced profiles is performed. The algorithms and features of this solution are examined in detail. The features of the track (described above) are analysed, along with trajectory and signal profiles to determine if the track is a valid candidate as a Store Visit. Once the database (“SSPDB”) has been trained and is considered mature enough, it can be used for SVC where no end-user Checkin is available to help identify the visited store. If any part of the track is considered to comprise a store visit, all possible stores, based on rough location, are examined to find which store should claim this visit. The high-level sequence of this approach can be summarized as:

1. Probability of a store visit;

2. Outlier detection through store feature verification; and

3. Generation of final SVC result.

The first two steps require mature SLAM branches through the area before they can provide the desired accuracy. Along with SLAM, various data science functions, including trained models for classification, are employed to drive the final SVC decision. Figure 13 illustrates the flow through various parts and modules.

The Signal Profiles Features Classifier will now be described.

The WiFi profiles collected by a user’s mobile phone change as they move around a venue. For example, the WiFi profiles at the entrance of a building are expected to be very different from those deep inside the building. When there is an abrupt change in environment, such as turning a corner or entering a store, the WiFi profile collected along a track can undergo significant changes in a very short period of time. Identifying these abrupt changes in the WiFi profile time-series is a change-point detection problem, which has been studied by the statistics and data-mining communities. We can make use of the algorithms devised for change-point detection to separate the track into different environments based on changes in the WiFi profiles. The results calculated here can then help to increase the confidence of store identification results obtained by SLAM methods.

Identifying change points involves devising a method of calculating the dissimilarity between data at the current timestamp with those that came before. In the case of a WiFi profile time-series, we have a sequence in time of MAC addresses with their corresponding signal strengths. Although this is a multidimensional dataset, differences between two profiles can be evaluated using the signal strengths and whether particular MACs were picked up at one time but not at another. In this way we can construct a time-series of profile dissimilarity which can then be analysed to identify change points. WiFi profile dissimilarities have an intuitive interpretation: if the user is stationary or moving around in a confined region, such as in a shop, the dissimilarities are likely to be small over a period of time. Conversely, if the user is moving into or out of a very different environment, such as from the outside of a building to the inside, or entering a store from the corridor, we expect a significant change in the dissimilarity. Therefore, we can choose the position of a local maximum dissimilarity as the time where the change of environment occurs. Figure 14 illustrates such a dissimilarity time-series for a particular track, and the vertical lines indicate when our method considers a change in environment has taken place. The confidence of a detected change can then be estimated by the overall change in dissimilarity between it and the following detected change, which is shown in Figure 14.
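
A minimal sketch of this change-point idea is given below. It rests on several assumptions that are not stated in the text: the dissimilarity is taken between consecutive scans only, it is computed as the mean absolute RSS difference over the union of MACs, a MAC missing from a scan is assigned a floor value of -100 dBm, and a minimum dissimilarity is required before a local maximum counts as a change point. All names are hypothetical.

```python
MISSING_RSS = -100.0  # assumed floor (dBm) for a MAC that is absent from a scan


def dissimilarity(scan_a, scan_b):
    """Mean absolute RSS difference over the union of MACs seen in two scans."""
    macs = scan_a.keys() | scan_b.keys()
    if not macs:
        return 0.0
    total = sum(
        abs(scan_a.get(m, MISSING_RSS) - scan_b.get(m, MISSING_RSS)) for m in macs
    )
    return total / len(macs)


def change_points(scans, min_dissimilarity=10.0):
    """Indices i at which the dissimilarity between scans[i-1] and scans[i] peaks."""
    d = [dissimilarity(a, b) for a, b in zip(scans, scans[1:])]
    points = []
    for k in range(1, len(d) - 1):
        if d[k] >= min_dissimilarity and d[k] > d[k - 1] and d[k] >= d[k + 1]:
            points.append(k + 1)  # d[k] compares scans[k] with scans[k + 1]
    return points


# Two corridor scans followed by two scans taken after entering a store.
scans = [
    {"c1": -50.0, "c2": -60.0},
    {"c1": -52.0, "c2": -61.0},
    {"s1": -45.0, "c2": -85.0},
    {"s1": -46.0, "c2": -86.0},
]
detected = change_points(scans)
```

In this toy track the dissimilarity stays near 1 dB within each environment and jumps to roughly 42 dB at the corridor-to-store boundary, so a single change point is reported at the third scan.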

The quality of the positioning from GPS and other satellite systems is related to the line of sight to the satellites in the sky. Android’s OS provides an API to access the SNR (Signal to Noise Ratio) of each tracked satellite and its current position in the sky. This interface is commonly used to estimate how good the global positioning is, but it is also possible to use this information to measure when there is a good line of sight to the sky, among other characteristics that help to identify when the person is indoors or outdoors. To achieve this, the algorithm analyses the SNR of satellites in different regions of the sky from constellation information, detecting sudden reductions or fast degrading slopes of the SNR per region.

These layouts are classified against predetermined models (such as full-cover ceiling, skylight tunnel or glass wall). By analyzing the occurrence of these models according to the received satellite power, we estimate the probability of being outdoors (or in a corridor with a clear sky view) or indoors (indoor corridor or store).

As a part of the mobile-side algorithm, the outdoor/indoor (OI) detection function is applied to calculate outdoor/indoor probability for each segment. These probabilities are then used to detect entrance/exit events using a particle filter associated with position, bearing and signal profile. These particles are then reported to the SLAM server-side to be fused together. Once a branch/segment aggregates enough particles it can be classified as an entrance feature, as shown in Figure 15.

The SLAM mobile-side algorithm can collect, process and qualify PDR-based tracks while the server-side algorithm can merge uploaded tracks incrementally and update the crowd-sourced database automatically with more qualified track inputs. The SLAM database consists of merged route branches/segments attached with fused PDR/WiFi profiles and fusion weightage.

Generally, compared to in-store branches/segments, the corridor branches are longer and segments get higher fused weightage, due to the nature of PDR propagation and the SLAM fusion strategy for overlapped track-segments. Making use of these two features, the InStore probabilities of merged branches/segments can be calculated.

The branches/segments in different stores can be classified with unique fused PDR properties (bearing and position) and WiFi profiles. Making use of the trained SLAM database, the InStore probability of each part of track of the visit confirmation request can be obtained by matching the WiFi profiles with the branches/segments in the SLAM database.

InStore Feature Verification will now be described.

The characteristics of the walk pattern usually change in different environments. When a person is walking in a corridor, they usually have a fast walk, tend to look in the same direction and walk straight for long segments before making a turn. On the other hand, when a person is moving around a store, the pattern of walk changes. In our analysis we observed a slower velocity, the orientation of the phone changes more (wandering movement) and the straight sections of the path are shorter.

To determine when a person has walking characteristics that can be associated with being in a store, we extract those features from the Pedestrian Dead Reckoning (PDR) positioning module and the WiFi signals observed. The features to extract are:

• PDR walking velocity: The velocity of a person can be estimated by analyzing the signals of the inertial measurement unit in the phone. A common technique used is to detect the occurrence of a step by observing the minima and maxima of the norm of the acceleration and turn rates. The length of the step can be estimated from the characteristics of the signals, and therefore the velocity is estimated as:

$$vel_{PDR} = step\_freq \cdot step\_length$$

• WiFi walking velocity: when a person is stationary, the RSS tends to remain almost constant, but if the person walks those signals will change as fast as the person walks. For a given MAC, the relationship between the change in the RSS and the velocity can be given by:

$$vel_{WiFi} \cos(\theta) = \frac{\Delta RSS \cdot d_0 \cdot \ln(10)}{10 \beta \, \Delta t} = v_\theta$$

The value will change according to the distance to the Wireless Access Point (WAP) estimated as:

$$d_0 = 10^{(RSS - RSS_0)/(10 \beta)}$$

and $\theta$ is the angle between the direction of movement and the direction of the WAP. For a single MAC it is not possible to correctly estimate the velocity unless we know the relative position of the WAP and the person, but if enough WAPs are observed, we could assume they are uniformly distributed around the user and the velocity can be obtained from the maximum of the absolute values of a set of $v_\theta$ values. Additionally, if we measure the RMS of the $v_\theta$ values (and $\theta$ is uniformly distributed), it should be equal to $vel_{WiFi} / \sqrt{2}$.

• The rate of change of the heading: For a straight segment composed of several PDR displacements, the rate of change is measured as the root mean square value of the changes in the movement directions of the displacements ($\Delta\varphi_i$) weighted according to the length of the displacement ($DL_i$). This value is estimated from the observed changes in the phone heading as:

$$\Delta heading = \sqrt{\frac{\sum_i DL_i \, (\Delta\varphi_i)^2}{\sum_i DL_i}}$$

• The proportion of the track: The ratio can be obtained dividing the length of the segment ($SL_i$) by the total length of the straight track that contains it:

$$r = \frac{SL_i}{\sum_i SL_i}$$
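
The four movement features listed above can be sketched in code as follows. This is an illustrative reading of the formulas rather than a definitive implementation: the function names and units are hypothetical, `beta` denotes the path-loss exponent, the distance `d0_m` is taken as an input rather than derived, and the WiFi speed is recovered from the RMS of the per-MAC values under the stated uniform-angle assumption.

```python
import math


def pdr_velocity(step_freq_hz, step_length_m):
    """PDR walking velocity: step frequency times step length."""
    return step_freq_hz * step_length_m


def wifi_radial_velocity(delta_rss_db, d0_m, beta, delta_t_s):
    """Per-MAC value v_theta = vel_WiFi * cos(theta)."""
    return delta_rss_db * d0_m * math.log(10) / (10.0 * beta * delta_t_s)


def wifi_velocity_from_rms(v_thetas):
    """vel_WiFi = sqrt(2) * RMS(v_theta), assuming theta is uniformly distributed."""
    rms = math.sqrt(sum(v * v for v in v_thetas) / len(v_thetas))
    return math.sqrt(2.0) * rms


def heading_change_rate(displacement_lengths, heading_changes):
    """Length-weighted root mean square of heading changes along a segment."""
    weighted = sum(
        length * dphi ** 2
        for length, dphi in zip(displacement_lengths, heading_changes)
    )
    return math.sqrt(weighted / sum(displacement_lengths))


def track_proportion(segment_length, all_segment_lengths):
    """Share of the containing straight track taken up by one segment."""
    return segment_length / sum(all_segment_lengths)
```

As a quick check of the RMS relation, per-MAC values of 1, 0, -1 and 0 m/s (access points ahead, to the side, behind and to the other side of a user walking at 1 m/s) give an RMS of about 0.707 m/s, which scales back to roughly 1 m/s.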

Using these features, it is possible to classify when a person enters a store. In particular, logistic regression classification is employed to obtain the probability that the movement features correspond to being in a store. Other classifiers could be used if needed or appropriate.
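
A regression-based classifier of this kind can be sketched as a logistic model over the movement features; treating the model as logistic is an assumption made here. The weights and bias below are hand-picked, hypothetical values used purely to show the mechanics; in practice they would be fitted to labelled tracks.

```python
import math


def in_store_probability(features, weights, bias):
    """Logistic model: p = 1 / (1 + exp(-(w . x + b)))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Feature order: PDR speed, WiFi speed, heading-change rate, track proportion.
# Hypothetical coefficients: slower speeds and more turning raise the probability.
WEIGHTS = [-2.0, -1.0, 3.0, -0.5]
BIAS = 2.0

corridor_features = [1.4, 1.3, 0.1, 0.9]  # fast, straight, long segment
store_features = [0.4, 0.3, 0.8, 0.2]     # slow, wandering, short segment

p_corridor = in_store_probability(corridor_features, WEIGHTS, BIAS)
p_store = in_store_probability(store_features, WEIGHTS, BIAS)
```

With these illustrative coefficients the wandering, slow segment scores well above 0.5 and the fast, straight segment well below it, which mirrors the walking patterns described above.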

With regard to store identification: RF profiles that characterise the signal space inside stores can be built using crowd-sourced WiFi data that are labelled with the store IDs. These profiles, which consist of MAC addresses with their corresponding RSS values, in general become more reliable or ‘mature’ as more data is added, and after some time can be used to identify the stores in which unknown WiFi scans have been taken.

The algorithm presented in this section matches unknown WiFi scans to stores by comparing their profiles and selects the store with the highest similarity. In this case, the similarity measure used is the weighted Jaccard index, which is defined as the following:

$$\text{Weighted Jaccard}(S, T) = \frac{\sum_i \min(S_i, T_i)}{\sum_i \max(S_i, T_i)}$$

where the values $S_i$ and $T_i$ are the entries of:

$$S = [MAC_{s1}{:}RSS_{s1},\ MAC_{s2}{:}RSS_{s2},\ \ldots,\ MAC_{sN}{:}RSS_{sN}]$$

$$T = [MAC_{t1}{:}RSS_{t1},\ MAC_{t2}{:}RSS_{t2},\ \ldots,\ MAC_{tN}{:}RSS_{tN}]$$

defined so as to take into account MACs that appear only in one set but not the other.

This is an enrichment of the standard form of Jaccard index because it is not only based on MAC addresses, but also considers RSSIs. In previous steps, scans are pre-filtered before going into processing. Furthermore, the list of MACs in profile or scan are sorted by RSSI, strongest first, to reduce the impact of weak signal on the estimated distance. A form of normalization to place all measured RSSIs above zero is performed. In the event that more than one scan is extracted from the same track/segment to be classified, they are then fused into one profile before the distance calculation. As a result of fusion, mean RSSI per MAC is used in the Weighted Jaccard Distance.
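
The weighted Jaccard comparison can be sketched as below. The -100 dBm floor used to shift RSS values above zero and the zero weight given to a MAC absent from one side are assumptions; the text states only that some normalization is applied and that one-sided MACs are taken into account. The function names are hypothetical.

```python
def weighted_jaccard(scan, profile, rss_floor=-100.0):
    """Weighted Jaccard index between two {mac: rss_dbm} mappings.

    RSS values are shifted by an assumed floor so that all weights are
    non-negative; a MAC absent from one side contributes weight 0 on that side.
    """
    macs = scan.keys() | profile.keys()
    s = {m: max(scan.get(m, rss_floor) - rss_floor, 0.0) for m in macs}
    t = {m: max(profile.get(m, rss_floor) - rss_floor, 0.0) for m in macs}
    denominator = sum(max(s[m], t[m]) for m in macs)
    if denominator == 0:
        return 0.0
    return sum(min(s[m], t[m]) for m in macs) / denominator


def identify_store(scan, store_profiles):
    """Return the store whose profile has the highest similarity to the scan."""
    return max(store_profiles, key=lambda store: weighted_jaccard(scan, store_profiles[store]))


scan = {"m1": -60.0, "m2": -80.0}
store_profiles = {
    "store_A": {"m1": -60.0, "m2": -80.0},
    "store_B": {"m1": -90.0, "m3": -50.0},
}
matched = identify_store(scan, store_profiles)
```

An identical profile scores 1, while the hypothetical store_B profile, which shares only one weakly received MAC with the scan, scores 10/110.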

To examine the effectiveness of Weighted Jaccard Distance an experiment was carried out in a local shopping mall. We selected a set of stores which are small to medium in size and close enough to each other. As expected, we verified that an overlap in WiFi fingerprints is commonly observed between stores. In this experiment we utilized 90% of the collected scans, randomly selected, to construct the store WiFi profiles and the remaining 10% for testing. Two stores (0, 1) shared common MAC addresses with similar RSSIs. However, the overall results confirmed close to 100% classification.

The last component of SVC is to identify the names of the particular stores the users have visited. Crowd-sourced WiFi signals in a number of stores were used to build labelled “store profiles”. While there may be significant differences between some of the store WiFi profiles, e.g. store A lies far away from the others in signal space, mirroring its location in a different part of the shopping mall, stores B and C may lie directly opposite each other across the corridor, and thus have a large number of MAC addresses in common. When there is an insufficient number of WiFi profiles to create a store profile, the confidence of the profile being truly representative of the store is low, and this can be a severe limitation in accurately assigning a store to a track. The improvements proposed below can help to reduce the problem.

The procedure for matching tracks and SLAM recorded segments to a known digitised road network or crowd-sourced heat map is well known. However, such procedures assume the route is reasonably accurate. Unfortunately, within buildings, especially large ones such as shopping centres, the number of possible routes and the noise in input tracks make it very difficult to trace the track to a specific store. In addition, although it is often possible to digitise shopping centres (either based on online maps provided by the centre’s website, or maps that have already been digitised by indoor mapping companies), these maps regularly change within the designated space, and keeping their POI up to date for global coverage is not a viable solution. It is, however, still possible to do some form of map matching between the coordinates reported by the mobile device and centrelines created from the digitised maps of shopping centres. This can provide some assistance in determining where the device’s route is likely to end within the large space of unmarked plot. Therefore, only common routes, centre lines, are utilised in this proposal.

This form of map matching could be utilised to classify segments’ positions with respective accuracy into candidate stores. With the assumption that routes matched to the same centre lines sections would end up in the same store, an initial map based classification can be obtained. The signal profiles stored per segment passing beyond the centre lines are then clustered based on similarity and topology classification. The general scheme of the algorithm is as follows:

In respect of the input track (location history), for each segment, a store is identified from the map position. Then for each store, profiles are identified inside the store. Lastly, for each store, profiles from the surroundings are evaluated. The store profiles are then created.

Additionally, the WiFi data is used such that for each store, the WiFi distance to the profile is determined. The closest store profile is then identified as a possible store.

These centrelines can be automatically generated using functions available in common spatial systems. For example, the st_approximatemedialaxis function creates a line which follows the centre (the medial axis) of a polygon, and this can be applied to a polygon which defines the corridor to create a centreline network to which the customer’s track can then be mapped.

As discussed above, SLAM can be utilised to classify InStore segments and corridor segments. The discussed solution then relies on constructing signal profiles from segments tagged as InStore only. However, as it is now common for unsupervised systems to ask users to verify an identified store, signal profiles will be more accurate and can act as destination check points. Nevertheless, the current SLAM solution has more capability than just segment fusion. The same system could be used to match a full track (a set of segments), providing more information about the profiles between stores. Therefore, this new concept aims to build the topology of how InStore segments between multiple stores link together.

To enable the topology crowd-sourcing, each segment should hold information about the segments frequently linked to it, based on the submitted tracks. This set of information defines how to navigate from one store to another. When calculating the probability of a person being in a particular store, the probability of each segment is tracked according to the observed WiFi data and propagated over all segments the user could have passed through to reach that destination. Once the probability of a given topology passes a certain threshold, it could be marked as a store visit.
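One way to hold, for each segment, the segments frequently linked to it is to count consecutive segment pairs in the submitted tracks and normalise the counts. The following is a minimal sketch, assuming each track has already been reduced to an ordered list of segment identifiers; the identifiers and the `build_topology` name are hypothetical.

```python
from collections import defaultdict

def build_topology(tracks):
    """Map each segment to {next_segment: relative frequency} from submitted tracks."""
    counts = defaultdict(lambda: defaultdict(int))
    for track in tracks:
        # Count every observed transition between consecutive segments.
        for src, dst in zip(track, track[1:]):
            counts[src][dst] += 1
    topology = {}
    for src, links in counts.items():
        total = sum(links.values())
        topology[src] = {dst: n / total for dst, n in links.items()}
    return topology

# Hypothetical submitted tracks, each an ordered list of segment identifiers.
tracks = [
    ["entrance", "corridor_1", "store_A"],
    ["entrance", "corridor_1", "store_B"],
    ["entrance", "corridor_1", "store_A"],
    ["entrance", "corridor_2", "store_C"],
]
topology = build_topology(tracks)
```

With these sample tracks, three of the four transitions out of `entrance` lead to `corridor_1`, so that link receives a relative frequency of 0.75.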

The probability between segments according to the PDR direction is propagated, and then the probabilities are updated according to the similarity measure of signal profiles.

$$p(s_{i:k}) = \sum_{j} p(\text{move from } s_j \text{ to } s_i)\, p(s_{j:k-1}, \mathrm{WiFi})$$

$$p(s_{i:k}, \mathrm{WiFi}) = p(s_{i:k})\, p(\mathrm{WiFi} \mid \mathrm{WiFi}_i)$$
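Read as a two-step recursion (first propagate the segment probabilities along the segment links, then weight each segment by the similarity of the observed WiFi scan to that segment's profile), the update can be sketched as below. This is an interpretation rather than the definitive method: the transition table, the likelihood values, the renormalisation step and `VISIT_THRESHOLD` are all assumptions introduced for illustration.

```python
def propagate(prior, transition):
    """Prediction step: sum over source segments j of p(move j -> i) * p(s_j at step k-1)."""
    predicted = {}
    for src, p_src in prior.items():
        for dst, p_move in transition.get(src, {}).items():
            predicted[dst] = predicted.get(dst, 0.0) + p_move * p_src
    return predicted

def update(predicted, likelihood):
    """Update step: weight each segment by its WiFi similarity, then renormalise (assumed)."""
    weighted = {s: p * likelihood.get(s, 0.0) for s, p in predicted.items()}
    total = sum(weighted.values())
    if total == 0.0:
        return weighted  # no segment is compatible with the scan
    return {s: p / total for s, p in weighted.items()}

# Hypothetical inputs: the user was in the corridor and may move to either store.
transition = {"corridor": {"store_A": 0.5, "store_B": 0.5}}
prior = {"corridor": 1.0}
likelihood = {"store_A": 0.8, "store_B": 0.2}  # assumed WiFi similarity per segment

posterior = update(propagate(prior, transition), likelihood)

VISIT_THRESHOLD = 0.7  # assumed threshold for marking a store visit
visited = [s for s, p in posterior.items() if p >= VISIT_THRESHOLD]
```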

The SLAM system described above can sequentially process, qualify, cluster and merge the collected tracks from anonymous users. This information incrementally updates a crowd-sourced database, which consists of merged PDR/WiFi profiles in location based route branches/segments, and it doesn’t rely on any indoor maps.

According to multi-venue testing results, the generated database can be reliable given sufficient track inputs (depending on the dimensions of the indoor environment, generally about 200 qualified tracks per ordinary shopping mall) and can improve itself dynamically and adaptively as inputs increase. Based on the database and system design, many location-based services, such as SVC, can be supported.

SLAM provides capabilities for SVC in both the detection algorithm and the crowd-sourced profiles. Outdoor-indoor events can be detected with high reliability using the GPS-based outdoor-indoor detection on the mobile side and the crowd-sourced entrances within the server-side database. This can be used to detect outdoor cases and provide outlier rejection for SVC. However, the main contribution of SLAM lies in the features of the branch/segment profiles. In addition, in-store segments and along-corridor segments can be classified, and in-store probabilities can be calculated with reliable performance. For SVC, the WiFi profile from the user’s request is compared to the WiFi profiles of the classified segments in the database, and each matched segment contributes its in-store probability to the request. When requests continue along a consecutive pedestrian track, the algorithm matches them using the sliding-window based clustering method, which can provide a more reliable SVC result. If the track of requests is qualified by the SLAM mobile side, it is submitted to the server-side algorithm to update the SLAM database.
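The benefit of processing consecutive requests over a sliding window can be illustrated with a simplified stand-in for the clustering method, which is not detailed here. The sketch below assumes each request has already been given an in-store probability by matching against the database, and simply averages the most recent values; the window length, threshold and sample values are hypothetical.

```python
from collections import deque

def sliding_window_visits(probabilities, window=3, threshold=0.6):
    """Flag each request whose trailing window of in-store probabilities averages >= threshold."""
    recent = deque(maxlen=window)  # keeps only the last `window` values
    flags = []
    for p in probabilities:
        recent.append(p)
        full = len(recent) == window
        flags.append(full and sum(recent) / window >= threshold)
    return flags

# Hypothetical per-request in-store probabilities along one pedestrian track.
per_request = [0.2, 0.9, 0.3, 0.8, 0.9, 0.7, 0.1]
flags = sliding_window_visits(per_request)
```

In this sample, the isolated high value at the second request does not trigger a visit on its own; only the sustained run of high values later in the track does, which is the sense in which a windowed result is more reliable than a single-request match.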

On the other hand, based on features of pedestrian motion and GPS/WiFi signal changes extracted from mobile sensors, the general in-store probability can be estimated via logarithmic regression classification. This estimated probability is combined with the in-store probability value obtained by matching with the SLAM database to get a final in-store classification result.
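The combination of the two probability estimates could be sketched as follows. Several assumptions are made loudly here: the regression is taken to be of the logistic kind (a swapped-in, commonly used classifier, since the text does not define the regression further), the feature names and weights are invented for illustration, and the combination rule is assumed to be a weighted average because the description does not specify one.

```python
import math

def sensor_in_store_probability(features, weights, bias):
    """Assumed logistic model: sigmoid of a weighted sum of sensor-derived features."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def combine(p_sensor, p_database, database_weight=0.5):
    """Assumed combination rule: weighted average of the two probability estimates."""
    return database_weight * p_database + (1.0 - database_weight) * p_sensor

# Hypothetical features and weights: slow walking and few visible GPS satellites
# are treated as evidence of being inside a store.
weights = {"walking_speed_mps": -2.0, "gps_satellites_visible": -0.3}
features = {"walking_speed_mps": 0.4, "gps_satellites_visible": 1.0}

p_sensor = sensor_in_store_probability(features, weights, bias=1.5)
p_final = combine(p_sensor, p_database=0.8)  # 0.8: assumed value from database matching
in_store = p_final >= 0.5                    # assumed decision threshold
```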

To empower SVC, all WiFi scans assigned to in-store classified track segments are compared with store RF profiles. Along with SLAM branches and profiles, labelled WiFi scans collected when an end user identifies a store (for example, when making a check-in) are fully utilised in RF profile crowd-sourcing. Machine learning methods can be used to improve both the quality of the profiles and the matching of a WiFi scan to stores. With an increased amount of data, the accuracy of store profiles can be improved.

With the SLAM crowd-sourcing algorithm, more features and landmarks for SVC (and other potential applications) can be extracted from multi-sensor data, processed along with the SLAM tracks and merged within the SLAM database. Finally, the support from the map information, map matching and topology crowd-sourcing can further improve in-store probability calculation and store identification.

Further modifications and variations may be made within the scope of the invention herein disclosed.

Returning to the POI boundary generation process, although the notional boundaries shown in the above examples are polygons with straight edges, it will be understood that the notional boundaries may additionally or alternatively be provided with one or more curved edges.

The steps performed by a single computer or (for example) server 8 may alternatively be performed by a plurality of servers in combination and/or by one or more mobile devices (e.g. one or more mobile devices 1).

Further examples of using a mobile device to determine an activity carried out in a location, whether by the user of the mobile device or otherwise, can be considered. For example, if the mobile device is stationary for long periods, POIs such as churches, cinemas and restaurants may be considered. The type of noise and light levels may then be used to discriminate between those options, for example. Cashless purchases made using the phone can in many cases give useful clues as to activities (for example if a church donation was made). The reception or otherwise of GPS signals can indicate whether the user is inside or outside a building, for example. Magnetic readings from a magnetometer in a user device can determine proximity to a car, for example. Other variations are of course possible without departing from the spirit and scope of the invention.

Claims

1. A method of facilitating the identification of sites visited by a user device within an indoor region, the indoor region including a plurality of sites and at least one transition region, each transition region connecting two or more said sites, and the method comprising:

generating sub-region data representing a division of the indoor region into a plurality of sub-regions in dependence on aggregate sensor data received from sensors within the indoor region;

classifying each sub-region as a site or a transition region; and

generating identification data representing the identity of sites corresponding to sub-regions that have been classified as sites; and

outputting the sub-region data and identification data.

2. A method according to Claim 1, wherein said sub-region data further comprises fingerprint data to facilitate identification of each sub-region in dependence on sensor data received within said sub-region.

3. A method according to Claim 2, further comprising generating the fingerprint data for each sub-region in dependence on sensor data received within the respective sub-region.

4. A method according to Claim 3, wherein the fingerprint data is derived from sensor data including at least one of signal source identifiers, such as MAC addresses and uuids, received signal strength, magnetic field strength and direction, visual references, and audio references.

5. A method according to any preceding claim, further comprising receiving an identity of a site, determining a sub-region associated with the site, and adding the identity and associated sub-region to the identification data.

6. A method according to Claim 5, wherein the identity is inputted by a user who has visited the site.

7. A method according to Claim 6, wherein the sub-region associated with the identity is determined with reference to sensor data received by a user device associated with the user inputting the identity.

8. A method according to any preceding claim, further comprising identifying boundaries between sub-regions in dependence on discontinuities in the aggregate sensor data corresponding to transitions between transition regions and sites.

9. A method according to any preceding claim, wherein the sites are shops and the indoor region is a shopping centre.

10. A method according to any preceding claim, wherein at least one site includes a plurality of rooms.

11. A method according to any preceding claim, wherein the aggregate sensor data includes data from at least one of: a movement sensor, electromagnetic signal sensor, acoustic signal sensor and magnetic field sensor, and may in particular include data from at least one of: an accelerometer, pedometer dead reckoning system, global positioning system, Wi-Fi positioning system, camera, microphone, magnetometer, and radio frequency receiver or transceiver.

12. A method according to any preceding claim, wherein classifying each sub-region as a site or a transition region comprises processing aggregate sensor data to determine at least one common user behaviour in the sub-region, and classifying the sub-region in dependence on the user behaviour.

13. A method according to any preceding claim, further comprising receiving path data representing a path taken by a user device through the indoor region, and wherein the sub-region data is generated in dependence on the path data.

14. A method according to Claim 13, wherein the path data represents a plurality of paths taken by at least one user device, and the method further comprises combining the paths to determine an aggregate path.

15. A method according to Claim 14, wherein at least one sub-region relating to a transition region is defined in dependence on at least a portion of the aggregate path.

16. A method according to any preceding claim, further comprising receiving additional sensor data, and updating the sub-region data in dependence on the additional sensor data.

17. A method according to any preceding claim, wherein at least one of the steps of generating sub-region data in respect of at least one sub-region, classifying at least one sub-region, and generating identification data for at least one sub-region is carried out on a user device.

18. A method according to any preceding claim, wherein at least one of the steps of generating sub-region data, classifying the sub-regions, and generating the identification data is carried out on a server in communication with a plurality of user devices.

19. A method according to Claim 18, further comprising transmitting sensor data from a user device to the server.

20. A method of identifying sites visited by a user device within an indoor region, the indoor region including a plurality of sites and at least one transition region, each transition region connecting two or more said sites, and the method comprising:

receiving user sensor data from sensors in the user device;

accessing sub-region data representing a division of the indoor region into a plurality of sub-regions in dependence on aggregate sensor data received from sensors within the indoor region, each sub-region being classified as a site or a transition region;

processing the user sensor data and the sub-region data to determine at least one candidate sub-region within which the user device may be located; and

accessing identification data, representing the identity of sites corresponding to sub-regions that have been classified as sites, to determine at least one candidate identity of the site associated with said at least one candidate sub-region; and

outputting said at least one candidate identity.

21. A method according to Claim 20, further comprising determining said at least one candidate sub-region in dependence on a trajectory followed by the user device.

22. A method according to Claim 20 or 21, further comprising determining said at least one candidate sub-region by matching the user sensor data with fingerprint data associated with the sub-region.

23. A method according to any one of Claims 20 to 22, further comprising determining said at least one candidate sub-region in dependence on at least one of the bearing or orientation of the user device, the speed of the user device, the rate of change of the bearing, orientation or speed of the device, the distance travelled by the user device, the acceleration of the user device, and pattern of movement of the device.

24. A method according to any one of Claims 20 to 23, further comprising outputting a probability that the user device is within a said candidate sub-region.

25. A method according to any one of Claims 20 to 24, further comprising outputting a location history of the user device prior to entering said at least one candidate sub-region.

26. A method according to any one of Claims 20 to 25, wherein the sensor data is collected over a period of time.

27. A method according to any one of Claims 20 to 26, wherein the sensor data is collected while the user device follows a path through a plurality of locations.

28. A method according to Claim 27, further comprising processing the user sensor data to detect transition points in the path, such as a transition between a transition region and a site.

29. A method according to any one of Claims 20 to 28, further comprising processing said at least one candidate identity.

30. A method of identifying sites visited by a user device within an indoor region, the indoor region including a plurality of sites and at least one transition region, each transition region connecting two or more said sites, and the method comprising:

receiving user sensor data from sensors in the user device;

processing the user sensor data and the sub-region data to determine at least one candidate sub-region within which the user device is located; and

outputting said at least one candidate sub-region.

31. A non-transitory computer readable medium tangibly embodying computer program code which, when executed by one or more computer processors, causes the computer to carry out a method as claimed in any one of Claims 1 to 30.

32. A computer processing system comprising one or more computer processors and associated memory, the memory tangibly embodying computer program code which, when executed causes the computer to carry out a method as claimed in any one of Claims 1 to 30.