WO2013186552A2

WO2013186552A2 - Aggregated mobility profiling

Info

Publication number: WO2013186552A2
Application number: PCT/GB2013/051534
Authority: WO
Inventors: Abdelmalik Bachir; Kin Leung
Original assignee: Imperial Innovations Limited
Priority date: 2012-06-14
Filing date: 2013-06-11
Publication date: 2013-12-19
Also published as: WO2013186552A3; GB201210549D0

Abstract

A system for collecting and characterising mobility data of a plurality of mobile wireless communications devices is disclosed, the system comprising: at least a first and a second detector having a respective first and second detection range and each arranged to detect entry and departure of a mobile wireless communications device into the detection range and store associated mobility data; and an aggregator arranged to aggregate mobility data received from the detectors.

Description

Aggregated Mobility Profiling

Field

The invention relates to using devices to record, analyse and summarize mobility data, and particularly relates to the use of mobile phones (or WiFi-equipped devices) and small cells (or WiFi access points) to collect and aggregate the mobility data, and the representation of aggregated mobility profiles for groups of customers.

Background

The automated collection and processing of data regarding individual behaviour presents significant technical challenges of accuracy, ease of analysis and data security. For example, understanding customer shopping behaviour is one of the tools used by business owners to evolve and sell their products in a way that improves customer satisfaction and increases profit. In the past decades, researchers have focused on following customers, recording their behaviours, establishing models to classify them into groups and trying to figure out the motives for their purchase actions.

Among many parameters defining the behaviours of customers, mobility is the one that has attracted extensive research efforts. Study of customer mobility requires the accomplishment of two operations: (i) collecting mobility data, and (ii) analyzing the collected data to establish models that help business owners make appropriate decisions.

Collecting mobility data is the first operation that needs to be carried out. The collected data should be statistically significant. Its volume needs to be large and it should represent all categories of customers. Therefore, the technology used for this operation is crucially important. Techniques used for mobility data collection in previous research include: manual following and tracking of customers' movement, attachment of RFIDs or cameras to shopping trolleys, WiFi-equipped wristbands, etc. With these technologies, the collected mobility data may be limited in size, inaccurate, biased, low resolution, invasive of individual privacy or may not represent the mobility profile of a wide set of all possible customers.

Processing and analysing the collected mobility data to adequately characterize the aggregated mobility profile of a large number of customers is the second operation that needs to be performed. Most of the existing research in this direction has focused on using statistical techniques to extract the major mobility patterns and classifying customers accordingly. This can provide over-generalized results; conversely where customers with similar mobility patterns are classified in the same category, different mobility models have been established to represent each category. The representation of mobility has often been restricted to the path taken by customers during their shopping journeys. Although this provides some information about customer mobility, it can be inaccurate and the analysis available may be limited, and many techniques rely on trial and error to develop models.

Another important factor for consideration is the privacy and anonymity of the customers reported in the collected mobility data. It is particularly important because the collected data shows the time, locations and the actual movement of the associated customer from one location to another. For this reason, customers may prefer that their mobility not be tracked.

Additionally, the technologies in use are often unable to collect mobility data from a wide variety of customers without disturbing their original behaviours, and require significant amounts of computation and mobility data to be saved and stored for subsequent processing when the established mobility models are updated or revised upon availability of new data. Again, storing such data can contribute to the breaching of customer privacy and anonymity, and incur excessive costs and risks related to data hoarding.

The invention is set out in the claims.

Embodiments of the invention will now be described. According to an embodiment, a system for collecting and characterising mobility data of a plurality of mobile wireless communications devices is disclosed. The system comprises at least a first and a second detector having a respective first and second detection range. Each of the detectors is arranged to detect entry and departure of a mobile wireless communications device into the detection range and store associated mobility data. The system further comprises an aggregator arranged to aggregate mobility data received from the detectors.

According to another embodiment, a system for collecting and characterising mobility data of at least one mobile wireless communications device is disclosed. The system comprises at least a first and a second detector having a respective first and second detection range. Each of the detectors is arranged to detect entry and departure of a mobile wireless communications device into the detection range and store associated mobility data. The system further comprises a processor arranged to receive mobility data from the detectors and correlate the mobility data obtained in at least two detection ranges.

Figure Listing

Figure 1 shows a diagram of an exemplary system for collecting mobility data.

Figure 2 is a flow chart of an example interaction between a mobile wireless communications device and a base station of figure 1.

Figure 3 shows a diagram of an exemplary system for collecting mobility data comprising more than one base station.

Figure 4 is a flow chart of an example interaction between a mobile wireless communications device and the base stations of figure 3.

Figure 5 is a flow chart of one embodiment for analysing mobility data.

Figure 6 shows a diagram of an exemplary network for collecting and analysing mobility data records.

Figure 7a shows an example tracking area where small cells are deployed.

Figure 7b represents an example spatial-mobility graph of customers.

Figure 7c represents a matrix representation of the graph of Figure 7b.

Detailed Description

In overview, the approach provides an efficient representation of both spatial and temporal mobility patterns. The spatial representation of mobility is based on graph models where each node in the graph represents a cell (which is adequately covered and served by a "base station" such as a femtocell or WiFi access point) of the given service area and a directed edge connecting two nodes in the graph indicates the possible movement of a customer from one of the nodes to another. The weight of an edge in the graph corresponds to the probability for an arbitrary customer moving from one cell to another. In a further development, the mobility graph is augmented by three sets of parameters, two of them characterize temporal mobility, namely the distribution of the amount of time a customer spends in each cell and the correlations between these times, and the third parameter characterizes the frequency of visits to each cell. Further still, aggregation of the data allows customer behaviour analysis without the need to store potentially sensitive underlying data.

As customers move through various cells of the service area, mobility data is collected and processed continuously. Specifically, the spatial and temporal representation of the mobility patterns are constructed and updated by an iterative process using the collected data. Once the graph model parameters are updated by processing new mobility data, the latter can be discarded permanently. Since mobility data can be deleted immediately after processing and since the model parameters only show aggregated profiling information, the proposed method ensures privacy and anonymity of individual customers and their mobility patterns. Furthermore, disposing of the collected data has another important advantage of reducing cost and risk related to excessive data hoarding.

To overcome the shortcomings of existing techniques, the invention provides a system combining a suitable technology to collect mobility data and algorithms to process and represent a mobility profile efficiently. In one embodiment the approach is developed for service areas deployed with small cells, such as femtocells or WiFi access points, to enable efficient collection of mobility data from a large set of potential customers. Appropriate networks will be well known to the skilled reader such that a detailed description is not required here. For example, in known networks such as those described at www.smallcellforum.org, incorporated herein by reference, femtocells are deployed at fixed positions throughout a service area (e.g., a large shopping mall). Due to the relatively short communication range of femtocells, the approximate location of a given customer can be determined when the customer's mobile phone can access a particular femtocell. That is, the detection of the mobile phone by the femtocell reveals the location of the associated customer. As a result, the corresponding femtocell can record the time when the customer enters its communication range (which enables the presence detection of the mobile phone and thus the start of the association) and when he/she leaves. Such timing, mobile phone and femtocell (or location) identity information forms the mobility data that can be processed by the proposed techniques to yield the aggregated mobility profile for a large group of customers. It should be clear that the collected mobility data, although representing approximate customer locations, is sufficient for a range of applications (e.g., characterization of shopper movement patterns) where location accuracy is of the order of the communication range of the small cell technology in use.

Another example of small-cell networks is the deployment of WiFi access points throughout a service or tracking area where the communication range of the access points is on the order of a few tens of metres, comparable to that of the femtocells.

Collecting Mobility Data

Turning to one embodiment of the detailed implementation, the first step is to collect the mobility data. Figure 1 shows an example system for collecting mobility data. The example mobility data collection system comprises a base station 102 having a detection range 104, a mobile wireless communications device 106, a detection zone 108 and a non-detection zone 110. The base station 102 is a wireless communications device typically installed at a fixed location. The base station 102 has detection zone 108 extending from the base station 102 to a detection range 104 of the base station 102. Signals can be received by the base station 102 from devices that emit signals inside the detection zone 108. An example of one such device is the mobile wireless communications device 106. In figure 1, the mobile wireless communications device 106 is in the non-detection zone 110 which lies outside the detection zone 108. Therefore, in figure 1 signals from the mobile wireless communications device 106 cannot be detected by the base station 102. The signal emitted by the mobile wireless communications device 106 and detected by the base station 102 includes an identifier. The identifier serves to identify one mobile wireless communications device 106 from another, and in one embodiment may be an International Mobile Subscriber Identity (IMSI). The IMSI is stored inside the mobile wireless communications device 106. Although an IMSI is specifically mentioned, the skilled person would recognise that other identifiers could be used to distinguish one mobile wireless communications device 106 from another.

As discussed above, in one embodiment a suitable type of base station 102 is a femtocell. Femtocells have a detection range 104 that is greatly reduced compared to many other types of base station 102, and can have detection ranges 104 as small as a few metres. Typically, femtocells have the capability of detecting: (a) the presence of mobile wireless communications devices 106 when they come within the detection range 104, (b) activities performed by mobile wireless communications devices 106 while they are within the detection range 104 (i.e., inside the detection zone 108) such as telephone and data calls, and (c) departures of the mobile wireless communications devices 106 from a detection zone 108 to a non-detection zone 110. An example of a typical femtocell used by Alcatel-Lucent is available at the internet page addressed as www.alcatel-lucent.com.

Figure 2 shows the interaction between a mobile wireless communications device 106 and a base station 102. The mobile wireless communications device 106 begins at step 200, where it is outside of the detection range 104 of the base station 102 and is therefore in the non-detection zone 110. At this point, signals emitted by the mobile wireless communications device 106 are not received by the base station 102.

At step 202, the mobile wireless communications device 106 enters the detection range 104 of the base station 102 and is therefore now in the detection zone 108 of the base station 102. At this point, signals emitted by the mobile wireless communications device 106 are received by the base station 102.

At step 204 the base station 102 "associates" with the mobile wireless communications device 106 and records the following information: "nodeID", the identity of the base station 102; "customerID", the identifier of the mobile wireless communications device 106; and "entryTime", the time when the base station 102 received a first signal from the mobile wireless communications device 106.

Using this information, a weighted, directed graph can be developed. Algorithm 1 below details the algorithm corresponding to the mobile wireless communications device 106 "associating" with the base station 102:

Algorithm 1: When a mobile wireless communications device i "associates" with a base station j, create a mobility data record R as follows:

R.customerID = i

R.nodeID = j

R.entryTime = current time

R.departureTime = infinity

Each R.field in the algorithm above corresponds to a data field of the data record R (i.e., customer ID, node ID, entry time and departure time).

At step 206, the mobile wireless communications device 106 moves outside of the detection range 104, and therefore moves from the detection zone 108 to the non-detection zone 110. At this point, any signals emitted by the mobile wireless communications device 106 will not be received by the base station 102 since the mobile wireless communications device 106 is outside of the detection zone 108.

At step 208, the base station 102, now no longer receiving signals from the mobile wireless communications device 106, "disassociates" with the mobile wireless communications device 106 by recording a "departureTime" as the time when the last signal is received from the mobile wireless communications device 106 at the base station 102. Algorithm 2 below details the algorithm corresponding to the mobile wireless communications device 106 "disassociating" with the base station 102:

Algorithm 2: When a mobile wireless communications device i disassociates with a base station j, perform the following:

R = findRecord(i, j) where departureTime = infinity

R.nodeID = R.nodeID (unchanged)

R.customerID = R.customerID (unchanged)

R.entryTime = R.entryTime (unchanged)

R.departureTime = current time

The data record R comprising the "nodeID", the "customerID", the "entryTime" and the "departureTime" is stored in a memory, either at the base station 102 or at an external location to which the base station 102 has communicated the information.
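As an illustration only, the record handling of Algorithms 1 and 2 might be sketched as follows. The class names `MobilityRecord` and `BaseStationLog`, the in-memory list and the use of plain numeric timestamps are assumptions made for this sketch and are not part of the described system.

```python
import math
from dataclasses import dataclass


@dataclass
class MobilityRecord:
    """One mobility data record R (field names follow the description)."""
    customer_id: str
    node_id: int
    entry_time: float
    departure_time: float = math.inf  # infinity marks a still-open association


class BaseStationLog:
    """Hypothetical in-memory record store for a single base station."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.records = []

    def associate(self, customer_id, now):
        # Algorithm 1: create a record whose departure time is infinity.
        record = MobilityRecord(customer_id, self.node_id, now)
        self.records.append(record)
        return record

    def disassociate(self, customer_id, now):
        # Algorithm 2: find the open record for this device and close it.
        for record in self.records:
            if record.customer_id == customer_id and math.isinf(record.departure_time):
                record.departure_time = now
                return record
        raise LookupError(f"no open record for {customer_id!r}")


log = BaseStationLog(node_id=5)
log.associate("device-A", now=100.0)
closed = log.disassociate("device-A", now=160.0)
sojourn = closed.departure_time - closed.entry_time  # time spent in the cell
```

Only the departure time changes on disassociation, which mirrors the "(unchanged)" fields of Algorithm 2.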

Hence it will be seen that data relating to the entry and departure time for a given customer or equivalently device is collected automatically and efficiently allowing capture of behavioural data relating to sojourn time at the cell, as well as potentially linking movement with other cells.

With reference to figure 3, according to another embodiment the system comprises first and second base stations (302a, 302b), including first and second detection ranges (304a, 304b) and first and second detection zones (308a, 308b). The system further comprises a mobile wireless communications device 306, a non-detection zone 310 and an overlap zone 300.

As discussed above, it is further desirable to capture data regarding transitions between cells. Figure 4 shows a flow diagram of an example system comprising first and second base stations (302a, 302b) with respective first and second detection ranges 304a and 304b. As with the system in figure 2, at step 400 the mobile wireless communications device 306 may begin in the non-detection zone 310, located outside of the first and second detection ranges 304a and 304b of first and second base stations 302a and 302b. Preferably the cells are distributed to cover the entire service area to allow establishment of an uninterrupted mobility pattern. Where there is clear delineation between detection zones, movement may be tracked by monitoring appropriate association and disassociation as set out below. However, the approach described can also provide reliable data where detection zones overlap.

When the mobile wireless communications device 306 enters one of the detection ranges, for example the first detection range 304a of the first base station 302a at step 402, it is located in the first detection zone 308a. At this point, signals emitted by the mobile wireless communications device 306 are received by the first base station 302a. As with step 204 of figure 2, at step 404 the first base station 302a "associates" with the mobile wireless communications device 306 and records the "nodeID" of the first base station 302a, the "customerID" of the mobile wireless communications device 306 and the "entryTime". As the mobile wireless communications device 306 moves around within the first detection zone 308a, it may enter the second detection range 304b of the second base station 302b, and therefore may simultaneously be in the first detection zone 308a and the second detection zone 308b, as shown at step 406 of figure 4. This region is referred to in figure 3 as the overlap zone 300. While in the overlap zone, the mobile wireless communications device 306 associates with the base station which is receiving the strongest signal from the mobile wireless communications device 306.

In the case where the strongest received signal is received at the first base station 302a, the mobile wireless communications device 306 continues to be associated with the first base station 302a, as shown at step 408. At step 410, in the case where a simple transition to a new detection zone takes place or where the strongest received signal is at another base station, for example the second base station 302b, the first base station 302a "disassociates" with the mobile wireless communications device 306 by recording a "departureTime" associated with the first base station 302a. (Note that, similar to existing cell handoffs in cellular wireless networks, more elaborate algorithms such as the use of hysteresis can be used to determine the transition of a mobile device from the detection zone of one base station to a neighbouring one.) Simultaneously, the second base station 302b "associates" with the mobile wireless communications device 306 and records: the "nodeID" of the second base station 302b, the "customerID" of the mobile wireless communications device 306 and the "entryTime". Similarly, if a new transition or overlap zone is encountered, the process of "disassociation" of the current base station and subsequent "association" of a new base station with the mobile wireless communications device 306 repeats if the new base station receives a stronger signal from the mobile wireless communications device 306 than the current base station.
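The strongest-signal rule, optionally with hysteresis, can be illustrated with a minimal sketch. The function name `select_base_station`, the dictionary of received signal strengths in dBm and the 3 dB default margin are assumptions for illustration; the description does not specify a particular handoff algorithm.

```python
def select_base_station(current, signal_strengths, hysteresis_db=3.0):
    """Pick the base station a device should be associated with.

    current: ID of the currently associated base station, or None.
    signal_strengths: dict mapping base station ID to the signal strength
        (in dBm) that station receives from the device; stations that do
        not detect the device are absent.
    hysteresis_db: assumed margin by which another station must beat the
        current one before a handoff is triggered (0 gives the plain
        strongest-signal rule).
    """
    if not signal_strengths:
        return None  # device is in the non-detection zone
    best = max(signal_strengths, key=signal_strengths.get)
    if current is None or current not in signal_strengths:
        return best  # first association, or current station lost the device
    if signal_strengths[best] > signal_strengths[current] + hysteresis_db:
        return best  # disassociate from current, associate with best
    return current  # stay associated with the current station


stay = select_base_station("302a", {"302a": -60.0, "302b": -58.0})
switch = select_base_station("302a", {"302a": -60.0, "302b": -50.0})
```

In the first call the second station is only 2 dB stronger, so the device stays with 302a; in the second it is 10 dB stronger, so the association moves to 302b. The margin suppresses rapid back-and-forth records while a device lingers in the overlap zone.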

With each disassociation of a base station with a mobile wireless communications device, the "nodeID" for that base station, the "customerID", the "entryTime" and the "departureTime" are stored in a memory, either at the base station or at an external location to which the base station has communicated the information. Each base station may have its own memory, or they may all link to a central memory.

It can be seen that, by using this method, the data record R is populated with data as a mobile wireless communications device moves around a network of base stations. In the case where the base stations are femtocells or WiFi access points, the location of each mobile wireless communications device is known to an accuracy of a few to a couple of tens of metres. The fact that a mobile wireless communications device can be "associated" with a particular femtocell or access point is enough to determine the location of the mobile wireless communications device. Therefore, this method allows the location, time spent at said location, and information on transitions between cells of a mobile wireless communications device to be known as it moves through a network of femtocells and/or access points.

Analysing the Collected Mobility Data

To complement the spatial mobility as reflected by the graph model, the method captures the temporal characteristics of mobility by providing the probability distributions of sojourn times for each node of the graph where the sojourn time is the time duration in which an arbitrary customer stays associated with a given node.

Furthermore, to capture the interdependence between the amount of time a customer stays in one node and that in a second node, the technique provides the correlation coefficient between the sojourn times of an arbitrary customer in any given pair of nodes in the graph. In addition, to capture the frequency of visits to each node, the approach provides the relative frequency (between 0 and 1) of visits made by an arbitrary customer to a given node of the graph compared to the other nodes of the graph.
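To make the two temporal quantities concrete, the following sketch computes per-customer sojourn times and a Pearson correlation coefficient for a pair of nodes. It is a batch computation over stored records, chosen purely for illustration; the function names, the tuple layout of the records, the summing of repeated visits and the convention of returning 0.0 when the coefficient is undefined are all assumptions of this sketch.

```python
import math


def total_sojourn(records):
    """Sum the time each customer spends associated with each node.

    records: iterable of (customer_id, node_id, entry_time, departure_time).
    Returns {customer_id: {node_id: total_time}}.
    """
    totals = {}
    for customer_id, node_id, entry, departure in records:
        per_node = totals.setdefault(customer_id, {})
        per_node[node_id] = per_node.get(node_id, 0.0) + (departure - entry)
    return totals


def sojourn_correlation(totals, node_a, node_b):
    """Pearson correlation of sojourn times over customers who visited both nodes."""
    pairs = [(t[node_a], t[node_b]) for t in totals.values()
             if node_a in t and node_b in t]
    n = len(pairs)
    if n < 2:
        return 0.0  # assumed convention: too few samples to correlate
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # assumed convention: coefficient undefined for constant data
    return cov / math.sqrt(var_a * var_b)


records = [
    ("c1", 1, 0.0, 10.0), ("c1", 2, 10.0, 15.0),
    ("c2", 1, 0.0, 20.0), ("c2", 2, 20.0, 30.0),
    ("c3", 1, 0.0, 30.0), ("c3", 2, 30.0, 45.0),
]
totals = total_sojourn(records)
rho = sojourn_correlation(totals, 1, 2)
```

In this toy data each customer spends exactly half as long in node 2 as in node 1, so the coefficient is 1.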

The graph model is established for a given deployment of small cells (which are also referred to as small base stations such as femtocells and/or WiFi access points) in a service area. For the purpose of mobility profiling, the service area is also referred to as the tracking area. New nodes are added to the graph as additional cells are deployed. As discussed below, the proposed method collects the mobility data as customers move from cell to cell. The data is processed and aggregated into the edge weights (i.e., branching probabilities) in the graph model, the sojourn-time distributions and the correlation coefficients matrix for the sojourn times, as well as the relative frequency distribution of visits to each node. As a result, the model parameters including edge weights, sojourn-time distributions and the correlation coefficients matrix elements, as well as the relative frequency distribution, which represent the aggregated mobility profile, can be obtained and revised based on actual customer mobility data. However, after updating these model parameters by processing a piece of mobility data, the latter can be deleted permanently from the system as a means to ensure privacy and anonymity of customers. Furthermore, disposing of the collected mobility data has another important advantage of reducing costs and risks related to excessive data hoarding.

Turning to one detailed implementation, Figure 5 provides a summary flow diagram of one embodiment for analysing the data. At step 500, a first mobile wireless communications device enters a tracking area containing at least one cell capable of "associating" with the first mobile wireless communications device when the first mobile wireless communications device enters the detection range of a cell.

At step 502, each time the first mobile wireless communications device "associates" and subsequently "disassociates" with a cell, the mobility data for that cell ("nodeID", "entryTime" and "departureTime") and the "customerID" of the first mobile wireless communications device is sent to a database which may be located on a processing unit. In one simple implementation, a transition between two cells for a given device having a customer ID can be identified by sequentially tracking and correlating the departure time from a first cell with the temporally closest entry time to another cell, even if there is spatial/temporal discontinuity between cells. As an exemplary embodiment, if two mobility data records associated with the same customer ID reveal that the time gap between the departure time from one cell A and the entry time into a neighbouring cell B is less than a specific delay threshold (e.g., several seconds), then the transition from cell A to B for the said customer is considered to have taken place. Naturally, the delay threshold for a given pair of neighbouring cells can be determined and calibrated based on actual measurements from the deployed network.
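The delay-threshold rule can be sketched as follows. The function name `detect_transitions`, the tuple layout of the records and the single global threshold of 5 seconds are assumptions for illustration; the description allows a separately calibrated threshold per pair of neighbouring cells, and this sketch does not check that the two cells are actually neighbours.

```python
def detect_transitions(records, delay_threshold=5.0):
    """Infer cell-to-cell transitions from closed mobility data records.

    records: iterable of (customer_id, node_id, entry_time, departure_time).
    delay_threshold: assumed maximum gap (seconds) between leaving one cell
        and entering the next for the move to count as a transition.
    Returns a list of (customer_id, from_node, to_node) tuples.
    """
    by_customer = {}
    for customer_id, node_id, entry, departure in records:
        by_customer.setdefault(customer_id, []).append((entry, departure, node_id))

    transitions = []
    for customer_id, visits in by_customer.items():
        visits.sort()  # chronological order by entry time
        # Pair each visit with the temporally closest later visit.
        for (_, dep_a, node_a), (entry_b, _, node_b) in zip(visits, visits[1:]):
            if node_a != node_b and entry_b - dep_a < delay_threshold:
                transitions.append((customer_id, node_a, node_b))
    return transitions


records = [
    ("c1", 1, 0.0, 10.0),
    ("c1", 2, 11.0, 30.0),    # 1 s after leaving cell 1: transition 1 -> 2
    ("c1", 3, 100.0, 120.0),  # 70 s after leaving cell 2: not a transition
    ("c2", 2, 5.0, 9.0),
]
found = detect_transitions(records)
```

With these records only the move of customer c1 from cell 1 to cell 2 is reported.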

In order to aggregate behaviour for multiple customers, at step 504 the processing unit containing the database updates a field in the database corresponding to the "customerID" of the first mobile wireless communications device with the mobility data.

At step 506, a second mobile wireless communications device enters the same tracking area containing at least one cell capable of "associating" with the second mobile wireless communications device when the second mobile wireless communications device enters the detection range of a cell.

In a similar manner to step 502, at step 508 each time the second mobile wireless communications device "associates" and subsequently "disassociates" with a cell, the mobility data for that cell ("nodeID", "entryTime" and "departureTime") and the "customerID" of the second mobile wireless communications device is sent to a database which may be located on a processing unit. In a similar manner to step 504, at step 510 the processing unit containing the database updates a field in the database corresponding to the "customerID" of the second mobile wireless communications device with the mobility data.

At step 512, and as discussed in more detail below, an analyser analyses the database fields corresponding to the "customerIDs" of the first and second mobile wireless communications devices. The analysis includes determining probabilities, correlation coefficients or any other parameter. For example, one parameter is the "spatial mobility", i.e. the movement from one cell to another. This can be represented by a directed graph in which each node corresponds to one cell and an edge points from node A to another node B if a mobile wireless communications device can possibly move from A to B directly (i.e., cells A and B are expected to be neighbours). A mobile wireless communications device is considered to be located within the detection range of a cell (node) if the mobile wireless communications device is associated with that cell. In one embodiment, the data is updated for each transition to increase the probability value for the particular transition and decrease that for all other transitions, as set out below. This provides a simple and automated manner of continually or dynamically updating the data as more data is added.

Figure 7a shows an example tracking area comprising a plurality of cells 1 to 12. Figure 7b shows an example spatial-mobility graph corresponding to the tracking area of figure 7a. Referring now to figures 7a and 7b, the spatial mobility of a mobile wireless communications device in the tracking area of figure 7a is modelled by a graph G(V ∪ W, E), where V ∪ W is the set of nodes. Each node 1 to 12 in V represents a cell deployed in the tracking area of figure 7a. The set W has two fictitious nodes a and b, which are used to represent the "beginning" and the "end" of the journey of customers; node a is called the source and node b is called the sink. In figure 7b, node a is represented by label 0 and node b is represented by label 13. The journey of a given customer is deemed to have started when its presence is detected by any of the cells in the tracking area. Also, the journey of a given customer is deemed to have ended when no cell in the tracking area can detect the presence of that customer for a certain amount of time. Each edge e(i, j) in E shows the possibility of a customer moving directly from node i to node j. Each edge e(i, j) has a weight between 0 and 1 that corresponds to the probability of an arbitrary mobile wireless communications device in node i moving to node j as its next movement. When the node i is the fictitious node a, the weight of the edge (i, j) represents the probability that the journey of the customer starts at node j. Also, when the node j is the fictitious node b, the weight of the edge (i, j) represents the probability that the journey of the customer ends right after visiting node i. The graph G is directional because the probability of moving from node i to node j is not necessarily equal to that of moving from node j to node i.

Referring now to figure 7c, the edge weights in the graph G can be represented by a square matrix M = [m_ij], i, j = 0, ..., K+1, where K = |V|. Each element m_ij is the probability for an arbitrary mobile wireless communications device in node i to move directly to node j. Note that node 0 is the fictitious cell (node) representing the source and node K+1 is the fictitious cell (node) representing the sink. The entry of a customer into the sink node is regarded as the end of the customer's journey in the tracking area (service area). The matrix M is stochastic, i.e., each element m_ij lies between 0 and 1 and each row sum is equal to 1. Using the mobility data collected above, all the elements of the matrix M are updated iteratively. Specifically, this method examines the database field associated with a specific "customerID" of a mobile wireless communications device. When it identifies a movement of the mobile wireless communications device from node i to node j, it updates the elements m_ik for k = 0, ..., K+1 of the matrix M according to the following equation:

$$ m_{ik}^{(n+1)} = \alpha \, m_{ik}^{(n)} + (1 - \alpha) \, \mathbf{1}_j(k) \tag{1} $$

where 1_j(k) is the indicator function equal to 1 if k = j and 0 elsewhere. That is, 1_j(k) = 1 for k = j indicates that the device moves from cell i to cell k, including the possibility of cell k being the source node 0 or the sink node K+1.
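As a brief check, and assuming equation (1) has the exponential-smoothing form m_ik^(n+1) = α m_ik^(n) + (1 - α) 1_j(k) with a smoothing factor α between 0 and 1, the update keeps row i of M stochastic, because the old row sums to 1 and the indicator is 1 for exactly one value of k:

$$ \sum_{k=0}^{K+1} m_{ik}^{(n+1)} = \alpha \sum_{k=0}^{K+1} m_{ik}^{(n)} + (1 - \alpha) \sum_{k=0}^{K+1} \mathbf{1}_j(k) = \alpha \cdot 1 + (1 - \alpha) \cdot 1 = 1 $$

For instance, with a hypothetical row holding two non-zero entries (0.5, 0.5) and α = 0.9, an observed move to the first of the two neighbours gives (0.9 × 0.5 + 0.1, 0.9 × 0.5) = (0.55, 0.45), which still sums to 1.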

The matrix M is initialized in a way that ensures that all transitions between

neighbouring nodes (adjacent cells) are equiprobable, i.e., for = 0, ... ,K, if node has η neighbours, then m_l} = l/η if node j is neighbour to node and m_l} = 0 is node j is not a neighbour to node . For = K +1 and j=0, K, set m_l} = 0 and m„ = 1. Note that, by convention, the source node (node 0) is initially neighbour to all nodes representing cells deployed at the entrance of the tracking area. Note that although figure 7b shows only edges from the source node to nodes deployed at the entrance of the tracking area, edges are also possible from the source node to any other node in the deployment area. This is to capture the fact that a given customer may have not been detected by nodes deployed at the entrance of tracking area. This may be caused by the fact that the customer may have had his/her wireless communications mobile device switched off at the time he/she entered the tracking area. Similarly, although figure 7b shows only ^pHges from the nodes deployed at the exit of the tracking area to the sink node, edges are also possible from any other node in tracking area to the sink node. This is to capture the fact that a customer may not have been detected by nodes deployed at the exit of tracking area. This may be caused by the fact that the customer may have had his/her wireless communications mobile device switched off before he/she leaves the tracking area. According to the example provided in figure 7b, the entrance is also used to exit the tracking area; therefore, nodes 5 and 8 represent the cells deployed at the entrance and the exit of the tracking area at the same time. The updating of matrix M is performed according to equation (1) which shows how to calculate M at iteration (n+1) in function of M at interation (n) iteratively based on the movements of the first (n+1) mobile wireless communications devices leaving node as a function of that using the movements of the first (n) mobile wireless communications devices. 
Hence the probability value for the detected transition is incremented, and all others are decremented. Essentially, equation (1) uses exponential smoothing to iteratively update the values of the branching probabilities. The value of α lies between 0 and 1, and is generally taken to be larger than 0.9. The most appropriate value of α for a given tracking area can be determined experimentally. By convention, all m_ii values are set to 0 to represent the mobility from a node to itself.
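By way of illustration only, the initialization and smoothing update described above can be sketched in Python. Equation (1) is not reproduced in this text, so the update below is an assumed form: the row of the departed node is scaled by α and (1 - α) is added to the observed transition, which matches the description "incremented, and all others decremented". The function names, the `neighbours` dictionary and the default `alpha` are hypothetical.

```python
def init_transition_matrix(neighbours, num_cells):
    """Build the (K+2) x (K+2) matrix M; node 0 is the source, node K+1 the sink.

    neighbours maps a node index to the list of its neighbouring node indices
    (hypothetical input format).
    """
    size = num_cells + 2
    sink = num_cells + 1
    m = [[0.0] * size for _ in range(size)]
    for i in range(num_cells + 1):
        adjacent = neighbours.get(i, [])
        for j in adjacent:
            # Equiprobable transitions to each neighbour of node i.
            m[i][j] = 1.0 / len(adjacent)
    # The sink only transitions to itself.
    m[sink][sink] = 1.0
    return m


def update_transition_row(m, i, j, alpha=0.95):
    """Assumed exponential-smoothing update after observing a move from i to j."""
    for k in range(len(m[i])):
        m[i][k] *= alpha
    m[i][j] += 1.0 - alpha
```

Because every entry of the row is scaled by α before (1 - α) is added to one entry, the row continues to sum to 1 after each update, including when the observed transition is to a node that was not initially a neighbour.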

Other example parameters that can be determined by the analyser include the probability distribution of the sojourn time of a specific mobile wireless communications device in any given node and the correlations between the sojourn times in any pair of nodes. These two parameters characterise the temporal aspects of mobile wireless communications device mobility. They provide information about the amount of time mobile wireless communications devices spend in each node and reveal the degree of correlation between the time durations a mobile wireless communications device spends in any two nodes. Only nodes representing cells that are deployed in the tracking area are considered; the fictitious cell is excluded. Therefore, in the rest of the description, the term node means any node in the set V.

The distribution of the relative frequency of visits to each node is defined as the relative percentage of the number of associations/disassociations made by a wireless communication device with a given node compared to those made with the other nodes. Let F = (f_1, ..., f_K) be a vector that describes the relative frequency of visits to all nodes. Each element f_j, j = 1, ..., K, describes the relative frequency of visits to node j. Therefore 0 <= f_j <= 1 for j = 1, ..., K and f_1 + ... + f_K = 1. When the method receives a mobility data record associated with node j, it invokes the following Algorithm 3 to update F = (f_1, ..., f_K). The algorithm is run at the analyser each time the database is updated with a new mobility data record. The updating of F is performed iteratively and F⁽ⁿ⁾ is the value of F after processing n mobility data records. Initially, this method sets all values of F to 1/K to start with an equiprobable distribution, i.e. f_j⁽⁰⁾ = 1/K, for all j.

Algorithm 3

Data: R: mobility data record

j = R.nodeID

for k = 1, ..., K do

if k = j then

f_k⁽ⁿ⁺¹⁾ = β f_k⁽ⁿ⁾ + 1 - β

else

f_k⁽ⁿ⁺¹⁾ = β f_k⁽ⁿ⁾

end

end

where β is properly chosen between 0 and 1.
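By way of illustration only, Algorithm 3 can be sketched in Python as follows. The function name, the list representation of F (index 0 holding node 1) and the default value of `beta` are assumptions made for this sketch.

```python
def update_visit_frequencies(freqs, node_id, beta=0.95):
    """Return F after one mobility data record for node_id (1-based).

    Every entry is scaled by beta; the visited node additionally gains
    (1 - beta), so the entries continue to sum to 1.
    """
    updated = []
    for k, f in enumerate(freqs, start=1):
        if k == node_id:
            updated.append(beta * f + (1.0 - beta))
        else:
            updated.append(beta * f)
    return updated
```

For example, with K = 4, an equiprobable start of 0.25 per node and β = 0.9, a record for node 2 raises f_2 to 0.325 and lowers the other three entries to 0.225.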

The sojourn time is defined as the amount of time a mobile wireless communications device spends in any given node. The probability distribution characterizing the sojourn time is chosen as a way to limit the number of model parameters in the representation of mobile wireless communications device mobility and to enable the deletion of mobility data after it is processed and aggregated into the model parameters. The deletion of raw data from the system after processing can help preserve the privacy and anonymity of customers using mobile wireless communications devices and avoid the memory costs and risks associated with storage of a huge volume of data. For example, a technique similar to the one presented by K. K. Leung in "Power Control by Kalman Filter With Error Margin for Wireless IP Networks", in Proceedings of IEEE WCNC, Chicago, IL, September 2000 can be used to obtain the sojourn-time distribution for each node as follows.

Let τ_max be the maximum amount of time a mobile wireless communications device may spend at a particular node. For ease of computation, we divide τ_max into L time intervals of equal duration. Let the time intervals be indexed by l = 1, 2, ..., L so that for a given interval l, the sojourn time ranges from τ_max(l - 1)/L to τ_max l/L.

Let P_j(l) be the probability that the sojourn time of an arbitrary mobile wireless communications device in node j is less than or equal to τ_max l/L, for each node j from 1 to K and every time interval l from 1 to L. For a given node j, the values of P_j(l) for all l represent an approximate cumulative distribution function (CDF) for the sojourn time of a specific mobile wireless communications device in node j. When the method receives a mobility data record associated with node j, it invokes the following Algorithm 4 to update P_j(l) for all time intervals l. The algorithm is run at the analyser each time the database is updated with a new mobility data record. The updating of P_j is performed iteratively and P_j⁽ⁿ⁾(l) is the value of P_j(l) after processing n mobility data records for node j. Initially, this method sets all values of P_j(l) to 1, i.e. P_j⁽⁰⁾(l) = 1, for all l and j.

Algorithm 4

Data: R: mobility data record

j = R.nodeID

τ = R.departureTime - R.entryTime

l = ⌈τ L / τ_max⌉, where ⌈x⌉ is the smallest integer that is larger than or equal to x

for k = 1, ..., l - 1 do

P_j⁽ⁿ⁺¹⁾(k) = γ P_j⁽ⁿ⁾(k), where γ is properly chosen between 0 and 1

end

for k = l, ..., L do

P_j⁽ⁿ⁺¹⁾(k) = γ P_j⁽ⁿ⁾(k) + 1 - γ

end
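By way of illustration only, the CDF update of Algorithm 4 can be sketched in Python. The sketch assumes that the interval index is l = ⌈τL/τ_max⌉ and that entries from l upwards are smoothed towards 1 while entries below l are smoothed towards 0, consistent with P_j(l) being a cumulative probability. Clamping the index to the range 1..L, the function name and the default `gamma` are assumptions of this sketch.

```python
import math


def update_sojourn_cdf(cdf, sojourn, tau_max, gamma=0.95):
    """Return the updated approximate CDF for one node.

    cdf[k - 1] estimates the probability that the sojourn time is at most
    tau_max * k / L, where L = len(cdf).
    """
    num_intervals = len(cdf)
    # Index of the interval containing the observed sojourn time (1-based).
    # Clamping handles a zero sojourn or one exceeding tau_max (an assumption).
    l = math.ceil(sojourn * num_intervals / tau_max)
    l = min(max(l, 1), num_intervals)
    updated = []
    for k, p in enumerate(cdf, start=1):
        if k < l:
            # The observed sojourn exceeds this interval's upper bound.
            updated.append(gamma * p)
        else:
            # The observed sojourn is within this interval's upper bound.
            updated.append(gamma * p + (1.0 - gamma))
    return updated
```

With L = 4, τ_max = 40 and a sojourn time of 25, the index is l = 3, so the first two entries decay by γ and the last two are smoothed towards 1.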

To capture the possible dependencies between the sojourn times of mobile wireless communications devices in two different nodes, the correlations between the sojourn times in these nodes are calculated. This calculation requires a consideration of the entire journey of each mobile wireless communications device from the entry point of the tracking area to the exit point of the tracking area. The sojourn times of mobile wireless communications device i in the sequence of nodes for the same journey are obtained from R(i), which denotes the set of mobility data records of mobile wireless communications device i during its journey. Specifically, for the mobile wireless communications device i, let R = {R_1, R_2, ..., R_N}, where R_v, v = 1, ..., N, represents the mobility data record created and completed during the v-th visit, the set being complete once the mobile wireless communications device disassociates from the last node and leaves the tracking (service) area. The mobile wireless communications device is considered to have left the tracking area if no mobility data record is created for it for a certain amount of time. This time also takes into account mobile wireless communications devices which are no longer detectable due to other causes, such as being switched off by customers or battery depletion. The index i has been dropped from the notation for brevity.
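By way of illustration only, the inactivity rule for deciding that a journey has ended can be sketched in Python. The tuple layout of a record, the function name and the `timeout` parameter are assumptions of this sketch; the description does not specify a value for the inactivity period.

```python
def split_journeys(records, timeout):
    """Group one device's records into journeys.

    records is a list of (node_id, entry_time, departure_time) tuples sorted
    by entry time. A new journey starts when the gap between the previous
    departure and the next entry exceeds timeout.
    """
    journeys = []
    current = []
    for record in records:
        if current and record[1] - current[-1][2] > timeout:
            journeys.append(current)
            current = []
        current.append(record)
    if current:
        journeys.append(current)
    return journeys
```

Each returned journey corresponds to one set R of mobility data records as used in the correlation calculation.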

The sojourn times of the mobile wireless communications device per visit to a node can be directly obtained from the set of records R by subtracting the value of the entryTime field from the value of the departureTime field of each mobility data record. We use S to represent these sojourn times. We have S = {S_1, ..., S_v, ..., S_N}, where S_v = R_v.departureTime - R_v.entryTime for v = 1, ..., N.
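By way of illustration only, a mobility data record and the derivation of S can be sketched in Python. The class and attribute names are hypothetical renderings of the customerID, nodeID, entryTime and departureTime fields; times are assumed to be numeric values in a common unit.

```python
from dataclasses import dataclass


@dataclass
class MobilityRecord:
    customer_id: str
    node_id: int
    entry_time: float
    departure_time: float


def sojourn_times(records):
    """Return S = [S_1, ..., S_N] for the journey's records R_1, ..., R_N."""
    return [r.departure_time - r.entry_time for r in records]
```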

The calculation of the correlation coefficients involves the calculation of the covariance σ_jk between the sojourn times of an arbitrary mobile wireless communications device for any pair of nodes j, k = 1, ..., K. The calculation of the covariance requires the calculation of the mean and the standard deviation of the sojourn times at each node, which can be calculated by examining every mobility data record of the set R. For every mobility data record R_v, v = 1, ..., N, the node identifier is obtained by accessing the nodeID field. Let j be the node visited during the visit in which R_v was created and completed, i.e., j = R_v.nodeID. S_v is used to update the value of the mean sojourn time μ_j at node j according to the following equation:

μ_j⁽ⁿ⁺¹⁾ = δ μ_j⁽ⁿ⁾ + (1 - δ) S_v (2)

where δ is a properly chosen parameter between 0 and 1. Equation (2) shows that the updating of μ_j is performed iteratively and μ_j⁽ⁿ⁾ is the value of μ_j after processing n mobility data records in which the nodeID field is equal to j. Initially, all the mean sojourn time values are equal to 0, i.e., μ_j⁽⁰⁾ = 0 for j = 1, ..., K.

The standard deviation σ_j of the sojourn time at node j is calculated according to the following equation:

σ_j⁽ⁿ⁺¹⁾ = ξ σ_j⁽ⁿ⁾ + (1 - ξ) |S_v - μ_j⁽ⁿ⁺¹⁾| (3)

where ξ is a properly chosen parameter between 0 and 1 and |x| denotes the absolute value of x. Equation (3) shows that the updating of σ_j is performed iteratively and σ_j⁽ⁿ⁾ is the value of σ_j after processing n mobility data records in which the nodeID field is equal to j. Initially, all the standard deviations of sojourn time values are equal to 0, i.e., σ_j⁽⁰⁾ = 0 for j = 1, ..., K.
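By way of illustration only, the per-node updates of equations (2) and (3) can be sketched in Python. The sketch assumes that equation (3) smooths the absolute deviation of S_v from the freshly updated mean; the function name and default parameter values are hypothetical.

```python
def update_mean_and_deviation(mu, sigma, sojourn, delta=0.9, xi=0.9):
    """Return the updated (mu_j, sigma_j) after one sojourn time at node j."""
    # Equation (2): exponentially smoothed mean sojourn time.
    new_mu = delta * mu + (1.0 - delta) * sojourn
    # Equation (3), assumed form: smoothed absolute deviation from the new mean.
    new_sigma = xi * sigma + (1.0 - xi) * abs(sojourn - new_mu)
    return new_mu, new_sigma
```

Only two numbers per node are retained, so the sojourn time itself need not be stored once the update has been applied.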

The calculation of the correlation coefficient requires the consideration of all pairs of mobility data records (R_v, R_w) for v, w = 1, ..., N and v < w. Formally, let j = R_v.nodeID and k = R_w.nodeID for each pair of mobility data records (R_v, R_w). The sojourn times S_v and S_w extracted from R_v and R_w respectively are used to update the covariance σ_jk between the sojourn times at nodes j and k, according to the following equation:

σ_jk⁽ⁿ⁺¹⁾ = ω (σ_jk⁽ⁿ⁾ + μ_j⁽ⁿ⁾ μ_k⁽ⁿ⁾) + (1 - ω) S_v S_w - μ_j⁽ⁿ⁺¹⁾ μ_k⁽ⁿ⁺¹⁾ (4)

Equation (4) shows that the updating of σ_jk is performed iteratively and σ_jk⁽ⁿ⁾ is the value of σ_jk after processing n pairs of mobility data records (R_v, R_w) in which R_v.nodeID = j and R_w.nodeID = k. Initially, all the covariance values are equal to 0, i.e., σ_jk⁽⁰⁾ = 0 for j, k = 1, ..., K. Finally, the value of the correlation coefficient ρ_jk⁽ⁿ⁺¹⁾ is obtained according to equation (5):
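By way of illustration only, the covariance and correlation steps can be sketched in Python under two stated assumptions. First, the covariance update is taken to smooth the cross-moment (covariance plus product of means) and then subtract the product of the updated means. Second, because equation (5) is not reproduced in this text, the correlation coefficient is taken to be the usual ratio of the covariance to the product of the two deviations. The function names, the smoothing parameter `omega` and the zero-deviation guard are hypothetical.

```python
def update_covariance(cov, mu_j_old, mu_k_old, mu_j_new, mu_k_new,
                      s_v, s_w, omega=0.9):
    """Assumed covariance update for one pair of sojourn times (s_v, s_w)."""
    # Smooth the estimate of E[S_j * S_k], then remove the product of means.
    cross_moment = omega * (cov + mu_j_old * mu_k_old) + (1.0 - omega) * s_v * s_w
    return cross_moment - mu_j_new * mu_k_new


def correlation(cov, sigma_j, sigma_k):
    """Assumed form of the correlation coefficient: cov / (sigma_j * sigma_k)."""
    if sigma_j == 0 or sigma_k == 0:
        # No deviation observed yet; report no correlation (an assumption).
        return 0.0
    return cov / (sigma_j * sigma_k)
```

If the deviations are smoothed absolute deviations rather than true standard deviations, the resulting ratio is only an approximate correlation and is not guaranteed to lie between -1 and 1.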

After the processing of all the pairs of mobility data records (R_v, R_w), v, w = 1, ..., N and v < w, the set R of mobility data records can be disposed of permanently. Hence mobility data is obtained and aggregated, allowing the identification of trends such as the correlation between residence in central locations and their respective timing, allowing customer behaviour patterns to be developed.

Since the set R of mobility data records can be disposed of permanently after all the pairs of mobility data have been processed, data linking individual mobile wireless communications devices to locations and times is not maintained. As a result, only the mobility of groups of mobile wireless communications devices is stored, therefore the privacy of individual customers of mobile wireless communications devices is maintained. Furthermore, the disposal of mobility data records R after processing reduces the cost of data storage and minimises the security threats related to the stored data.

Yet further, data can be aggregated by customer type, for example customers can be categorized based on data associated with their identification. Such data may be extracted, for example, from data available to a network provider and can allow categorization of a customer at a generic level, again without requiring privacy related data. Hence, in addition to an overall aggregated data set, customer-categorized aggregated sub-sets can also be developed and stored.
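By way of illustration only, customer-categorized aggregation can be sketched in Python by keeping one aggregated parameter set per category. The sketch uses the relative visit frequencies as the example parameter; the class name, the category labels and the default `beta` are hypothetical, and only the category label, not the customer identity, is used as a key.

```python
from collections import defaultdict


class CategorisedFrequencies:
    """Relative visit frequencies kept separately for each customer category."""

    def __init__(self, num_nodes, beta=0.95):
        self.num_nodes = num_nodes
        self.beta = beta
        # Each new category starts from the equiprobable distribution 1/K.
        self.profiles = defaultdict(lambda: [1.0 / num_nodes] * num_nodes)

    def add_record(self, category, node_id):
        """Fold one record for node_id (1-based) into the category's profile."""
        freqs = self.profiles[category]
        for k in range(self.num_nodes):
            freqs[k] *= self.beta
        freqs[node_id - 1] += 1.0 - self.beta
```

An overall profile can be maintained alongside the categorized ones by also calling `add_record` with a fixed label such as "all" for every record.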

Figure 6 shows an exemplary network for collecting and analysing mobility data records. Femtocells 600 are located in a tracking area. The cells may have a coverage that includes the entire tracking area, or may only include a portion of it. As an example, the tracking area could be a supermarket, a shopping mall, or any other area where it might be desirable to track customer movement.

The cells 600 communicate with mobile wireless communications devices 610, and "associate" and "disassociate" with them, as described above in relation to figures 1-5. The mobile wireless communications devices may be mobile phones held by customers, for example. As such, the described method tracks the movement of customers in a tracking area using mobile phones that they carry about their person.

Mobility data collected by the cells is transmitted, via the internet 604, to a processing unit which may be a gateway 606. The gateway 606 collects presence information from the presence server, and creates and completes customer mobility data records. It uses these records to calculate the branching probabilities, the distributions of the sojourn times, and the correlation coefficients between the sojourn times at different cells. As mentioned above, the processing unit updates a database field with the mobility data for each mobile wireless communications device corresponding to the "customerID" and the "nodeID". Furthermore, the gateway may also include the analyser which calculates the parameters to characterise mobility, as mentioned above.

The gateway may interact with the network of a service provider. In doing so, the gateway may request further information from the service provider related to a "customerID" of an individual mobile wireless communications device. For example, the gateway may request the age of a customer using the specific mobile wireless communications device. In this way, the gateway is able to calculate parameters for an individual age range of customers, for example. Other than the age range of customers, other data related to the customer could be requested from the service provider to provide more detailed mobility parameters.

In one embodiment, the disclosed approach can operate on a real-time or non-real-time basis. When the approach does not require real-time customer mobility data, the gateway 606 may ask, for example, the presence server 602 or other appropriate entity, to accumulate the data locally and transmit the accumulated data from time to time to reduce the overhead and frequency of interaction between the gateway and the presence server, thereby improving the overall scalability of the approach.

In the embodiment discussed, mobility data is stored in a database and each customer can be identified by a unique identity (ID) such as an IMSI, TMSI, IMEI or MAC address. This ID could make it possible to identify who the customer is and obtain further personal information, but the use of femtocells makes it possible to overcome these limitations and provides additional options for new services. Specifically, the infrastructure, i.e., the deployed femtocells, makes it possible to communicate with customers. Therefore, a text message, for example, can be sent to customers to inform them about tracking and its options (anonymous or not). Specifically, customers can be offered the options of agreeing or disagreeing with the mobility tracking and aggregated profiling. They can then make their choices and communicate their decisions to the system. In any case, mobility data of every individual customer is processed and aggregated in real time with previous mobility data, and is cleared right after that. Therefore, customers remain completely anonymous. Additionally, in this embodiment it may be the presence server instead of the gateway that communicates with a service provider to request further customer data. Such customer data may include customer age, gender, residential area, etc. so that the data can be used to establish categories of aggregated mobility profiles for various customer types.

It will be seen that the approach can be implemented in a network/location of any type or scale and using any appropriate type of wireless technology allowing the desired level of resolution and data exchange. Any type of ID can be relied on, which can be inherent to the technology adopted or an add-on.

Further, in the description above each node in the graph represents a cell, but can be extended so that each node in the graph represents a set of neighbouring cells.

Although the described sojourn time is the time a customer spends at a node per visit to the node, the definition can be extended to include repeated and separate visits to the same node during a given journey. Thus, the sojourn time can also be defined as the cumulative time a customer spends at the same node in the whole journey. The approach can also be extended to establish/use different graph models and distributions for various customer categories which are defined according to the customer's age, gender, residential area, etc. In this case, customer information may be needed from the service providers so that the approach can process and aggregate a given mobility data record into the corresponding graph model and distributions associated with the given customer, after which the private information is discarded and irretrievable.
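By way of illustration only, the extended definition of the sojourn time as the cumulative time spent at a node over a whole journey can be sketched in Python. The tuple layout of a record and the function name are assumptions of this sketch.

```python
from collections import defaultdict


def cumulative_sojourn(records):
    """Sum the time spent at each node over one journey.

    records is a list of (node_id, entry_time, departure_time) tuples;
    repeated visits to the same node are added together.
    """
    totals = defaultdict(float)
    for node_id, entry_time, departure_time in records:
        totals[node_id] += departure_time - entry_time
    return dict(totals)
```

The resulting per-node totals could then be fed to the same distribution and correlation updates in place of the per-visit sojourn times.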

Further, because of the storage of aggregated data and update of all values associated with the mobility profiling each time data is collected, received and processed, additional cells can also be added dynamically and automatically and their values populated as the data set grows by simple extension of the data and without requiring a rewrite of the existing structure.

Claims

1. A system for collecting and characterising mobility data of a plurality of mobile wireless communications devices, the system comprising:

at least a first and a second detector having a respective first and second detection range and each arranged to detect entry and departure of a mobile wireless communications device into the detection range and store associated mobility data; and

an aggregator arranged to aggregate mobility data received from the detectors.

2. The system of claim 1 wherein the mobility data comprises at least one of an entry time defined as the time when a mobile wireless communications device enters the detection range, a mobile wireless communications device identifier, a departure time defined as the time when a mobile wireless communications device departs from the detection range and a detector identifier.

3. The system of any preceding claim wherein the mobility data is used to construct a mobility profile.

4. The system of claim 3 wherein the mobility profile contains information regarding at least one of: the probability of a user of a mobile wireless communications device moving from one cell to another, the distribution of the amount of time a customer spends in each cell and the correlations between these times.

5. The system of any preceding claim wherein the aggregator aggregates mobility data in the form of coefficient values in a data set.

6. The system of any preceding claim wherein the aggregator updates coefficient values in a data set as additional mobility data is received.

7. The system of claim 5 or 6 wherein the coefficient value comprises at least one of a probability of transition between two detection ranges, a probability of time spent in a detection range and the correlation between times spent in two detection ranges.

8. The system of claim 5 or claim 6 wherein, after the aggregator has updated the mobility coefficients, the mobility data is deleted.

9. The system of claim 2 wherein the time spent in the detection range is calculated from the entry time and the departure time.

10. The system of any preceding claim wherein the detector is a femtocell or a WiFi access point base unit.

11. The system of claim 2 wherein the mobile wireless communications device identifier is one of an International Mobile Subscriber Identity (IMSI), a Temporary Mobile Subscriber Identity (TMSI) or an International Mobile Equipment Identity (IMEI).

12. A method for collecting and characterising mobility data of a plurality of mobile wireless communications devices, the method comprising:

detecting the entry and departure of a mobile wireless communications device into a first or second detection range of respective first and second detectors,

storing associated mobility data on the first or second detector, receiving the mobility data at an aggregator and aggregating the mobility data.

13. The method of claim 12 for use in the system of any one of claims 2 to 11.

14. A computer readable medium comprising instructions for performing the method of claims 12 or 13.

15. A computer processor arranged to operate under the instructions of the computer readable medium of claim 14.

16. A system for collecting and characterising mobility data of at least one mobile wireless communications device, the system comprising:

a processor arranged to receive mobility data from the detectors and correlate the mobility data obtained in at least two detection ranges.

17. The system of claim 16 wherein the mobility data comprises at least one of an entry time defined as the time when a mobile wireless communications device enters the detection range, a mobile wireless communications device identifier, and a departure time defined as the time when a mobile wireless communications device departs from the detection range and a detector identifier.

18. The system of claims 16 or 17 wherein the mobility data is used to construct a mobility profile.

19. The system of claim 18 wherein the mobility profile contains information regarding at least one of: the probability of a user of a mobile wireless communications device moving from one cell to another, the distribution of the amount of time a customer spends in each cell and the correlations between these times.

20. The system of either of claims 16 or 17 wherein the correlated mobility data includes at least one of a probability of transition between two detection ranges, a probability of time spent in a detection range and the correlation between times spent in two detection ranges.

21. The system of claim 20 wherein the time spent in a detection range is defined as the departure time minus the entry time.

22. The system of any of claims 16 to 21 wherein, when new mobility data is received from the detectors, the processor correlates the new mobility data between the two detection ranges.

23. The system of any of claims 16 to 22 wherein the processor correlates the mobility data for specific customer types.

24. The system of claim 23 wherein the customer type is determined from information related to the mobility data.

25. The system of claim 24 wherein the processor requests the information from a service provider.

26. The system of claim 24 wherein a device other than the processor requests the information from a service provider.

27. The system of any of claims 16 to 26 wherein the processor is located on one of the detectors.

28. A method for collecting and characterising mobility data of at least one mobile wireless communications devices, the method comprising:

storing associated mobility data on the first or second detector, receiving the mobility data at a processor, and

correlating the mobility data obtained in at least two detection ranges.

29. The method of claim 28 for use in the system of any one of claims 16 to 27.

30. A computer readable medium comprising instructions for performing the method of claims 28 or 29.

31. A computer processor arranged to operate under the instructions of the computer readable medium of claim 30.

32. A system, method, medium or processor substantially as described herein with reference to the accompanying drawings.