GB2523598A - Method for determining the position of a client device - Google Patents

Method for determining the position of a client device

Info

Publication number
GB2523598A
Authority
GB
United Kingdom
Prior art keywords
client device
image
estimate
area
sky
Prior art date
Legal status
Withdrawn
Application number
GB1403628.9A
Other versions
GB201403628D0 (en)
Inventor
Toby Webb
Current Assignee
SUPER LOCAL Ltd
Original Assignee
SUPER LOCAL Ltd
Priority date
Filing date
Publication date
Application filed by SUPER LOCAL Ltd filed Critical SUPER LOCAL Ltd
Priority to GB1403628.9A priority Critical patent/GB2523598A/en
Publication of GB201403628D0 publication Critical patent/GB201403628D0/en
Publication of GB2523598A publication Critical patent/GB2523598A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S19/00Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S19/38Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S19/39Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S19/42Determining position
    • G01S19/48Determining position by combining or switching between position solutions derived from the satellite radio beacon positioning system and position solutions derived from a further system
    • G01S19/485Determining position by combining or switching between position solutions derived from the satellite radio beacon positioning system and position solutions derived from a further system whereby the further system is an optical system or imaging system
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S19/00Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S19/38Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S19/39Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S19/42Determining position
    • G01S19/45Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75Determining position or orientation of objects or cameras using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A method for determining the position of a client device is disclosed herein. The client device comprises a display screen and a front-facing camera for capturing images of objects which the display screen faces. The method comprises the steps of: determining a first estimate of the position of the client device 100 using positioning signals; using the front-facing camera to capture an image 200; acquiring object-shape data 400 representative of the shape of objects within an area, wherein the first estimate of the position of the client device is within the area; comparing the image with the object-shape data 500 to establish a match associated with a second estimate of the position of the client device; and outputting a position 600 for the client device based on the second estimate. Processing the image can involve identifying parts of the image that are sky and non-sky, and enhancing the distinction between the sky and non-sky. The non-sky part can further be identified as a non-building object, excluded from any comparison, and replaced by either sky or building based on a prediction of what lies behind the non-building object.

Description

Method for Determining the Position of a Client Device

The present invention relates to a method and a system for determining the position of a client device.
Accurate and reliable positioning has become increasingly important for a wide range of applications. Global Navigation Satellite Systems (GNSS) have become the positioning system of choice for most applications. However, in dense urban environments GNSS signals are often disrupted by buildings, resulting in poor performance. This problem is at its worst in areas with narrow streets and tall buildings, known as urban canyons.
In urban canyons, the buildings often block the direct line of sight to the satellites, reducing the total number of signals that can be received by the user equipment. For this reason, the number of signals sometimes falls below the minimum required (normally four) to obtain a GNSS position fix.
In addition to blocking the signals, buildings can also disrupt signals by reflecting or diffracting them. This can result in non-line of sight (NLOS) versions of the signal being received alongside the direct path version - a phenomenon known as multipath.
In urban canyons, the position accuracy is typically affected more in a direction that is perpendicular to the street (the cross-street direction) than in a direction that is parallel to the street (the along-street direction). This is because signals arriving from the cross-street direction are more likely to be blocked, or disrupted by diffraction and reflection, than signals from the along-street direction. Performance also tends to be worse when a user is located on a pavement, compared to when they are in the middle of the street. This is because the amount of visible sky is less nearer the edge of buildings than in the middle of the street, meaning fewer satellites are available.
For these reasons, current accuracy is often insufficient to localise the user to the correct side of the street or to the correct corner. This is a particular problem for pedestrian navigation applications, as in order to orientate themselves to the environment, the user has to look for additional cues, such as street signs and local landmarks. This can be time-consuming - especially in unfamiliar settings. Many other applications (such as friend-finding, taxi-hailing and location-based advertising) would also benefit from improved urban position accuracy.
Many different technologies have been suggested to improve smartphone positioning performance in dense urban environments. One of these technologies involves using signals of opportunity, which are signals designed for purposes other than navigation, such as those used for telecommunications or broadcasting.
Wireless local area network (WLAN) signals (also known as wi-fi signals) and cellular signals may be classed as signals of opportunity. In most smartphones, these signals are currently used to augment GNSS positioning. These signals have extensive coverage and can be received in many of the areas where GNSS signals are disrupted (for example in urban canyons). However, whilst they allow a position to be determined when insufficient GNSS signals can be received, they do not typically achieve the level of accuracy required to satisfy many applications.
In principle, it is also possible to use radio and television signals for positioning, but a major disadvantage is that these methods require additional and costly hardware to be included in the smartphone.
Positioning performance may also be improved using dead reckoning, which measures the relative displacement of the user as they move. This can be combined with previous known locations to compute a position with respect to a reference frame. On a smartphone, dead reckoning can be achieved using sensors such as accelerometers, magnetometers and gyroscopes. However, due to measurement noise and errors accumulating over time, this method does not currently result in significant accuracy improvements.
It is also possible to improve positioning performance by using 3D city models to predict GNSS signal behaviour. A technique sometimes known as shadow matching uses 3D city models to predict the set of signals that can be received across a search area (and which signals are blocked due to building obstruction). The predicted signals are then compared, or pattern matched, to the set of signals observed by the smartphone. In its simplest implementation, the user's location is given by the point where the observed signals most closely agree with the predicted signals.
Three dimensional city models may also be used to predict (and compensate for) the disruptions to GNSS measurements caused by multipath and NLOS reception.
However, a key disadvantage of both this technique and shadow matching is the difficulty in accurately predicting the behaviour of GNSS signals due to, for example, the effect of vehicles, street furniture and the user's body. As a result, improvements in positioning performance have been insufficient to meet the requirements of many applications. In addition, these techniques require very detailed and accurate 3D city models, which are currently not available for all urban areas.
Vision-based position fixing techniques have been suggested which require the user to purposefully take a query image using a rear-facing camera, which is the camera on the opposite side to the user interface (UI) of the client device. The image from this camera is obtained by holding the client device up (so that the camera axis points approximately horizontally) and then taking an image of the surrounding environment. This image is then processed to detect distinctive image features, sometimes known as interest points or corners. These interest points are described in terms of the properties of their neighbourhood of pixels. This allows the query image to be compared, or matched, to reference images, which have been taken in advance. The user's location may be inferred from the geometrical relationship between the interest points in the query and reference images. However, there are several problems with this approach. Most notably, the user is required to hold the client device in a different orientation to that required for normal operation. It is also often difficult to capture sufficient information in the query image to allow successful comparison with the reference images. This is particularly a problem in busy urban streets due to pedestrian occlusion of key features. Problems also occur because the quality of the query image is often poor, due to issues such as motion blur and poor lighting conditions. This makes it difficult to detect the same set of interest points that have been identified in the reference image - inhibiting the image comparison procedure. Other drawbacks arise from the cost of collecting reference images (making the technique unsuitable for wide-scale deployment) and the computational burden required to perform the matching procedure (making it difficult to implement the technique in real-time).
According to a first aspect of the invention there is provided a method for determining the position of a client device comprising a display screen and a front-facing camera for capturing images of objects which the display screen faces; the method comprising the steps of: determining a first estimate of the position of the client device using positioning signals; using the front-facing camera to capture an image; acquiring object-shape data representative of the shape of objects within an area, wherein the first estimate of the position of the client device is within the area; comparing the image with the object-shape data to establish a match associated with a second estimate of the position of the client device; and outputting a position for the client device based on the second estimate.
By using the front-facing camera, useful information can be captured in an image, revealing a user's position. This can occur automatically while the user is interacting with the display screen and holding the phone in their hand. Hence, an improved position can be achieved without requiring the user to orient the client device in an unnatural manner or take any unusual action.
During normal operation of the client device, the user is likely to hold the client device in their outstretched hand, so that they can look downwards towards the display screen. In this orientation the front-facing camera is likely to point upwards (at least partially). Thus, during normal operation, the front-facing camera is likely to capture images of building-tops in urban environments. Those images will be unique to the user's position. By matching the images to object-shape data for the local environment it is possible to determine the most likely position (or positions) for the client device.
The client device may be a mobile (cellular) phone or a tablet computer. Typically these devices include a front-facing camera and a rear-facing camera. Generally the front-facing camera will be located on the same side of the client device as the display screen, whilst the rear-facing camera will be located on the opposite side.
The client device may also be any type of technology that incorporates a camera, and when the device is used in its typical way for interaction with the user, this camera faces approximately upwards (and/or captures a large part of the scene above the user). For example, the client device may be an item of wearable technology such as a smartwatch.
In normal operation, the front-facing camera can be used to capture images of the user whilst the user is viewing the display screen. The front-facing camera typically points at least partially in a direction that is normal to the display screen. In addition, the front-facing camera typically points at least partially towards an operator of the client device who can interact with the display screen.
Preferably, Global Navigation Satellite System (GNSS) signals are used to determine a first estimate of the position of the client device. Alternatively, or in addition, wireless local area network (WLAN) signals, also known as wi-fi signals, cellular telephone signals or radio/television signals may be used. These techniques could even be augmented using methods such as dead-reckoning, to assist in obtaining the first estimate.
The object-shape data may be stored at a remote location, such as a server. In one embodiment the client device may send the image, preferably over a wireless network, to the server which may then perform the comparison process to establish the match. In another embodiment, the client device may acquire the object shape data by downloading it from the server over a wireless network. In this case, the client device may perform the comparison process to establish the match. In a further embodiment the object-shape data may be stored (pre-loaded) on the client device.
The step of comparing the image with the object-shape data to establish a match may be preceded by a step of deriving data from the image. The derived data may include a processed image or it may include structural information relating to the shape of objects captured in the image which can be compared with the object-shape data. On the other hand, the step of comparing the image with the object-shape data to establish a match may involve comparing the image captured by the front-facing camera directly with the object-shape data. This may involve image matching techniques.
The step of comparing the image with the object-shape data may result in more than one potential match. Each potential match may be associated with a possible position of the client device.
Preferably, the method comprises the step of: processing the image in order to identify a sky part of the image and a non-sky part of the image, wherein the processed image is compared with the object-shape data. Even with extremely poor quality images, taken at night-time, with significant levels of motion blur, it is possible
to distinguish between sky and non-sky parts of the image. This can be done with sufficient fidelity to allow accurate position determination. This technique is more robust than object recognition techniques, whose reliability is dependent on image quality.
The processed image may be like a silhouette of non-sky objects against sky.
Comparing this silhouette with the object-shape data can improve the speed of the comparison process. Preferably, the object-shape data is also simplified into sky/non-sky.
Categorising at least a part of the image as sky and at least a part of the image as non-sky is a simple and effective way of extracting object-shape information from the image. It is then possible to match the object-shape information from the processed image data with corresponding object-shape data to output an improved estimate of the position of the client device.
Preferably, the method comprises the step of: pre-processing the image to enhance the distinction between sky and non-sky. In this way, it is possible to identify building shape information more easily and reliably within the image.
The method may comprise the step of: identifying a non-sky part of the processed image as a non-building object, wherein the non-building object is excluded in the comparison step. It is typically easier to represent semi-permanent building objects in the object-shape data. Hence, by not comparing non-building parts of the image with the object-shape data, matches can be established with more accuracy.
In one example, the front-facing camera may capture an image that includes part of the user's head. The user's head would preferably be identified as a non-building object and omitted from the comparison process. This improves the likelihood that an accurate match will be found between the non-sky part of the image and the object shape data.
Preferably, the method comprises the steps of: identifying a non-sky part of the processed image as a non-building object and replacing the non-building object with sky or building based on a prediction of what lies behind the non-building object.
Thus, non-building parts of the image can be eliminated from the image, and,
instead, a prediction of what lies behind these parts can be used in the comparison step. Hence, matches can be established with more accuracy.
The method may comprise the steps of: using an orientation estimation device to determine the orientation of the front-facing camera; and acquiring the object-shape data based on the orientation of the front-facing camera. In this way it is possible to limit the object-shape data acquired to those corresponding with the orientation of the front-facing camera. This improves the efficiency of the method because these data are more likely to yield a match. In addition, if the object-shape data are downloaded over a network this feature can reduce the amount of data that must be downloaded.
Accelerometers, gyroscopes and magnetometers are examples of sensors appropriate for determining the orientation of the client device. The orientation estimation device may comprise any one of these sensors or any combination thereof on-board the client device. The orientation estimation device may also determine the orientation of the client device based on assumed user behaviour, such as the orientation at which a user most commonly holds a client device.
The method may comprise the steps of: acquiring the angle of view of the front-facing camera; and acquiring the object-shape data based on the angle of view of the front-facing camera. The angle of view of the front-facing camera defines the angular extent of the scene captured by the front-facing camera. If the front-facing camera has a wide angle of view, a larger amount of object-shape data is likely to be required for an accurate comparison. On the other hand, if the front-facing camera has a narrow angle of view, less object-shape data will be required. The angle of view of the camera may be acquired from manufacturer information. The angle of view of the camera may alternatively be acquired from a sample image captured at the client device.
The method may comprise the step of: determining a confidence rating for the first estimate of the position of the client device, wherein the size of the area is based on the confidence rating. Thus, if a low confidence rating is associated with the first estimate of the position of the client device, a large area can be defined which is more likely to include the actual position of the client device. On the other hand, if a high confidence rating is associated with the first estimate, a small area can be defined.
Thus, it is possible to optimise the amount of object-shape data used in the comparison process. Preferably, the confidence rating is based on the signal to noise ratio of the positioning signals.
The area may be defined based on the location of pedestrian and/or vehicle zones within the area. Thus, the object-shape data acquired can be limited to those relating to locations in or near pedestrian and/or vehicle zones. Since a user is likely to be located in one of these zones, this avoids acquiring unnecessary data.
The object-shape data may comprise a plurality of frames each associated with a view of three-dimensional objects from a specific position. In this way, the image captured by the client device can be compared with frames representing what one might expect to see from various positions in the area. When the image matches with a frame, the position associated with that frame may provide the second estimate of the position of the client device. This provides a simple comparison mechanism where the image is compared against a collection of discrete frames.
The method may comprise the step of: inputting a desired accuracy for the position of the client device, wherein the density of frames in the area is based on the inputted desired accuracy. By providing a high density of frames in an area it is possible to achieve high accuracy positioning, while consuming a greater amount of computing resources. By basing the density of frames on the accuracy required the amount of computing resources used can be optimised.
The object-shape data may be acquired from a three-dimensional model representative of the shape of objects in the area. Thus, it is possible to predict the appearance of the shape of objects in the area when viewed from different positions in the area from the three-dimensional model. These predictions then can be compared with the image captured by the front-facing camera to establish a match.
This may be a flexible way of acquiring data representative of the shape of objects from any given position in the area. Preferably, the three-dimensional model is a three-dimensional model of city structure representative of the shape of the buildings in the area.
The three-dimensional model may be derived from laser scanning and/or photogrammetry of the area. The three-dimensional model may be derived from map data combined with building height data of the area. The building height data may be
based on average building height in the area. The three-dimensional model may be derived from a plurality of connected panoramic views of the area.
The comparison step may establish a plurality of matches associated with respective positions of the client device and the method may comprise the step of: selecting the match associated with the position that is closest to the first estimate of the position of the client device.
In one example the comparison process may identify two matches, each associated with a respective position. In this situation, the match that is closest to the first estimate of the position of the client device is likely to be more accurate.
At least one of the steps may be carried out using the client device and at least one of the steps may be carried out using a server. Alternatively, the steps may be carried out using the client device.
There is provided a computer program product comprising instructions for carrying out the method of the first aspect of the invention.
According to a second aspect of the invention there is provided a method for determining the position of a client device comprising the steps of: determining a first estimate of the position of the client device using positioning signals; capturing an image at the client device; processing the image in order to identify a sky part of the image and a non-sky part of the image; acquiring object-shape data representative of the shape of objects within an area, wherein the first estimate of the position of the client device is within the area; comparing the processed image with the object-shape data to establish a match associated with a second estimate of the position of the client device; and outputting a position for the client device based on the second estimate.
It will be appreciated that any of the steps and/or features described in connection with the first aspect of the invention may also be used in conjunction with the second aspect of the invention. At least one of the steps of the second aspect of the invention may be carried out using the client device and at least one of the steps may be carried out using a server. Alternatively, the steps of the second aspect of the invention may be carried out using the client device. There is also provided, a computer program product comprising instructions for carrying out the second aspect of the invention.
According to a third aspect of the invention there is provided a system for determining the position of a client device comprising a display screen and a front facing camera for capturing images of objects which the display screen faces, wherein the system comprises one or more processors configured as: a positioning module arranged to receive a first estimate of the position of the client device determined using positioning signals; an image module arranged to receive an image captured using the front-facing camera; an acquisition module arranged to acquire object-shape data representative of the shape of objects within an area, wherein the first estimate of the position of the client device is within the area; a comparison module arranged to compare the image with the object-shape data to establish a match associated with a second estimate of the position of the client device; and an output module arranged to output a position for the client device based on the second estimate.
At least one of the modules of the system of the third aspect of the invention may be located at the client device and at least one of the modules of the system may be located at the server. Alternatively, the modules of the system of the third aspect of the invention may be located at the client device.
According to a fourth aspect of the invention there is provided a system for determining the position of a client device wherein the system comprises one or more processors configured as: a positioning module arranged to receive a first estimate of the position of the client device determined using positioning signals; an image module arranged to receive an image captured by the client device; an image processing module arranged to process the image in order to identify a sky part of the image and a non-sky part of the image; an acquisition module arranged to acquire object-shape data representative of the shape of objects within an area, wherein the first estimate of the position of the client device is within the area; a comparison module arranged to compare the processed image with the object-shape data to establish a match associated with a second estimate of the position of the client device; and an output module arranged to output a position for the client device based on the second estimate.
At least one of the modules of the fourth aspect of the invention may be located at the client device and at least one of the modules may be located at a server.
Alternatively, the modules of the fourth aspect of the invention may be located at the client device. Method features may be provided as apparatus features and vice-versa.
Preferred features of the present invention will now be described, purely by way of example, with reference to the accompanying drawings, in which:
Figure 1 schematically shows the basic general architecture of a system for determining the position of a client device;
Figure 2 shows a flow diagram of an overview of a method for determining the position of a client device;
Figure 3 shows a flow diagram of the sub steps of image processing Step 300 shown in Figure 2;
Figure 4A shows an example of an image captured by the front-facing camera;
Figure 4B shows an example of an image after the segmentation in Steps 303-308 shown in Figure 3;
Figure 4C shows an example of an image after the elimination in Steps 309-310 shown in Figure 3;
Figure 5 shows a flow diagram of the sub steps of frame acquisition Step 400 shown in Figure 2; and
Figure 6 shows an example of the area determined in Steps 401-403 shown in Figure 5.
Figure 1 schematically shows the basic general architecture of a system for determining the position of a client device. The system comprises a client device 1, a server 5 and a number of satellites 7. In this embodiment the client device 1 is a smartphone comprising a display screen 2, a front-facing camera 3, memory and a processor 4. The server 5 also comprises a processor 6.
The client device 1 is connected to the internet and can communicate with the server 5 over a wireless network. This allows the client device 1 to upload and download data to and from the server 5 over the wireless network. The client device 1 can receive GNSS signals from the satellites 7. The client device 1 can use the GNSS signals to determine an estimate of its position.
Figure 2 shows a flow diagram of an overview of a method for determining the position of a client device 1. In Step 100 the client device 1 establishes a first estimate of its position based on GNSS signals from the satellites 7. This initial estimate may be augmented using wi-fi signals, cellular signals or any other type of positioning signal. Map-matching may also be used to augment the first estimate.
In Step 200 the client device 1 captures an image using the front-facing camera 3.
Due to the way in which users will generally hold the client device 1 when interacting with the display screen 2, the front-facing camera 3 will be pointing upwards when the image is captured. If the user is located in an urban environment this image will usually show the sky and the tops of buildings.
In Step 300 the image is processed to produce an image which shows the silhouette of the buildings in the image against the sky background.
In Step 400, frames, which represent what one might expect to see from various positions around the initial estimate, are acquired. These frames are acquired using a three-dimensional model of the city structure of an area which includes the initial estimate of the position of the client device. Each frame is generated by choosing a viewpoint positioned in the area and generating an image of what one might expect the scene to look like from that viewpoint. Each frame shows a silhouette of buildings in the area against a sky background in the same way as the image captured at the client device 1.
In Step 500 the image captured by the front-facing camera 3 is compared with each of the frames. Each frame is given a score based on its similarity with the captured image.
In Step 600 a match is established. The frame with the highest score may be considered to be a match. The position associated with the match is used to determine an improved estimate of the position of the client device 1. After Step 600 the method returns to Step 100.
Figure 3 shows a flow diagram of the sub steps of image processing Step 300 shown in Figure 2. An example of an image captured by the front-facing camera 3 in Step 200 that is used in Step 300 is shown in Figure 4A.
In Steps 301 and 302 pre-processing operations are performed with the aim of improving the quality of the image for the later steps. In general, the type of pre-processing operations that are used depend on the particular type of segmentation used in Steps 303-308 and are adapted to the quality of the image captured by the front-facing camera 3.
In Step 301 noise reduction such as smoothing is performed on the captured image.
Noise in the captured image can be caused by poor illumination, a dirty lens, and hardware limitations. In Step 302 contrast stretching is used to improve the dynamic range of the captured image. This is particularly useful if the image has low contrast. Typically, low-contrast images result from poor illumination and hardware limitations. Noise reduction and contrast stretching operations are particularly useful if the image has been captured in low-light conditions, such as at night.
In Steps 303-308 segmentation is used to divide the image into regions. For images acquired using the front-facing camera 3 in urban areas these regions will typically include categories of object such as the user's head, street furniture, buildings and sky.
In Step 303 the acquired image, which is usually a colour image, is converted into a greyscale (or intensity) image. In Step 304 this greyscale image is then split into light or dark regions using a single threshold value, whose value is determined using Otsu's method. The result is a binary image, whose pixel values are either 1s representing white or 0s representing black.
By using this thresholding technique, in daytime, buildings and other objects appear as black silhouettes against a white (sky) background. At night-time a reversal takes place, as buildings and other objects appear white against a black sky. It is desirable to standardise the format so that objects and the sky take the same colour regardless of the time of day. In Step 305 it is determined which parts of the image are sky and which are objects. In Step 306 it is determined whether the image is in the correct format. If the thresholding operation has produced the wrong format, an image complement operation is performed in Step 307.
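By way of illustration, the following Python sketch shows how Steps 301-307 might be implemented using OpenCV and NumPy. The function name, the choice of pre-processing operations and the polarity heuristic (testing whether the top rows of the image are mostly non-zero) are assumptions made for this example rather than details taken from the description.

```python
import cv2
import numpy as np

def binarise_sky(image_bgr):
    """Convert a captured colour image into a binary sky/non-sky image.

    Assumed convention: sky pixels = 0, non-sky (building/object) pixels = 1.
    """
    grey = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)

    # Steps 301-302: noise reduction and contrast stretching.
    grey = cv2.GaussianBlur(grey, (5, 5), 0)
    grey = cv2.normalize(grey, None, 0, 255, cv2.NORM_MINMAX)

    # Step 304: global threshold chosen automatically by Otsu's method.
    _, binary = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    binary = (binary // 255).astype(np.uint8)  # 0/1 image

    # Steps 305-307: standardise the polarity. Heuristic (an assumption): the
    # sky usually dominates the top rows of an upward-pointing image, so if
    # those rows are mostly 1s the image is complemented.
    top_rows = binary[: binary.shape[0] // 5, :]
    if top_rows.mean() > 0.5:
        binary = 1 - binary

    return binary
```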
The effectiveness of the thresholding technique can be affected by factors such as poor illumination, reflections and overlapping objects. However, the binarised image may be processed to compensate for some of these effects. For example, shrinking operations or template fitting operations can be performed to split touching objects.
Elimination operations using, for example, mathematical morphology can be carried out to compensate for degradation due to poor illumination and reflections.
Performance of the segmentation in Steps 303-308 may also be improved by using more elaborate thresholding techniques, such as those that use a threshold value that varies across the image, sometimes known as variable thresholding. It is also possible to improve performance by combining region based segmentation techniques with edge-based techniques.
In Step 308 all objects in the binary image are demarcated. This may be achieved using a labelling routine that assigns pixels in each connected region to a different numerical value, to produce a label matrix.
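A minimal sketch of such a labelling routine, here using SciPy's connected-component labelling on a small example array; the variable names and example data are illustrative only.

```python
import numpy as np
from scipy import ndimage

# Step 308 (sketch): label each connected non-sky region so that individual
# objects can be analysed separately. 0 is assumed to represent sky.
binary = np.array([[0, 1, 1, 0, 0],
                   [0, 1, 1, 0, 1],
                   [0, 0, 0, 0, 1]], dtype=np.uint8)

label_matrix, num_objects = ndimage.label(binary)
# label_matrix contains 0 for sky and 1, 2, ... for each connected object;
# num_objects is 2 for this example.
```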
Figure 4B shows the image shown in Figure 4A but after the segmentation process.
Figure 4B contains a sky region 10 in black and two objects 8, 9 in white. One of the objects is a non-building object, in this case the user's head 9. The other object is a building 8.
The segmentation process may be region-based or edge-based. Region-based methods involve dividing the image by detecting similarities whereas edge-based methods involve dividing the image by detecting discontinuities.
In Steps 309-310 non-building objects are replaced with either sky or building regions to produce an image containing only sky and building regions.
In Step 309 the properties of each object in the image are analysed to produce a set of descriptors. These describe the object's features, such as boundary-length, orientation, location, and circularity. These descriptors are then used to classify the object as either a building or a non-building object.
One of the objects in the image will almost always be the user's body. To identify this as a non-building object, it is possible to use the method described above. However, in general, performance can be improved using a chamfer matching procedure, which detects objects using a template fitting approach.
In Step 310 the non-building objects are eliminated from the image and replaced with either sky or building, based on a prediction of what lies behind them. For example, if the unwanted object is surrounded on all sides by sky, it may be removed and substituted with sky. Alternatively, if it is completely surrounded by building, it may be removed and substituted by building. If an object partially occludes both building and sky, the object can be substituted for sky, and then the eroded building portion approximately reconstructed using, for example, a convex hull approach.
Using these methods, it is possible to make the building formation stage robust to partial occlusion and effectively look through objects, such as the user's head.
However, in rare circumstances, no sensible guess can be made as to what lies behind an object. In this instance, it is possible to classify the object region as undefined and assign it a data-type that prevents it from contributing to the comparison procedure.
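The substitution of Step 310 might be sketched as follows. This simplified example predicts what lies behind an object purely from the pixels that border it, and omits the convex-hull reconstruction and the undefined data-type mentioned above; the function name and the 0/1 convention are assumptions carried over from the earlier sketch.

```python
import numpy as np
from scipy import ndimage

def replace_non_building(binary, label_matrix, label):
    """Step 310 (illustrative sketch): replace one non-building object with a
    prediction of what lies behind it, based on the pixels that border it.

    binary: 0 = sky, 1 = building/object (assumed convention).
    label_matrix: integer labels from the connected-component step.
    label: the label of the region classified as a non-building object.
    """
    region = label_matrix == label
    # Dilate the region by one pixel and keep only the surrounding ring.
    ring = ndimage.binary_dilation(region) & ~region

    surrounding = binary[ring]
    if surrounding.size == 0 or surrounding.mean() < 0.5:
        binary[region] = 0   # mostly sky around the object: substitute sky
    else:
        binary[region] = 1   # mostly building around the object: substitute building
    return binary
```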
Figure 4C shows Figure 4B after the elimination in Steps 309-310. In this image the user's head has been removed during the building formation stage and replaced with sky.
Figure 5 shows a flow diagram of the sub steps of frame acquisition Step 400 shown in Figure 2. These sub steps will be described in reference to Figure 6.
In Steps 401-403 the size of the area (from which possible positions of the client device are to be derived) is determined. A frame will then be acquired for a number of possible positions of the client device. To define the area, the first estimate of the position of the client device is used. The area must contain the actual position of the client device. It should not be excessively large, as this increases the number of possible positions to be searched, which in turn raises the computational burden.
Thus, a trade-off between these two conflicting interests needs to be reached.
In Step 401 a confidence factor for the first estimate of the position of the client device is determined. The confidence factor may be calculated from the number, geometry and signal-to-noise ratio (SNR) of the signals that contribute to the fix, used in conjunction with knowledge of the system's past performance.
In Step 402 the size of the area is determined based on the confidence factor. Figure 6 shows an example of the area 14 determined in Step 402. This is shown in the context of a map 11 with roads and buildings 13. The area is shown as a circle 14 centred on the first estimate of the position of the client device 15. In this case the radius of the circle 14 is inversely proportional to the confidence factor. Hence, a high confidence factor will result in a circle with a smaller radius.
In Step 403 the area is limited using map data. As this positioning system is designed for pedestrian users, one option is to restrict the area to include only pedestrian zones, such as footpaths and traffic islands. Note, this means the area will sometimes comprise several non-connected regions (rather than one connected region).
In this step map data is used to further constrain the area 14 to include only pedestrian zones 16 (in this case pavements) within the area 14 limits.
In this embodiment, the confidence limits of the first estimate have been modelled as a circle 14. However, for GNSS, in general the confidence limits can be more accurately modelled using an ellipse. For urban canyons, the ellipse is typically narrower in the direction parallel to the street (the along-street direction) than in the direction perpendicular to the street (the cross-street direction).
In Step 404 the possible positions of the client device are selected from within the area determined in Steps 401-403. One option is to select possible positions of the client device separated at regular spacings, for example, in a grid format. In principle, the smaller the spacing between possible positions, the greater the potential position accuracy of the second estimate. However, smaller spacings result in more possible positions, which increases the computational burden. On this basis, possible positions separated by around one metre may be an appropriate trade-off.
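A sketch of how the candidate positions of Steps 401-404 might be generated is given below. The inverse-proportionality constant k, the local east/north coordinate frame and the default one-metre spacing are assumptions made for illustration, and the restriction to pedestrian zones (Step 403) is only noted as a comment.

```python
import numpy as np

def candidate_positions(first_fix, confidence, spacing=1.0, k=30.0):
    """Steps 401-404 (sketch): generate a grid of candidate positions inside a
    circle centred on the first position estimate.

    first_fix: (east, north) of the first estimate, in metres (local frame).
    confidence: dimensionless confidence factor; the radius is taken as
    k / confidence (the constant k is an illustrative assumption).
    spacing: grid spacing in metres (around one metre is a reasonable trade-off).
    """
    radius = k / confidence
    east0, north0 = first_fix

    offsets = np.arange(-radius, radius + spacing, spacing)
    candidates = []
    for de in offsets:
        for dn in offsets:
            if de * de + dn * dn <= radius * radius:  # keep points inside the circle
                candidates.append((east0 + de, north0 + dn))
    # A real implementation would further restrict these points to pedestrian
    # zones using map data (Step 403).
    return candidates
```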
In Step 405 the orientation of the client device at the time at which the image was captured is determined. Orientation information may be obtained using accelerometer and magnetometer sensors, on-board the client device 1. Readings from these sensors can be taken at the moment the query image is obtained and a series of transformations performed to estimate the orientation of the client device 1.
However, depending on the quality of the sensors on-board the client device 1 and on the environmental conditions, the accuracy of the sensor readings may be poor, resulting in a poor orientation estimate.
Therefore, in some circumstances it may be advantageous to improve the orientation estimate by using additional information. One option is to use predictions about user behaviour: when a user accesses the user interface on the display screen 2, they tend to hold the client device 1 in a predictable manner and in principle this fact can be used to improve the orientation estimate.
Orientation estimation may also be assisted by using knowledge about the user's body position within the image captured by the front-facing camera 3. It is also possible to improve the orientation estimate by filtering multiple sensor readings over time to smooth the outputs.
In Step 406 the angle of view of the front-facing camera 3 is determined. The angle of view of the front-facing camera 3 defines the angular extent of the scene that is captured and is typically specified in terms of the camera's horizontal and vertical angle of view parameters. In general the angle of view parameters of a particular camera depend on the zoom settings. However, for most client devices 1 the zoom settings belonging to the front-facing camera are fixed at the point of manufacture.
Both horizontal and vertical angle of view parameters can be obtained from manufacturers, or measured directly (in advance) from a sample client device 1. If the zoom control settings were not set at the point of manufacture, ordinarily they should be set to the minimum values to maximise the angular extent of the captured scene.
In Step 407 the frames for each possible location of the client device 1 are generated from a three-dimensional model. It is possible to obtain decimetre level accuracy three-dimensional building models from specialist suppliers that have collected data using techniques such as airborne laser scanning and photogrammetry. However, these models are not available for all urban areas, and are often costly. Therefore, to allow the positioning method to be rolled out cost-effectively to all urban areas an alternative source may be required.
One option is to create three-dimensional building models by combining two-dimensional (standard) map data with building height data. This has the advantage that two-dimensional map data has ubiquitous coverage and is available for little or no cost. Building height data can be purchased for many cities or can be estimated by counting the number of storeys or by making a sensible guess based on knowledge of the average building height in the vicinity of the user.
To produce a three-dimensional model of a building from two-dimensional map data and building height estimates, first the building's shape is extracted from the two-dimensional map data. For map data encoded in vector format this building shape will generally be stored as a polygon, expressed in terms of horizontal vertex coordinates. A three-dimensional model may be created by generating a shape that has two sets of vertices: each set has the same horizontal coordinates as the original two-dimensional building polygon. However, vertical information is added, so that the first set of vertices is located at ground level, and the second set is located at ground level plus an offset equal to the estimated building height.
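A minimal sketch of this extrusion, assuming the footprint is supplied as a list of horizontal vertex coordinates in metres; the helper name and data layout are illustrative.

```python
def extrude_footprint(footprint, height):
    """Turn a 2D building footprint (a list of (x, y) vertex coordinates in
    metres) into two rings of 3D vertices: one at ground level and one at the
    estimated building height.
    """
    ground_ring = [(x, y, 0.0) for (x, y) in footprint]
    roof_ring = [(x, y, height) for (x, y) in footprint]
    # Each wall face of the block model joins consecutive vertices of the two rings.
    walls = [
        (ground_ring[i], ground_ring[(i + 1) % len(footprint)],
         roof_ring[(i + 1) % len(footprint)], roof_ring[i])
        for i in range(len(footprint))
    ]
    return ground_ring, roof_ring, walls
```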
This approach to forming a three-dimensional building model will not produce representations that contain all building details (such as roof shape). However, in general the frames generated will be sufficiently detailed and accurate for the system to operate successfully.
The three-dimensional model represents the buildings surrounding the first estimate of the position of the client device. Typically, only those buildings that are immediately adjacent to the first estimate need to be included as these are the only buildings that will be visible in the front-facing camera image of a user located at street-level.
In Step 407, the three-dimensional building models are projected, using a perspective projection method, onto a two-dimensional view surface that corresponds to the image plane of the front-facing camera 3. This projection procedure may be conducted using functions that are included in graphics packages.
The projection is done for a (hypothetical) client device located at each possible position, with the estimated orientation, and the angle of view parameters of that particular client device. However, this possible position should be specified in three-dimensional space (not just two-dimensional horizontal coordinates). The possible position may be assumed to be located at ground level. However, for greater accuracy, the vertical position can be estimated as at ground level plus a small offset of approximately a metre to account for the height at which a user typically holds the client device 1.
In Step 407, the building shapes that have been projected onto the image plane are used to create a frame. The frame is an image that has the same encoding scheme as the image captured by the front-facing camera, but with the building and sky pattern formed from the projected building shapes (rather than from the image captured by the front-facing camera). To achieve this, the projected building shapes are mapped onto an image-matrix with the same dimensions as the image captured by the front-facing camera.
As with the captured image, the pixel values of building shape regions are assigned 1s, and all other parts of the image are assigned 0s (to represent sky). Using this technique, the frames are constructed for each possible position of the client device to produce a set of frames ready for the comparison to be performed in Step 500.
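The frame-generation step might look roughly as follows, assuming a simple pinhole camera model with the camera z-axis pointing along the viewing direction. The function name, the rotation-matrix convention and the use of OpenCV's fillPoly for rasterisation are assumptions rather than details from the description, and a graphics package would normally handle clipping and hidden-surface removal more carefully than this sketch, which simply skips faces behind the camera.

```python
import numpy as np
import cv2

def make_frame(wall_faces, cam_pos, R, img_w, img_h, hfov_deg, vfov_deg):
    """Step 407 (illustrative sketch): project 3D building faces onto the image
    plane of a hypothetical camera and rasterise them into a binary frame
    (1 = building, 0 = sky), matching the encoding of the processed query image.

    wall_faces: list of faces, each a list of (x, y, z) vertices in metres.
    cam_pos: (x, y, z) of the candidate viewpoint.
    R: 3x3 rotation matrix from the world frame to the camera frame
       (derived from the orientation estimate).
    """
    fx = (img_w / 2.0) / np.tan(np.radians(hfov_deg) / 2.0)
    fy = (img_h / 2.0) / np.tan(np.radians(vfov_deg) / 2.0)
    cx, cy = img_w / 2.0, img_h / 2.0

    frame = np.zeros((img_h, img_w), dtype=np.uint8)
    for face in wall_faces:
        pts = []
        for vert in face:
            p = R @ (np.asarray(vert, dtype=float) - np.asarray(cam_pos, dtype=float))
            if p[2] <= 0.1:   # behind (or too close to) the camera: skip this face
                pts = []
                break
            u = fx * p[0] / p[2] + cx
            v = fy * p[1] / p[2] + cy
            pts.append([u, v])
        if pts:
            cv2.fillPoly(frame, [np.array(pts, dtype=np.int32)], 1)
    return frame
```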
The image comparison Step 500 shown in Figure 2 will now be described in more detail. In Step 500 a score is determined for each frame. Each score is representative of the similarity between the captured image and the respective frame.
Different types of similarity measures may be used to compare the frames with the captured image. One option is to compute the proportion of matching pixels. The following formula may be used.
$$y_k = \frac{MN - \sum_{i=1}^{M}\sum_{j=1}^{N}\left|Q(i,j) - R_k(i,j)\right|}{MN}$$

where:
$y_k$ is the score of the $k$th possible position,
$Q$ is the captured image matrix,
$R_k$ is the frame belonging to the $k$th possible position,
$i$ is the row pixel coordinate ($i = 1 \ldots M$),
$j$ is the column pixel coordinate ($j = 1 \ldots N$), and
$\left|\cdot\right|$ is the modulus operator.
The captured image and frame both have $M$ rows and $N$ columns.
For this measure, the higher the score, the greater the similarity between the frame and the captured image. The greatest score is one, which occurs if the templates match exactly. Other similarity measures may also be used, such as those based on the normalised cross correlation (NCC) and sum of Hamming distances (SHD).
It should be noted that in this equation, all matching pixels contribute equally to the score, regardless of whether they represent building or sky. In principle it may be possible to improve performance by devising weighting schemes that alter the contribution made by pixels based on whether they represent either sky or building.
For example, if building pixels were weighted more highly it may increase the likelihood of a successful match when only a small proportion of the captured image contains buildings.
It may also be possible to improve performance by weighting pixels based on their location within the frame and captured image. For example, pixels may be weighted more highly if they are located in the top half of the frame and captured image. In general, this part of both the frame and captured image corresponds to the area directly above the user. As a result, this part of the processed captured image is less likely to contain errors due to non-building objects being erroneously identified as building objects. Therefore, these pixels could be weighted more highly to reflect this.
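A minimal sketch of this score follows, with an optional per-pixel weight matrix to reflect the weighting schemes just discussed (how such weights would be chosen is an assumption); without weights it reduces to the proportion-of-matching-pixels formula given above.

```python
import numpy as np

def frame_score(query, frame, weights=None):
    """Similarity score: proportion of matching pixels, optionally weighted.

    query, frame: binary M x N arrays (1 = building, 0 = sky).
    weights: optional M x N array of per-pixel weights; if omitted, every pixel
    contributes equally, as in the unweighted formula above.
    """
    if weights is None:
        weights = np.ones_like(query, dtype=float)
    agreement = (query == frame).astype(float)
    return float((weights * agreement).sum() / weights.sum())
```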
The position determination Step 600 shown in Figure 2 will now be described in more detail. In this step an improved position of the client device is based on the scores determined in Step 500. The simplest method of achieving this is to use the possible position relating to the highest scoring frame as the improved position of the client device. However, there are problems with this approach.
Often multiple frames have high scores that are very similar to each other. For example, when the user is in an urban canyon (but not near any junctions) all frames relating to the correct side of road tend to yield the highest scores, all with similar values. Similarly, if the user is standing at a junction corner, and there is a similarly oriented junction corner elsewhere in the area, then frames at both corners yield high scores with similar values.
For these situations, choosing the highest scoring frame will not necessarily yield the best position estimate. Therefore, to improve performance, an alternative technique is required. One possibility involves associating each frame with a weighting, based on the location of the frame's possible position relative to the first estimate of the position of the client device. A weighted mean may then be used to determine the improved position of the client device. The following equations may be used.
$$\hat{a} = \frac{\sum_{k=1}^{K} w_k a_k}{\sum_{k=1}^{K} w_k}$$

and

$$w_k = p(a_k \mid b)\, y_k$$

where:
$\hat{a}$ is the improved position,
$w_k$ is the weight of the $k$th possible position,
$a_k$ is the position vector of the $k$th possible position,
$b$ is the first estimate of the position of the client device,
$y_k$ is the score of the $k$th possible position,
$p(a_k \mid b)$ is the probability density function for the true location being at $a_k$ given that the first estimate is at $b$ (assuming no knowledge of the score),
$k$ denotes the index of the possible position ($k = 1 \ldots K$), and
$K$ is the total number of possible positions.
It should be noted that these equations assume the density of possible positions to be approximately constant across the search area. The probability density function can be estimated in advance using performance knowledge (such as experimental data) of the positioning method used to obtain the first position estimate of the client device.
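This weighted mean might be sketched as follows; the circular Gaussian used for p(a_k | b) and its standard deviation are assumptions standing in for the calibrated probability density described above.

```python
import numpy as np

def improved_position(candidates, scores, first_fix, sigma=15.0):
    """Weighted-mean position from the equations above.

    candidates: array of shape (K, 2) of candidate positions (metres, local frame).
    scores: array of K frame scores y_k.
    first_fix: first position estimate b.
    sigma: standard deviation (metres) of an assumed circular Gaussian density
    p(a_k | b); in practice this would be calibrated from past performance data.
    """
    candidates = np.asarray(candidates, dtype=float)
    scores = np.asarray(scores, dtype=float)
    b = np.asarray(first_fix, dtype=float)

    d2 = ((candidates - b) ** 2).sum(axis=1)
    density = np.exp(-d2 / (2.0 * sigma ** 2))   # p(a_k | b), up to a constant factor
    w = density * scores                          # w_k = p(a_k | b) * y_k
    return (w[:, None] * candidates).sum(axis=0) / w.sum()
```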
To achieve enhanced performance, other position estimation schemes may be devised that take into account additional factors such as the user's previous locations and an estimate of the speed they are walking. A measure of the quality of the improved estimate of the position of the client device may be obtained by computing, for example, a weighted standard deviation, or alternatively an error ellipse. This quality estimate may be used to provide an indication not only of the accuracy of the improved position estimate, but also of the chance of accuracy failures (i.e. its integrity). This information could be used to enhance performance for some applications. For example, if the integrity is poor it may be better to output no enhanced position rather than risk outputting the wrong solution, which may be on the incorrect side of the street.
Referring again to Figure 1, in order to implement the positioning technique, the processing may be implemented entirely on the client device 1; alternatively, the processing may be conducted entirely on a server 5; or some processing may be conducted on a server 5 and some on the client device 1.
If the method is implemented entirely on the client device 1, at the time a position fix is requested the client device 1 does not need to be connected to the internet, as no data needs to be offloaded for processing. However, this relies on the necessary data, such as map data, being uploaded to the client device 1 in advance. The practicality of this depends both on the quantity of data that needs to be stored and on whether it is feasible for a client device 1 (which has limited computing resources) to compute a position estimate in real-time.
If the processing is conducted entirely on a server 5, at the time a position fix is requested the client device 1 needs access to the internet to transfer data. For each position fix, the client device 1 must transfer the raw data, which is an image file of several megabytes in size, plus information about the client device 1 orientation and the first estimate of the position. The practicality of this depends on whether this information can be transferred cost-effectively and sufficiently quickly for real-time implementation. This would depend on a variety of factors, including the speed of the internet connection and on bandwidth costs.
If some processing is conducted on a server 5 and some on the client device 1, one option is to offload those operations that are computationally intensive to the server 5, to increase the speed at which an improved position can be delivered. Less intensive processing operations may be conducted on the client device 1 to reduce the amount of information that needs to be transferred to the server 5. Processing conducted on the client device 1 may be conducted in software (using the client device's processor 4) or in hardware or firmware, or a combination of these.
However, a hardware or firmware implementation will typically require specialist hardware to be incorporated into the client device 1 at the point of manufacture.
In contrast, a software implementation requires no specialist hardware and can even be installed as an after-market add-on.
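Purely as an illustration of the split processing option described above, a client might run the comparatively cheap sky/non-sky segmentation locally and transfer only a compact sky mask, the camera orientation and the first position estimate to a server that performs the computationally intensive matching. The endpoint URL, field names and mask encoding below are hypothetical; they show one way such a transfer could be arranged, not a defined interface.

```python
import base64
import json
import urllib.request

def request_improved_fix(sky_mask_png, heading_deg, first_fix, server_url):
    """Send a compact payload to a hypothetical matching server.

    sky_mask_png : bytes of a downsampled, compressed sky/non-sky mask image
    heading_deg  : camera heading from the orientation estimation device
    first_fix    : (latitude, longitude) first estimate of the position
    server_url   : URL of the server performing the heavy comparison step
    """
    payload = {
        "sky_mask": base64.b64encode(sky_mask_png).decode("ascii"),
        "heading_deg": heading_deg,
        "first_fix": {"lat": first_fix[0], "lon": first_fix[1]},
    }
    req = urllib.request.Request(
        server_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        # e.g. {"lat": ..., "lon": ..., "quality_m": ...} returned by the server
        return json.loads(resp.read())
```

Transferring a compressed binary mask rather than the raw photograph would keep the payload well below the several megabytes of a full image, at the cost of fixing the segmentation step on the client.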
The technique can enhance positioning for a pedestrian user in dense urban environments. If the system is to be limited so that it operates only in certain locations (such as urban environments) and in certain contexts, context detection and geofencing software can be used to prevent it from operating outside those situations. Context detection software detects the type of activity the user is engaged in, and geofencing software detects whether or not the user is in a specified region or regions.
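A minimal sketch of such gating is shown below, combining a context label with a radius-based geofence; the operating regions, the 'walking' context label and the distance approximation are all assumptions made for illustration.

```python
import math

# Hypothetical operating regions: (centre latitude, centre longitude, radius in metres)
OPERATING_REGIONS = [
    (51.5074, -0.1278, 5000.0),  # example: central London
]

def distance_m(lat1, lon1, lat2, lon2):
    """Approximate ground distance using an equirectangular projection."""
    r = 6371000.0  # mean Earth radius in metres
    x = math.radians(lon2 - lon1) * math.cos(math.radians(0.5 * (lat1 + lat2)))
    y = math.radians(lat2 - lat1)
    return r * math.hypot(x, y)

def system_enabled(lat, lon, context):
    """Enable the enhancement only for pedestrian use inside an operating region."""
    if context != "walking":  # context label assumed for illustration
        return False
    return any(distance_m(lat, lon, clat, clon) <= radius
               for clat, clon, radius in OPERATING_REGIONS)
```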
Preliminary trials of the method for determining the position of the client device 1 have been conducted. In these trials the root mean squared (RMS) error using conventional positioning methods was 30 metres. The RMS error of the method described with reference to Figure 2 was 17 metres. The method was able to localise the client device 1 to the pavement on the correct side of the street 99 percent of the time, and to the correct corner (if at a junction) 97 percent of the time.

Claims (23)

  1. A method for determining the position of a client device comprising a display screen and a front-facing camera for capturing images of objects which the display screen faces; the method comprising the steps of: determining a first estimate of the position of the client device using positioning signals; using the front-facing camera to capture an image; acquiring object-shape data representative of the shape of objects within an area, wherein the first estimate of the position of the client device is within the area; comparing the image with the object-shape data to establish a match associated with a second estimate of the position of the client device; and outputting a position for the client device based on the second estimate.
  2. A method according to claim 1 comprising the step of: processing the image in order to identify a sky part of the image and a non-sky part of the image, wherein the processed image is compared with the object-shape data.
  3. A method according to claim 2 wherein the method comprises the further step of: pre-processing the image to enhance the distinction between sky and non-sky.
  4. A method according to claim 2 or claim 3 comprising the step of: identifying a non-sky part of the processed image as a non-building object, wherein the non-building object is excluded in the comparison step.
  5. A method according to any of claims 2-4 comprising the steps of: identifying a non-sky part of the processed image as a non-building object; and replacing the non-building object with sky or building based on a prediction of what lies behind the non-building object.
  6. A method according to any of the preceding claims, comprising the steps of: using an orientation estimation device to determine the orientation of the front-facing camera; and acquiring the object-shape data based on the orientation of the front-facing camera.
  7. A method according to any of the preceding claims comprising the steps of: acquiring the angle of view of the front-facing camera; and acquiring the object-shape data based on the angle of view of the front-facing camera.
  8. A method according to any of the preceding claims, comprising the step of: determining a confidence rating for the first estimate of the position of the client device, wherein the size of the area is based on the confidence rating.
  9. A method according to any of the preceding claims wherein the area is defined based on the location of pedestrian and/or vehicle zones within the area.
  10. A method according to any of the preceding claims wherein the object-shape data comprises a plurality of frames each associated with a view of three-dimensional objects from a specific position.
  11. A method according to claim 10 in which the method comprises the step of: inputting a desired accuracy for the position of the client device, wherein the density of frames in the area is based on the inputted desired accuracy.
  12. A method according to any of the preceding claims wherein the object-shape data is acquired from a three-dimensional model representative of the shape of objects in the area.
  13. A method according to any of the preceding claims wherein the comparison step establishes a plurality of matches associated with respective positions of the client device and the method comprises the step of: selecting the match associated with the position that is closest to the first estimate of the position of the client device.
  14. A method according to any of the preceding claims wherein at least one of the steps is carried out using the client device and at least one of the steps is carried out using a server.
  15. A method according to any of claims 1-13 wherein the steps are carried out using the client device.
  16. A computer program product comprising instructions for carrying out the method of any of the preceding claims.
  17. A method for determining the position of a client device comprising the steps of: determining a first estimate of the position of the client device using positioning signals; capturing an image at the client device; processing the image in order to identify a sky part of the image and a non-sky part of the image; acquiring object-shape data representative of the shape of objects within an area, wherein the first estimate of the position of the client device is within the area; comparing the processed image with the object-shape data to establish a match associated with a second estimate of the position of the client device; and outputting a position for the client device based on the second estimate.
  18. A system for determining the position of a client device comprising a display screen and a front-facing camera for capturing images of objects which the display screen faces, wherein the system comprises one or more processors configured as: a positioning module arranged to receive a first estimate of the position of the client device determined using positioning signals; an image module arranged to receive an image captured using the front-facing camera; an acquisition module arranged to acquire object-shape data representative of the shape of objects within an area, wherein the first estimate of the position of the client device is within the area; a comparison module arranged to compare the image with the object-shape data to establish a match associated with a second estimate of the position of the client device; and an output module arranged to output a position for the client device based on the second estimate.
  19. A system according to claim 18 comprising the client device and a server wherein at least one of the modules of the system is located at the client device and at least one of the modules of the system is located at the server.
  20. A system according to claim 18 wherein the modules of the system are located at the client device.
  21. A system for determining the position of a client device wherein the system comprises one or more processors configured as: a positioning module arranged to receive a first estimate of the position of the client device determined using positioning signals; an image module arranged to receive an image captured by the client device; an image processing module arranged to process the image in order to identify a sky part of the image and a non-sky part of the image; an acquisition module arranged to acquire object-shape data representative of the shape of objects within an area, wherein the first estimate of the position of the client device is within the area; a comparison module arranged to compare the processed image with the object-shape data to establish a match associated with a second estimate of the position of the client device; and an output module arranged to output a position for the client device based on the second estimate.
  22. A system substantially as herein described with reference to and/or illustrated in the accompanying drawings.
  23. A method substantially as herein described with reference to the accompanying drawings.
GB1403628.9A 2014-02-28 2014-02-28 Method for determining the position of a client device Withdrawn GB2523598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1403628.9A GB2523598A (en) 2014-02-28 2014-02-28 Method for determining the position of a client device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1403628.9A GB2523598A (en) 2014-02-28 2014-02-28 Method for determining the position of a client device

Publications (2)

Publication Number Publication Date
GB201403628D0 GB201403628D0 (en) 2014-04-16
GB2523598A true GB2523598A (en) 2015-09-02

Family

ID=50490629

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1403628.9A Withdrawn GB2523598A (en) 2014-02-28 2014-02-28 Method for determining the position of a client device

Country Status (1)

Country Link
GB (1) GB2523598A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106289230A (en) * 2016-07-20 2017-01-04 北京顺源开华科技有限公司 The processing method of motion trace data, device and wearable device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080226130A1 (en) * 2007-03-15 2008-09-18 Microsoft Corporation Automated Location Estimation Using Image Analysis
US20100329542A1 (en) * 2009-06-30 2010-12-30 Srikumar Ramalingam Method for Determining a Location From Images Acquired of an Environment with an Omni-Directional Camera
US20110135207A1 (en) * 2009-12-07 2011-06-09 Google Inc. Matching An Approximately Located Query Image Against A Reference Image Set
US20110150320A1 (en) * 2009-06-30 2011-06-23 Srikumar Ramalingam Method and System for Localizing in Urban Environments From Omni-Direction Skyline Images
CN102456132A (en) * 2010-10-22 2012-05-16 英业达股份有限公司 Location method and electronic device applying same
US8509488B1 (en) * 2010-02-24 2013-08-13 Qualcomm Incorporated Image-aided positioning and navigation system
US20130314443A1 (en) * 2012-05-28 2013-11-28 Clayton Grassick Methods, mobile device and server for support of augmented reality on the mobile device

Also Published As

Publication number Publication date
GB201403628D0 (en) 2014-04-16

Similar Documents

Publication Publication Date Title
US11094112B2 (en) Intelligent capturing of a dynamic physical environment
US20200401617A1 (en) Visual positioning system
US10049492B2 (en) Method and apparatus for rendering facades of objects of interest from three-dimensional point clouds
US20160154999A1 (en) Objection recognition in a 3d scene
US9400941B2 (en) Method of matching image features with reference features
US20170322043A1 (en) Vision augmented navigation
US20170301104A1 (en) Profile matching of buildings and urban structures
US20150138310A1 (en) Automatic scene parsing
JP2019087229A (en) Information processing device, control method of information processing device and program
KR101569919B1 (en) Apparatus and method for estimating the location of the vehicle
WO2018119606A1 (en) Method and apparatus for representing a map element and method and apparatus for locating vehicle/robot
CN105358937A (en) Positioning method for a surveying instrument and said surveying instrument
CN102959946A (en) Augmenting image data based on related 3d point cloud data
CN106871906B (en) Navigation method and device for blind person and terminal equipment
JP2011511281A (en) Map matching method with objects detected by sensors
KR102167835B1 (en) Apparatus and method of processing image
KR20150042544A (en) Mobile terminal for providing location information, method and system for measuring the location information
CN114969221A (en) Method for updating map and related equipment
CN111553342B (en) Visual positioning method, visual positioning device, computer equipment and storage medium
KR102249381B1 (en) System for generating spatial information of mobile device using 3D image information and method therefor
GB2523598A (en) Method for determining the position of a client device
CN107808160B (en) Three-dimensional building extraction method and device
KR20220062709A (en) System for detecting disaster situation by clustering of spatial information based an image of a mobile device and method therefor
JP2004094707A (en) Method for estimating plane by stereo image and detector for object
KR102249380B1 (en) System for generating spatial information of CCTV device using reference image information

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)