WO2019046962A1

WO2019046962A1 - Method and system for target positioning and map update

Info

Publication number: WO2019046962A1
Application number: PCT/CA2018/051101
Authority: WO
Inventors: Zhe HE; Jing Zhang
Original assignee: Appropolis Inc.
Priority date: 2017-09-07
Filing date: 2018-09-07
Publication date: 2019-03-14

Abstract

A system for determining the position of a moving object in a site. The system has one or more anchor devices deployed in the site for providing data sources related to the moving object and the site; a reference map of the site; and a processing structure for determining the position of the moving object using the data obtained by the one or more anchor devices and the reference map. The reference map has a data-source layer for supporting the data sources; a structure-feature layer extracted from the data sources and for representing and indexing the primary structures of the site; and a description layer. The system includes a process for updating the reference map using images of at least a portion of the site and a point cloud of the site.

Description

METHOD AND SYSTEM FOR TARGET POSITIONING AND MAP UPDATE

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of US Provisional Patent Application Serial No. 62/555,414 filed September 7, 2017, the content of which is incorporated herein by reference in its entirety.

FIELD OF THE DISCLOSURE

This disclosure relates generally to an object positioning system and method for determining the location of a moving obj ect in a site using a plurality of anchor devices and a map of the site, and in particular, relates to a system and method for updating the positions of changed anchor devices in the site.

BACKGROUND

With the recent rise of the Internet of Things (IoT) industry, numerous smart devices have been used in many areas. It is known that positioning or location information of an IoT smart device may bring great value for precise management and for spatial- and positioning-related applications.

Similarly, the demanding requirements of Location-Based Services (LBS) have led to the emergence of various positioning technologies. For example, the Global Positioning System (GPS) has been widely used and has been proven to be a good candidate solution for target localization or positioning in a typical outdoor environment using either pseudo range or carrier phase measurements. However, GPS generally requires line-of-sight connection between a GPS signal receiver and GPS signal transmitters on satellites. Therefore, GPS systems usually do not work well in indoor environment as the GPS signal strength is weakened by the building surrounding the GPS signal receiver.

Multiple-sensor indoor positioning systems are also known. Such systems use a plurality of sensors deployed in an indoor environment such as a building for positioning one or more moving objects. In a typical indoor scenario, a multiple-sensor indoor positioning system comprises a plurality of small anchor devices such as wall-mount tags, ceiling-mount beacons, and the like, mounted on the surface of different structures and/or inside various objects in the building for facilitating object positioning. The configurations of the anchor devices are generally stored in the system for ensuring proper system operation.

A challenge of the multiple-sensor indoor positioning system is that the anchor devices may be redeployed from time to time such as being moved to different locations and/or remounted with different configurations by users for various reasons and without notification. In such a situation, the stored configurations of the redeployed anchor devices have to be updated to match the actual configurations thereof after their redeployment. For example, if the position of an anchor device has been changed, the position record of the anchor device stored in the system has to be updated.

Moreover, the indoor environment itself may also change over time. For example, a new wall may be constructed for reconfiguring a room, a door may be blocked or rebuilt, and/or furniture may be relocated. Such change of the indoor environment also leads to significant impact to the spatial and positioning-related applications of smart devices.

A traditional approach for solving the above-described challenge is to survey or measure the indoor environment by using professional surveying equipment such as the total station, laser range-finder, digital level, and the like, regularly or when a change or reconfiguration to the indoor environment and/or the anchor devices is noticed. The principle in this approach is to measure the angles and distances from the targets (for example, changed or reconfigured anchor devices and/or building structures) to some known control points, and then use triangulation to estimate the targets' locations accordingly.

Such a traditional survey method can provide high accuracy in the target's position and the indoor environment measurement. However, the professional surveying equipment used in surveying is usually expensive and operators require special training in order to properly use the equipment. Moreover, the traditional survey methods are based on a pre-existing reference network primarily suitable for outdoor environments, and setting up a corresponding indoor reference network is usually difficult and time-consuming.

Map-based localization methods have also been used for solving the indoor positioning problem. A map-based localization method generally determines a target's position by referencing the target on a known map. Hitherto, most indoor maps are floor plans which are effective for pedestrian navigation but usually do not have sufficient detail for small device localization. For example, such floor plans often contain few details and lack elevation information.

Generalized maps for localization have been proposed for solving the indoor positioning problem. For example, in prior-art documents [1] to [3] listed in the REFERENCES section, a georeferenced image database is used as a reference map, and an image-based localization method of "feature matching" provides the link between the reference image database and the target images. While such and similar methods achieve some success in outdoor environments, they have issues in indoor environments. For example, an indoor environment almost always contains many textureless areas, and the variation in illumination results in over- and/or under-exposed images, thereby making it very difficult to find matching features between the target and reference images. Furthermore, as the maps are static, they are not adaptive to the changes in the environment after the maps are generated.

Technologies using updatable three-dimensional (3D) mapping are also known. For example, the Simultaneous Localization and Mapping (SLAM) technology as disclosed in prior- art documents [4] and [5] is a recently-developed technique which aims to provide a cost-effective way for mapping indoor environments in 3D via consumer devices such as Kinect, Tango phone, and the like (see prior-art documents [6] to [8]). However, SLAM was originally designed for mobile robotics navigation in an unknown environment, and the derived map is represented in an arbitrary coordinate frame which is decided by a robot's initial pose in the environment. For managing a large account of devices distributed in different buildings, a unified coordinate frame is a prerequisite.

Some researchers have tried to solve this problem by using control points to rectify the SLAM map into a unified reference frame. However, the indoor control points are both costly and difficult to acquire, and are usually intended for large indoor environments where typically only a few control points are used to rectify the map. Therefore, this method is not well-suited for a small area.

In many cases, an indoor positioning system may comprise a large number (such as hundreds) of anchor devices distributed in a building, and the positions of some devices may change after the building map is generated. In these cases, even if a portion of the building's inner structure and/or a portion of the anchor devices are changed over time, it is generally inefficient and sometimes impossible to determine the anchor devices' new positions and the building's new structure by remapping the whole building.

Instead of remapping the whole building, it is also known to reconstruct the local map about the changed area. For example, references [9] and [10] teaches a submap SLAM method which focuses on solving an incremental SLAM problem of how to j oin a sequence of local SLAM into a global SLAM. However, as a local map determined by local SLAM may not have enough control points in the area, it is difficult or even impossible to find a transformation to the reference frame. Hence, the challenge is how to transform the local map into the reference frame and then how to update the reference map by the newly-determined local map.

Therefore, there is a desire to develop a new method which efficiently and accurately measures anchor devices' positions, and a desire to develop an effective method to update the spatial information of the indoor environment. With a viable solution for measuring a target's position in an indoor environment, the IoT industry can continue its rapid development and expansion. Indeed, future technologies and developments in this industry demand and will rely upon an efficient, accurate solution for end-users to measure their indoor positions. SUMMARY

Embodiments herein discloses a system for determining the three-dimensional (3D) position of a moving object in a site (such as an indoor environment) and updating the spatial information of the site using computer vision and point-cloud processing methods. The system disclosed herein comprises one or more anchor devices deployed in the site for providing data sources related to the moving obj ect and the site, a generalized 3D map (also denoted as a reference map or a reference 3D map hereinafter) of the site and a processing structure for determining the position of the moving object using the data obtained by the one or more anchor devices and the map.

The reference map is first generated and all subsequent processings are established thereon. Unlike the traditional floor-plan maps and georeferenced-image maps, the reference map disclosed herein comprises three types of map layers. The first map layer is a data-source layer which supports various data sources such as sequences of optical images, depth images, 3D point cloud, geometric models, and the like. The second map layer is a structure-feature layer which is extracted from the data sources and is used for representing and indexing the primary structures of the site. The third map layer is a description layer which records information related to the data sources such as data capturing time, device type, data precision, and the like. The system uses the reference map and data obtained from the anchor devices for positioning one or more moving objects in the site.

In particular, the system first constructs the reference map of the site using previously- captured data sources such as optical images, depth images, 3D point cloud, and the like. These data sources are rectified and georeferenced by several control points, and the primary structures of the site are extracted therefrom. Then, descriptions such as the time of data collection, the precision of the collection device, and the like, are determined and recorded in the reference map.

The system may update the reference map periodically or as needed. In particular, the system can determine and measure the positions of targets including at least a portion of the anchor devices and/or at least a portion of the site (such as a building), and update the reference map based on the obtained target measurement.

In one embodiment, a local measurement can be conducted in an area of interest, for example, by using a consumer device such as a camera, a RGB-D sensor (i.e., a sensor capturing color images with depth data), a Light Detection and Ranging (LiDAR) device, or other similar devices to collect data of one or more target devices and surrounding environment in the area of interest. Then, a local 3D map (also denoted as a local map hereinafter) is constructed based on the obtained local measurement. As the local 3D map may not have sufficient control points, the local data sources are maintained in a local frame for the area of interest without being rectified. The same type of structural features and descriptions constructed for the reference map is extracted from the local 3D map. Then, the target anchor devices (also denoted as target devices hereinafter) in the area of interest are detected in the local 3D map and their positions are determined in the local frame.

Subsequently, a coarse-to-fine registration is applied to align the local 3D map with the reference 3D map, which is then used to estimate a geometric transformation from the local map to the reference map (a local-to-reference transformation) to convert the target devices' coordinates in local map to coordinates in the reference map. The local 3D map is also merged to the reference map by the local -to-reference transformation for updating the reference 3D map.

According to one aspect of this disclosure, there is provided a system for determining the position of a moving object in a site. The system comprises: one or more anchor devices deployed in the site for providing data sources related to the moving object and the site; a reference map of the site; and a processing structure for determining the position of the moving object using the data obtained by the one or more anchor devices and the reference map. The reference map comprises: a first layer comprising the data sources; and a second layer comprising data extracted from the data sources for representing and indexing the primary structures of the site.

In some embodiments, the reference map further comprises a third layer comprising information related to the data sources.

In some embodiments, the third layer comprises characteristics of the data sources; and wherein said characteristics comprises at least one of a data capturing time, a device type, and a data precision.

In some embodiments, the processing structure is configured for executing a map-updating process for updating the reference map using images of at least a portion of the site and a point cloud of the site.

In some embodiments, the map-updating process comprises: obtaining a local measurement of at least a portion of the site; constructing a local map for the at least one portion of the site using obtained local measurement; determining the location of one or more anchor devices in the local map; aligning the local map with the reference map; determining a geometric transformation from the local map to the reference map; convert the coordinates of said one or more anchor devices in local map to coordinates in the reference map by using the determined geometric transformation; and merging the local map with the reference map.

In some embodiments, the map-updating process further comprises: determining a position of a target device in the at least one portion of the site.

In some embodiments, the reference map further comprises geometric structures of the site; and the geometric structures comprises geometric features extracted from three-dimensional (3D) point cloud of the site.

In some embodiments, the geometric features comprises ceiling and wall models and intersection graphs of the ceiling and wall models.

In some embodiments, the processing structure is configured for executing a map- construction process for constructing a map using a point cloud, the map-construction process comprising: classifying points of the point cloud into at least horizontal points and vertical points; determining one or more ceilings based on the horizontal points; determining one or more walls based on the vertical points; determining intersections of the one or more walls; and storing the determined one or more ceilings and one or more walls in a database as the ceiling and wall models, and storing the determined intersections of the one or more walls in the database as the intersection graphs of the ceiling and wall models.

In some embodiments, said classifying the points of the point cloud into at least horizontal points and vertical points comprises: for each point of the point cloud, estimating a normal of the point from the neighbors thereof; calculating a cross-angle between the estimated normal and a vertical direction; classifying the point as a horizontal point if the calculated cross-angle is greater than a first threshold angle; and classifying the point as a vertical point if the calculated cross- angle is smaller than a second threshold angle, the second threshold angle being smaller than the first threshold angle.

In some embodiments, said classifying the points of the point cloud into at least horizontal points and vertical points further comprises: for each point of the point cloud, classifying the point as an unclassified point if the calculated cross-angle is between the first and second threshold angles.

In some embodiments, the first threshold angle is about 80 degrees and the second threshold angle is about 15 degrees.

In some embodiments, said estimating the normal of the point from the neighbors thereof comprises: estimating the normal of the point from the neighbors thereof using an Eigen analysis.

In some embodiments, said determining the one or more ceilings based on the horizontal points comprises: detecting one or more planes based on the horizontal points; for each detected plane, calculating the area thereof; for each detected plane, determining the plane as a ceiling if the area thereof is greater than an area-threshold.

In some embodiments, said detecting the one or more planes based on the horizontal points comprises: detecting the one or more planes based on the horizontal points using a random sample consensus (RANSAC) algorithm.

In some embodiments, said determining the one or more walls based on the vertical points comprises: detecting one or more planes based on the vertical points; for each detected plane, calculating a projection-density and a connected-length thereof; for each detected plane, determining the plane as a wall if the calculated projection-density is greater than or equal to a density -threshold and the calculated connected-length is greater than or equal to a length-threshold.

In some embodiments, said detecting the one or more planes based on the vertical points comprises: detecting the one or more planes based on the vertical points using a RANSAC algorithm.

In some embodiments, said for each detected plane, calculating the projection-density and the connected-length thereof comprises calculating the projection-density of the plane by projecting points of the plane onto a predefined horizontal plane, and counting projected points in a local area; and the density -threshold is:

dc^■ h₀

dens_th = — ,

v .

where dc is the radius for point counting, h₀ is the expected minimal height of a wall, p_si is the point sampling interval of the raw point cloud, and "· " represents multiplication.

In some embodiments, said for each detected plane, calculating the projection-density and the connected-length thereof comprises: finding a maximal connective part in the plane with a predefined radius; and determining the connected-length of the plane by calculating the projection length of the maximal connective part on a predefined horizontal plane.

In some embodiments, said determining the intersections of the one or more walls comprises: (1) converting points of the one or more walls into voxels with a predefined size; (2) determining the connectivity of walls by voxel analysis; (3) projecting wall points onto the predefined horizontal plane and extracting the intersection points of connected walls; and (4) adding the extracted intersections as vertices and the linked walls as edges into an intersection graph.

In some embodiments, said aligning the local map with the reference map comprises: combining intersection graphs of the local map with intersection graphs of the reference map by intersection-graph matching; combining ceiling and wall models of the reference map with ceiling and wall models of the local 3D map; and converting the local map to the reference map.

According to one aspect of this disclosure, there is provided a method for updating a reference map of a site. The method comprises: obtaining a local measurement of at least a portion of the site; constructing a local map for the at least one portion of the site using obtained local measurement; determining the location of one or more anchor devices in the local map; aligning the local map with the reference map; determining a geometric transformation from the local map to the reference map; convert the coordinates of said one or more anchor devices in local map to coordinates in the reference map by using the determined geometric transformation; and merging the local map with the reference map.

According to one aspect of this disclosure, there is provided one or more non-transitory computer-readable storage media comprising computer-executable instructions, the instructions, when executed, causing a processor to perform actions comprising: obtaining a local measurement of at least a portion of the site; constructing a local map for the at least one portion of the site using obtained local measurement; determining the location of one or more anchor devices in the local map; aligning the local map with the reference map; determining a geometric transformation from the local map to the reference map; convert the coordinates of said one or more anchor devices in local map to coordinates in the reference map by using the determined geometric transformation; and merging the local map with the reference map.

It should be noted that this Summary is provided to introduce a selection of concepts, in a simplified form, which is elaborated upon below in the detailed description section. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

In the following description of device localization and map update technique, embodiments reference is made to the accompanying drawings which form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the technique may be practiced. It is understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the technique.

It is also noted that specific terminology will be resorted to in describing the present invention to be limited to the specific terms so chosen. Furthermore, it is to be understood that each specific term comprises all its technical equivalents that operate in a broadly similar manner to achieve a similar purpose. Reference herein to "one embodiment" or an "embodiment" means that a particular feature, structure, or characteristics described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, the order of process flow representing one or more embodiments of the invention do not inherently indicate any particular order nor imply any limitations of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a navigation system, according to some embodiments of this disclosure;

FIG. 2 is a schematic diagram of a movable object in the navigation system shown in FIG. 1;

FIG. 3 is a schematic diagram showing a hardware structure of a computing device of the navigation system shown in FIG. 1;

FIG. 4 is a schematic diagram showing a functional structure of the navigation system shown in FIG. 1 for surveying and map updating;

FIG. 5 is a schematic diagram showing a main processing flow of the system shown in FIG. 1 for surveying and map updating;

Fig. 6 is a schematic diagram showing a device-localization processing flow of the system shown in FIG. 1 ;

FIG. 7 is a flowchart showing a process of reference map construction of the system shown in FIG. 1;

FIG. 8 is a flowchart showing a process of local map construction of the system shown in

FIG. 1;

FIG. 9 is a flowchart showing a computer vision method for detecting LEDs in RGB-D images;

FIG. 10 is a flowchart showing a process of aligning the local map to the reference map; FIG. 11 is a flowchart showing a process of local-to-reference coordinate transformation; and

FIG. 12 is a schematic diagram showing a device-localization processing flow for map update.

DETAILED DESCRIPTION

1. System Overview

Turning now to FIG. 1, a navigation system is shown and is generally identified using reference numeral 100. Herein, the terms "tracking", "positioning", "navigation", "navigating", "localizing", and "localization" may be used interchangeably with a similar meaning of determining at least the position of a movable object in a site. Depending on the context, these terms may also refer determining other navigation parameters of the movable object such as its pose, speed, heading, and/or the like.

The navigation system 100 tracks one or more movable objects 108 in a site 102 such as a building complex. The movable object 108 may be autonomously movable in the site 102 (for example, a robot, a vehicle, an autonomous shopping cart, a wheelchair, a drone, or the like) or may be attached to a user and movable therewith (for example, a specialized tag device, a smartphone, a smart watch, a tablet, a laptop computer, a personal data assistant (PDA), or the like). One or more anchor devices 104 are deployed in the site 102 and are functionally coupled to one or more computing devices 106. The anchor devices 104 may be any devices suitable for facilitating survey sensors (described later) of the movable object 108 to obtain observations that may be used for positioning, tracking, or navigating the movable object 108 in the site 102. For example, the anchor devices 104 in some embodiments may be wireless access points or stations. Depending on the implementation, the wireless access points or stations may be WI-FI^® stations (WI-FI is a registered trademark of Wi-Fi Alliance, Austin, TX, USA), BLUETOOTH^® stations (BLUETOOTH is a registered trademark of Bluetooth Sig. Inc., Kirkland, WA, USA), ZIGBEE^® stations (ZIGBEE is a registered trademark of ZigBee Alliance Corp., San Ramon, CA, USA), cellular base stations, and/or the like. As those skilled in the art will appreciate, the anchor devices 104 may be functionally coupled to the one or more computing devices 106 via suitable wired and/or wireless communication structures 114 such as Ethernet, serial cable, parallel cable, USB cable, HDMI^® cable (HDMI is a registered trademark of HDMI Licensing LLC, San, Jose, CA, USA), WI-FI^®, BLUETOOTH^®, ZIGBEE^®, 3G or 4G or 5G wireless telecommunications, and/or the like.

As shown in FIG. 2, the movable object 108 comprises one or more survey sensors 118 for example, vision sensors such as cameras for object positioning using computer vision technologies, inertial measurement units (IMUs), received signal strength indicators (RSSIs) that measure the strength of received signals (such as BLUETOOTH^® low energy (BLE) signals, cellular signals, WI-FI^® signals, and/or the like), magnetometers, barometers, and/or the like. Some of the survey sensors 118 may collaborate with one or more anchor devices 104 such as in wireless communication with wireless access points or stations, for object positioning. Such wireless communication may be in accordance with any suitable wireless communication standard such as WI-FI^®, BLUETOOTH^®, ZigBee^®, 3G or 4G or 5G wireless telecommunications or the like, and/or may be in any suitable form such as a generic wireless communication signal, a beacon signal, or a broadcast signal. Moreover, the wireless communication signal may be in either a licensed band or an unlicensed band, and may be either a digital-modulated signal or an analog- modulated signal. In some embodiments, the wireless communication signal may be an unmodulated carrier signal. In some embodiments, the wireless communication signal is a signal emanating from a wireless transmitter (being one of the sensors 104 or 118) with an approximately constant time-averaged transmitting power known to a wireless receiver (being the other of the sensors 104 or 118) that measures the RSS thereof.

Those skilled in the art will appreciate that the survey sensors 118 may be selected and combined as desired or necessary, based on the system design parameters such as system requirements, constraints, targets, and the like. For example, in some embodiments, the navigation system 100 may not comprise any barometers. In some other embodiments, the navigation system 100 may not comprise any magnetometers.

Those skilled in the art will appreciate that, although Global Navigation Satellite System (GNSS) receivers such as GPS receivers, GLONASS receivers, Galileo positioning system receivers, Beidou Navigation Satellite System receivers, generally work well under relatively strong signal conditions in most outdoor environments, they usually have high power consumption and high network timing requirements when compared to many infrastructure devices. Therefore, while in some embodiments, the navigation system 100 may comprise GNSS receivers as survey sensors 118, at least in some other embodiments that the navigation system 100 is used for IoT object positioning, the navigation system 100 may not comprise any GNSS receiver.

In embodiments where RSS measurements are used, the RSS measurements may be obtained by the anchor device 104 having RSSI functionalities (such as wireless access points) or by the movable object 108 having RSSI functionalities (such as object having a wireless transceiver). For example, in some embodiments, a movable object 108 may transmit a wireless signal to one or more anchor devices 104. Each anchor device 104 receiving the transmitted wireless signal, measures the RSS thereof and sends the RSS measurements to the computing device 106 for processing. In some other embodiments, a movable object 108 may receive wireless signals from one or more anchor devices 104. The movable object 108 receiving the wireless signals measures the RSS thereof, and sends the RSS observables to the computing device 106 for processing. In yet some other embodiments, some movable objects 108 may transmit wireless signals to anchor devices 104, and some anchor devices 104 may transmit wireless signals to one or more movable objects 108. In these embodiments, the receiving devices, being the anchor devices 104 and movable objects 108 receiving the wireless signals, measure the RSS thereof and send the RSS observables to the computing device 106 for processing.

In some embodiments, the movable objects 108 also send data collected by the survey sensors 118 to the computing device 106.

As the system 100 may use data collected by sensors 104 and 118, the following description does not differentiate the data received from the anchor devices 104 and the data received from the survey sensors 118. Therefore, the anchor devices 104 and the survey sensors 118 may be collectively denoted as sensors 104 and 118 hereinafter for ease of description, and the data collected from sensors 104 and 118 may be collectively denoted as reference sensor data or simply sensor date.

The one or more computing devices 106 may be one or more stand-alone computing devices, servers, or a distributed computer network such as a computer cloud. In some embodiments, one or more computing devices 106 may be portable computing devices such as laptops, tablets, smartphones, and/or the like, integrated with the movable object 108 and movable therewith.

FIG. 3 shows a hardware structure of the computing device 106. As shown, the computing device 106 comprises one or more processing structures 122, a controlling structure 124, a memory 126 (such as one or more storage devices), a networking interface 128, a coordinate input 130, a display output 132, and other input modules and output modules 134 and 136, all functionally interconnected by a system bus 138.

The processing structure 122 may be one or more single-core or multiple-core computing processors such as INTEL^® microprocessors (INTEL is a registered trademark of Intel Corp., Santa Clara, CA, USA), AMD^® microprocessors (AMD is a registered trademark of Advanced Micro Devices Inc., Sunnyvale, CA, USA), ARM^® microprocessors (ARM is a registered trademark of Arm Ltd., Cambridge, UK) manufactured by a variety of manufactures such as Qualcomm of San Diego, California, USA, under the ARM^® architecture, or the like.

The controlling structure 124 comprises a plurality of controllers such as graphic controllers, input/output chipsets, and the like, for coordinating operations of various hardware components and modules of the computing device 106.

The memory 126 comprises a plurality of memory units accessible by the processing structure 122 and the controlling structure 124 for reading and/or storing data, including input data and data generated by the processing structure 122 and the controlling structure 124. The memory 126 may be volatile and/or non-volatile, non-removable or removable memory such as RAM, ROM, EEPROM, solid-state memory, hard disks, CD, DVD, flash memory, or the like. In use, the memory 126 is generally divided to a plurality of portions for different use purposes. For example, a portion of the memory 126 (denoted herein as storage memory) may be used for long- term data storing, for example storing files or databases. Another portion of the memory 126 may be used as the system memory for storing data during processing (denoted herein as working memory).

The networking interface 128 comprises one or more networking modules for connecting to other computing devices or networks through the network 106 by using suitable wired or wireless communication technologies such as Ethernet, WI-FI^®, BLUETOOTH^®, ZIGBEE^®, 3G or 4G or 5G wireless mobile telecommunications technologies, and/or the like. In some embodiments, parallel ports, serial ports, USB connections, optical connections, or the like may also be used for connecting other computing devices or networks although they are usually considered as input/output interfaces for connecting input/output devices.

The display output 132 comprises one or more display modules for displaying images, such as monitors, LCD displays, LED displays, projectors, and the like. The display output 132 may be a physically integrated part of the computing device 106 (for example, the display of a laptop computer or tablet), or may be a display device physically separate from but functionally coupled to other components of the computing device 106 (for example, the monitor of a desktop computer).

The coordinate input 130 comprises one or more input modules for one or more users to input coordinate data from, for example, a touch-sensitive screen, a touch-sensitive whiteboard, a trackball, a computer mouse, a touch-pad, or other human interface devices (HID), and the like. The coordinate input 130 may be a physically integrated part of the computing device 106 (for example, the touch-pad of a laptop computer or the touch-sensitive screen of a tablet), or may be a display device physically separate from but functionally coupled to other components of the computing device 106 (for example, a computer mouse). The coordinate input 130, in some implementations, may be integrated with the display output 132 to form a touch-sensitive screen or a touch-sensitive whiteboard.

The computing device 106 may also comprise other inputs 134 such as keyboards, microphones, scanners, cameras, and the like. The computing device 106 may further comprise other outputs 136 such as speakers, printers and the like.

The system bus 138 interconnects various components 122 to 136 enabling them to transmit and receive data and control signals to/from each other.

Depending on the types of localization sensors 104 and 118 used, the navigation system 100 may be designed for robust indoor/outdoor seamless object positioning, and the processing structure 122 may use various signal-of-opportunities such as BLE signals, cellular signals, WI-FI^®, earth magnetic field, 3D building models, floor maps, point clouds, and/or the like, for object positioning.

In these embodiments, the navigation system 100 uses a reference map of the site 102 stored in a database in the memory 126 to facilitate obj ect positioning and navigation. In particular, the processing structure 122 is functionally coupled to the sensors 104 and 118 and the reference map. The processing structure 122 executes computer-executable code stored in the memory 126 which implements an object positioning and navigation process for collecting sensor data from sensors 104 and 118, and uses the collected sensor data and the reference map for tracking the movable objects 108 in the site 102. The processing structure 122 also uses the collected sensor data to update the reference map.

FIG. 4 shows a functional structure of the navigation system 100 for surveying and map updating. As shown, the system 100 in this aspect comprises a reference map management module 152, a three-dimensional (3D) map registration module 154 and a local data processing module 156. The reference map management module 152 is in charge of maintaining a reference 3D map, and comprises a reference 3D map constructor submodule 162 and a map updater submodule 164. The 3D map registration module 154 is responsible for aligning a local map to the reference map, and comprises a coarse-to-fine map-registration submodule 166, and a local - to-reference coordinate transformer submodule 168. The local data processing module 156 is for local map processing and target detection, and comprises a preprocessor submodule 172, a local 3D map constructor submodule 174, a target detector submodule 176, and a local 3D coordinate transformer submodule 178.

FIG. 5 is a schematic diagram showing a main process 200 of the system 100. As shown, in the reference map database management module 152, the reference 3D map constructor 162 uses captured reference data sources 202 to pre-construct a reference map database 204 of the site 102. The reference map database 204 comprises a reference 3D map which is sent (arrow 208) to the coarse-to-fine map-registration submodule 166 of the 3D map registration module 154 for alignment processing (described later).

After the reference map database 204 is constructed, a user can use one or more suitable consumer devices such as cameras, RGB and depth (RGB-D) sensors, Light Detection and Ranging (LiDAR) devices and/or the like, to collect spatial information (such as local data sources 206) from a target device and its surrounding area. Then, the raw local data 206 is processed by the preprocessor 172 of the local data processing module 156, and the processed data is used by the local 3D map constructor 174 to construct a local 3D map 210 representing the local area, which is sent to the coarse-to-fine map-registration submodule 166 of the 3D map registration module 154 for alignment processing (described later). The target detector 176 then detects the targets from the local 3D map, and the local 3D coordinate transformer 178 applies a geometric transformation to the local 3D map to obtain device local coordinates therein, which is sent (arrow 212) to the local-to-reference coordinate transformer 168 of the 3D map registration module 154 for processing (described later).

In the 3D map registration module 154, the coarse-to-fine map-registration submodule 166 aligns the local 3D map 210 obtained by the local 3D map constructor 174 to the reference 3D map 208 in the reference map database 204, and performs an estimation of the geometric transformation from the local frame to the reference frame. Finally, the determined local-to- reference transformation is used by the local-to-reference coordinate transformer 168 to convert the local map 210 and device's local coordinates 212 to the reference frame. The rectified local map and device coordinates/position is sent to the map updater 164 of the reference map management module 152 and to update the reference map database 204. 2. Device localization

Device localization in the site 102 is a principal application of the system 100. The workflow of device localization is a subset of the workflow of the system 100. FIG. 6 is a schematic diagram showing a device-localization process 240 of the system 100 for device localization in one embodiment. The device-localization process 240 is similar to the process 200 shown in FIG. 5 except that the map updater submodule 164 is not used and the local-to-reference coordinate transformer 168 outputs the device's position in the reference frame without being used for map updating. 2.1. Processing in the Reference map management module 152

In one embodiment, the reference map database is constructed from the georeferenced 3D point cloud. In particular, a 3D point cloud is obtained from the reference data sources 202 for example, being captured by LiDAR, RGB-D camera and other similar equipment. In the reference map constructor 162, the 3D point cloud is rectified by a plurality of control points into a unified frame. Then, a well-designed structure feature detector extracts from the 3D point cloud the geometric structures of the site 102 such as the wall/ceiling models and the intersection graphs thereof. Eventually, the georeferenced 3D point cloud and the extracted geometric features are jointed to construct the reference map database 204.

The wall/ceiling models are a group of models representing individual wall or ceiling plane of the site. Each wall/ceiling model has a set of plane parameters and a cluster of points belonging to the plane.

The intersection graph contains a group of vertices and edges, where each vertex represents an intersection of two joint walls, and each edge represents an individual wall. In one embodiment, the system 100 only requires the 2D coordinates of intersections, and projects all wall points onto the X-Y plane to extract the 2D intersections.

FIG. 7 is a flowchart showing a process 300 of reference map construction. As shown, the geometric features extraction can be accomplished in four steps 302 to 308.

At step 302, point cloud 342 is classified by normals thereof. In particular, step 302 estimates the normal of each point from its neighbors. Many methods may be used for estimating a discrete point's normal, and in this embodiment, an Eigen analysis method such as that taught in reference [11] is employed to determine point's normal robustly. Then, the cross-angle between each point's normal and the vertical direction is calculated. If the calculated cross-angle is greater than a first threshold angle such as 80 degrees, the point is classified as a horizontal point 344. If the calculated cross-angle is smaller than a second threshold angle such as 15 degrees, the point is classified as a vertical point 346. On the other hand, if the calculated cross-angle is between the first and second threshold angles such as between 15 degrees and 80 degrees, the point is considered/classified as an unclassified point. Of course, those skilled in the art will appreciate that, instead of using 80 degrees and 15 degrees as the first and second threshold angles, one may use other suitable angles as the first and second threshold angles.

The horizontal points 344 are processed at step 304 for ceiling detection. In particular, a suitable algorithm such as the efficient random sample consensus (RANSAC) algorithm taught in reference [12] is used to detect planes from previously derived horizontal points 344, and the area of each detected plane is calculated. If the calculated area is smaller than a given area-threshold (such as 5 square-meters (m²)), the area is considered a fake ceiling and is filtered out. If the calculated area is greater than or equal to the area-threshold, the area is determined as a ceiling is used for obtaining ceiling & wall models 348 stored in the reference map database 204.

The vertical points 346 are processed at step 306 for wall detection. In particular, a suitable algorithm such as the above-described efficient RANSAC algorithm is used to detect planes from previously derived vertical points 346. The projection-density and connected-length of each detected plane is then calculated. If the projection-density is less than a density -threshold or the connected-length is less than a given length-threshold (such as 3 meters (m)), the plane is considered a fake wall and is filtered out. If the projection-density is greater than or equal to the density-threshold and the connected-length is greater than or equal to the given length-threshold, the detected plane is considered a wall and is used for obtaining ceiling & wall models 348 stored in the reference map database 204.

The projection-density of each detected plane is calculated by projecting points of the plane onto a predefined horizontal plane such as the X-Y plane and counting projected points in a local area. The density threshold dens_th can be determined as:

dc^■ h₀

dens_th = — , (1)

Vsi

where dc is the radius for point counting, h₀ is the expected minimal height of a wall (for example 0.5m), p_si is the point sampling interval of the raw point cloud, and "·" represents multiplication.

The connected-length is calculated by finding a maximal connective part in a detected plane with a given radius (for example 0.2m) and calculating its projection length on the X-Y plane.

As shown in FIG. 7, the detected ceilings and walls are stored in the reference map database 204 as ceiling and wall models 348. Moreover, the detected walls are also processed at step 308 for intersection detection.

At step 308, all walls points are voxelized (i.e., converted into voxels) with a given size (such as 0.2m). The voxelization of wall points can be achieved in two steps. First, the 3D bounding box of the whole point cloud is partitioned into voxels (cubes). Then, the points are divided into corresponding voxels based on their coordinates.

After voxels are determined, the connectivity of wall models are determined by voxel analysis, which can be achieved in two steps: first, points in each voxel are retrieved. If points belonging to different wall models are found, these wall models are marked as connected. Then the 4-connectivity neighbors, i.e., the left, right, top and bottom voxels which directly connected to the current voxel, are inspected in the same way to determine the connected wall models.

After connectivity determination, all wall points are projected onto the X-Y plane and the intersection points are extracted among each connected wall models. Then, the extracted intersections (vertices) and linked walls (edges) are added into an intersection graph 350, also stored in the reference map database 204.

In order to distinguish the significance of vertices in the intersection graph, a weight of vertex is defined based on the linked edges' length and linked edges' included angle. The weight can be determined by:

where, e; is the i-t connected wall, Len(e ) is the projection length of e_{i 5} max Len_e is the maximum projection length from all connected walls of current intersection, and ^.(e^, e^) is the included angle between and e^. Here, it is considered that sin (e_Q, e^) = 1 (i.e., walls e₀ and e₁ are perpendicular to each other) for ease of description.

As shown in Equation (2), the intersection with long edges and large and even included angles tends to have large weight. Intuitively, intersection with large weight implies a steady geometrical structure in the local area.

2.2. Processing in the local data processing module 156

The local data processing module 156 conducts the local map construction and the target detection. In one embodiment, the RGB-D sensor is used for local data acquisition and the dichromatic LEDs (typically Red and Blue with the size of 1 centimeter (cm) by 1cm) are designed for device labeling.

With reference again to FIG. 6, the workflow of local data processing contains data collecting and data processing is described as follows.

As shown in FIG. 6, the local data sources 206 can be determined by data collecting. In one embodiment, LEDs are attached to the target devices and turned on before data collection. Then the target devices and their surrounding area are measured by an RGB-D sensor. Since the geometric features play an important role in system 100, the distinct geometric structures of the local area surrounding the devices of interest, such ceilings, intersections of walls, beams, columns, and the like, are carefully captured.

The captured data is then processed in four steps in the local data processing module 156, including preprocessing, local map construction, device detection and 2D-3D coordinate transform, by four submodules 172 to 178, respectively.

At the preprocessing step, the raw RGB-D data captured in the local area is preprocessed by the preprocessing submodule 172 to derive a local 3D point cloud and a batch of oriented RGB-D images. Many public SLAM tools may be used. In this embodiment, a common RGB-D mapping toolkit called RTab-map as taught in reference [13] is used for preprocessing. RTab-map executes an optimized SLAM algorithm to build a local 3D scene from raw RGB-D data and outputs three different results, including the local 3D point cloud, RGB-D images, and an auxiliary file recording the position, and orientation of each image.

At the local map construction step, the local 3D map constructor 174 constructs a local map. FIG. 8 is a flowchart showing a process 400 of local map construction executed at this step. As can be seen, the process 400 is similar to the process 300 shown in FIG. 7 and extracts same type of features except that the process 400 is executed on the local point cloud and the extracted features such as ceiling and wall models 348' and intersection graphs 350' are stored in the local 3D map 402. Therefore, methods and parameter settings similar to those used in process 300 are applied to the local point cloud to detect the ceiling & wall models 348' and local intersection graph 350'.

Referring again to FIG. 6, at the device detection step, the target detector 176 detects devices of interest. Since a dichromatic LEDs was attached to the devices of interest, the positions of LEDs can be used to represent the devices' positions with an acceptable precision. At this step, a computer vision method 440 as shown in FIG. 9 is used to detect LEDs in RGB-D images.

First, the bright blobs are detected (step 444) from the RGB-D images 462 by a thresholding method in Red and Blue bands simultaneously. Then, combined filters including the size constraint of blobs on one image and the spatial continuity constraint of blobs on adjacent images are used to eliminate the fake blobs (step 446).

The blob size constraint used in the combined filters is as follows:

The blob size on an image can be calculated by a LED's actual size as: r = R - -^f, (?) a

where r is the expired size of a LED on an image, R is the actual size of the LED, /is the camera focal length, and d is the depth of the detected blob. If the detected blob's size is close to the expired size r in an accepted interval (such as 5 pixels), the detected blob is likely an actual LED.

The spatial continuity constraint used in the combined filters is as follows:

If a LED can be seen in more than two images, the coordinates 464 of the LED on the two images should be abided as follows:

where the subscripts 1 and 2 represent the first and second images, respectively, (¾, v d^⁷, superscript T representing transpose, comprise the pixel coordinates Vi)^T and depth value d of the a blob's center on the i-ih image (/^' = 1 or 2), R is the rotate matrix determined by the orientation of the i-ih image in the local frame, t_t is the position of the i-ih image in the local frame, and C_t is the camera projection matrix of the i-ih image. C_t can be derived from the camera focus length / and the principal point coordinates (c_{i x}, c_{i y}) as follows:

The spatial continuity constraint can be used to predict the LED's position in multiple images. If the blobs are detected in no less than 3 images with the coordinates near the expired position in each image, the blobs likely represent an actual LED.

After the above-described blob detection method extracts LEDs' pixel coordinates 464 from images, a coordinate transformation is used at the 2D-3D coordinate transform step 448 by the local 3D coordinate transformer submodule 178 to convert the 2D pixel coordinates to the local 3D frame. The tr i shown in the following equation:

Where (¾ Yi, Z )^T represent a device's coordinates in the local frame, (u, v, d)^T is the device's pixel coordinates and depth, and C, R and t are the camera projection matrix, the image's rotation matrix and the image's position, respectively, which can be determined from the auxiliary file 466.

2.3. Processing in the 3D map registration module 154

In the 3D map registration module 154, map registration is used to align the local map 402 generated by the process 400 shown in FIG. 8 with the reference map 204 generated by the process 300 shown in FIG. 7. In this embodiment, a coarse-to-fine registering process 500 as shown in FIG. 10 is executed by the coarse-to-fine map-registration submodule 166 for aligning the local map with the reference map. At step 502, a coarse registration by intersection graph matching is first conducted to combine the intersection graph 350 in the reference map database 204 and the intersection graph 350' in the local 3D map 402 with the following steps:

(1) Sort the vertices in local intersection graph (IG) 350^' by their weights;

(2) Select the vertex with the maximal weight and set it as the origin O_localIG of the local

IG 350^' ;

(3) Select the longest edge linked to O_locallG and set the direction of this edge as the principal direction n_p of the local IG 350' ;

(4) Use the coordinates of O_locallG and a given searching radius (such as 10m) to clip the vertices and edges in the reference IG 350, name the clipped subgraph as lG_cUpped

(5) Sort the vertices in lG_cUpped by their weights in a descending order;

(6) For all sorted vertices in IG_cUpped, set the first vertex as a candidate O_candidate ;

(7) The candidate is considered to correspond to O_locallG

(8) Calculate the offset t_offset between O_candidate and O_locallG ;

(9) For all edges linked to O_candidate, select the edge one by one and calculate its direction n_; ;

(10) Compute the included angle Θ between n_; and n_p ;

(1 1) Construct a 2D rigid transformation Rigid_cand by t₀^_set and Θ;

(12) Use Rigid_cand to transform the vertices in local IG 350', and calculate the standard deviation of distances std_diSi between the transformed vertices coordinates in the local IG 350' and their nearest neighbor in lG_cUpped

(13) If std_diSi is less than a given threshold, stop and output the

Rigid_cand,

(14) Otherwise, select another edge and go to (10);

(15) If no more edge can be selected, select another vertex as candidate and go to (8);

(16) If all vertices in lG_clivped have been processed and no stdjdisi less than the given threshold can be found, stop and output the coarse matching failed.

At step 504, fine registration is conducted to combine the ceiling & wall models 348 in the reference map database 204 and the ceiling & wall models 348' in the local 3D map 402 by using the Iterative closest point (ICP) algorithm taught in reference [ 14] with the following steps:

(si) Convert the ceiling & wall points 348' in the local 3D map 402 by the previously derived Rigid_cand;

(s2) Use the ICP algorithm to determine a fine 3D rigid transformation by matching the converted ceiling & wall points with the ceiling & wall points 348 in the reference map 204; and (s3) Output the parameters of the fine 3D transformation.

After map registration, the local-to-reference coordinate transformer 168 converts the local map 210 and device's local coordinates to the reference frame. FIG. 11 is a flowchart showing a process 540 performed by the local-to-reference coordinate transformer 168.

As shown, the local-to-reference coordinate transformer submodule 168 calculates the device's position 546 in the reference frame by using the previously determined transformation parameters 506 (see FIG. 10) and the device's local coordinates 542 using the previously- derived 3D rigid transformation 544 (see step (s2) above) as follows: + t_rigid > (7)

T

where {X_ref, Y_ref> Zref) represent the device's coordinates in the reference frame, (X_u Yi, Z )^T represent the device's coordinates in the local frame, R_ri₉id is the rotate matrix of previously- derived 3D rigid transformation, and t_{rig id} is the translation vector of the 3D rigid transformation.

3. Map update

Map update is another application of the system 100. FIG. 12 is a schematic diagram showing a device-localization process 600 of the system 100 for map update in one embodiment. The device-localization process 600 is similar to the process 200 shown in FIG. 5 except that the target detector 176 and the local 3D coordinate transformer 178 are not used.

The device-localization process 600 is also similar to the process 240 shown in FIG. 6 except that the target detector 176 and the local 3D coordinate transformer 178 are not used, and that the map updater submodule 164 of the reference map management module 152 uses the determined local map to detect the environmental changes and strengthen the reference map progressively.

In particular, the map updater submodule 164 updates the reference map by a sequence of local maps with the following steps:

(1) Changed area detection

For each rectified local map, a comparison is conducted to find the changed area. In one embodiment, a point-to-point comparison is used to find the closest point correspondence in the reference map. Then, the map updater submodule 164 calculates the distance between the point in the rectified local map and the correspondence in the reference map. If the distance is larger than a given threshold, the map updater submodule 164 marks this point as a changed point and records this change with a timestamp in the reference map database. Later, this changed point set can be further identified by suitable object recognition methods.

(2) Unchanged area merging

The unchanged area is determined by excluding the changed area from the local maps. As the unchanged area may introduce redundancy into the reference map, it is valuable to merge the unchanged area of the local maps with those in the reference map to improve the quality of the reference map.

The local maps are collected by various devices and processed by different methods.

Therefore, their position accuracy may vary from one another. To achieve an optimal merging results, a weighted-merging method is used. In one embodiment, the precision values (such as the device's precision, calibration precision, processing precision, and the like) recorded in the description layer of each 3D map are used to qualify the local map. Then, a weight is calculated by the inverse precisions, and the weighted merging is applied to balance the quality of each local map and the reference map.

Although in above description, the system 100 is for determining the 3D position of a device in an indoor environment, in some alternative embodiments, the system 100 may also be used for determining the 3D position of a device in an outdoor environment, or in a site mixed with indoor and outdoor environments.

REFERENCES

[1] Janky, J. M., et al. (2013). Image-based georeferencing, US Patent Publication

No. 2013/0195362.

[2] Sinha, S. N., et al. (2014). Image-based localization, US Patent Publication No. 2014/0015407.

[3] Brubaker, M. A., et al. (2016). "Map-based probabilistic visual self-localization." IEEE Transactions on pattern analysis and machine intelligence 38(4): 652-665.

[4] Durrant-Whyte, H. and T. Bailey (2006). "Simultaneous localization and mapping: part I." IEEE Robotics & Automation Magazine 13(2): 99-110.

[5] Bailey, T. and H. Durrant-Whyte (2006). "Simultaneous localization and mapping (SLAM): Part II." IEEE Robotics & Automation Magazine 13(3): 108-117.

[6] Tsai, G. I, Chiang, K. W., Chu, C. H., Chen, Y. L., El-Sheimy, N., and Habib, A:

THE PERFORMANCE ANALYSIS OF AN INDOOR MOBILE MAPPING SYSTEM WITH RGB-D SENSOR, Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XL-1/W4, 183-188, https : //doi. org/10.5194/isprsarchives-XL- 1 -W4- 183 -2015, 2015.

[7] Niichter, Andreas; Borrmann, Dorit; Koch, Philipp; Kiihn, Markus; May, Stefan. A MAN-PORTABLE, IMU-FREE MOBILE MAPPING SYSTEM. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences. 8/19/2015, Vol. 2 Issue 3-W5, pl7-23. 7p.

[8] El-Hakim, S. F. and P. Boulanger (1999). Mobile system for indoor 3-D mapping and creating virtual environments, US Patent No. 6,009,359.

[9] Huang, S., et al. (2008). "Sparse local submap j oining filter for building large-scale maps." IEEE Transactions on Robotics 24(5): 1621-1130.

[10] Aulinas, I, et al. (2009). Independent Local Mapping for Large-Scale SLAM. ECMR.

[11] Mitra, N. J. and A. Nguyen (2003). Estimating surface normals in noisy point cloud data. Proceedings of the nineteenth annual symposium on Computational geometry, ACM.

[12] Schnabel, R., et al. (2007). Efficient RANSAC for point-cloud shape detection.

Computer graphics forum, Wiley Online Library

[13] The homepage of RTab-map project: http://introlab.github.io/rtabmap/

[14] Besl, P. J. and N. D. McKay (1992). "A method for registration of 3-D shapes." IEEE

Transactions on pattern analysis and machine intelligence 14(2): 239-256.

Claims

WHAT IS CLAIMED IS:

1. A system for determining the position of a moving object in a site, the system comprising: one or more anchor devices deployed in the site for providing data sources related to the moving object and the site;

a reference map of the site; and

a processing structure for determining the position of the moving object using the data obtained by the one or more anchor devices and the reference map;

wherein the reference map comprises:

a first layer comprising the data sources; and

a second layer comprising data extracted from the data sources for representing and indexing the primary structures of the site.

2. The system of claim 1 wherein the reference map further comprises a third layer comprising characteristics of the data sources; and wherein said characteristics comprises at least one of a data capturing time, a device type, and a data precision.

3. The system of claim 1 or 2 wherein the processing structure is configured for executing a map-updating process for updating the reference map using images of at least a portion of the site and a point cloud of the site.

4. The system of claim 3 wherein the map-updating process comprises:

obtaining a local measurement of at least a portion of the site;

constructing a local map for the at least one portion of the site using obtained local measurement;

determining the location of one or more anchor devices in the local map;

aligning the local map with the reference map;

determining a geometric transformation from the local map to the reference map;

convert the coordinates of said one or more anchor devices in local map to coordinates in the reference map by using the determined geometric transformation; and

merging the local map with the reference map.

5. The system of claim 4 wherein the reference map further comprises geometric structures of the site; and wherein the geometric structures comprises geometric features extracted from three-dimensional (3D) point cloud of the site.

6. The system of claim 5 wherein the geometric features comprises ceiling and wall models and intersection graphs of the ceiling and wall models.

7. The system of claim 6 wherein the processing structure is configured for executing a map- construction process for constructing a map using a point cloud, the map-construction process comprising:

classifying points of the point cloud into at least horizontal points and vertical points; determining one or more ceilings based on the horizontal points;

determining one or more walls based on the vertical points;

determining intersections of the one or more walls; and

storing the determined one or more ceilings and one or more walls in a database as the ceiling and wall models, and storing the determined intersections of the one or more walls in the database as the intersection graphs of the ceiling and wall models.

8. The system of claim 7 wherein said classifying the points of the point cloud into at least horizontal points and vertical points comprises: for each point of the point cloud,

estimating a normal of the point from the neighbors thereof;

calculating a cross-angle between the estimated normal and a vertical direction;

classifying the point as a horizontal point if the calculated cross-angle is greater than a first threshold angle; and

classifying the point as a vertical point if the calculated cross-angle is smaller than a second threshold angle, the second threshold angle being smaller than the first threshold angle.

9. The system of claim 8 wherein said classifying the points of the point cloud into at least horizontal points and vertical points further comprises: for each point of the point cloud,

classifying the point as an unclassified point if the calculated cross-angle is between the first and second threshold angles.

10. The system of any one of claims 7 to 9 wherein said determining the one or more ceilings based on the horizontal points comprises:

detecting one or more planes based on the horizontal points;

for each detected plane, calculating the area thereof;

for each detected plane, determining the plane as a ceiling if the area thereof is greater than an area-threshold.

11. The system of claim 10 wherein said detecting the one or more planes based on the horizontal points comprises:

detecting the one or more planes based on the horizontal points using a random sample consensus (RANSAC) algorithm.

12. The system of any one of claims 7 to 11 wherein said determining the one or more walls based on the vertical points comprises:

detecting one or more planes based on the vertical points;

for each detected plane, calculating a projection-density and a connected-length thereof; for each detected plane, determining the plane as a wall if the calculated projection-density is greater than or equal to a density -threshold and the calculated connected-length is greater than or equal to a length-threshold.

13. The system of claim 12 wherein said detecting the one or more planes based on the vertical points comprises:

detecting the one or more planes based on the vertical points using a RANSAC algorithm.

14. The system of claim 12 or 13 wherein said for each detected plane, calculating the projection-density and the connected-length thereof comprises calculating the projection-density of the plane by projecting points of the plane onto a predefined horizontal plane, and counting projected points in a local area; and

wherein the density -threshold is:

dc^■ h₀

dens_th = — ,

v .

15. The system of any one of claims 12 to 14 wherein said for each detected plane, calculating the projection-density and the connected-length thereof comprises:

finding a maximal connective part in the plane with a predefined radius; and

determining the connected-length of the plane by calculating the projection length of the maximal connective part on a predefined horizontal plane.

16. The system of any one of claims 7 to 15 wherein said determining the intersections of the one or more walls comprises: (1) converting points of the one or more walls into voxels with a predefined size;

(2) determining the connectivity of walls by voxel analysis;

(3) projecting wall points onto the predefined horizontal plane and extracting the intersection points of connected walls; and

(4) adding the extracted intersections as vertices and the linked walls as edges into an intersection graph.

17. The system of any one of claims 4 to 16 wherein said aligning the local map with the reference map comprises:

combining intersection graphs of the local map with intersection graphs of the reference map by intersection-graph matching;

combining ceiling and wall models of the reference map with ceiling and wall models of the local 3D map; and

converting the local map to the reference map.

18. A method for updating a reference map of a site comprising:

obtaining a local measurement of at least a portion of the site;

determining the location of one or more anchor devices in the local map;

aligning the local map with the reference map;

determining a geometric transformation from the local map to the reference map;

merging the local map with the reference map.

19. The method of claim 18 wherein the reference map further comprises geometric structures of the site; and wherein the geometric structures comprises geometric features extracted from three-dimensional (3D) point cloud of the site.

20. The method of claim 19 wherein the geometric features comprises ceiling and wall models and intersection graphs of the ceiling and wall models.

21. The method of claim 20 wherein the processing structure is configured for executing a map-construction process for constructing a map using a point cloud, the map-construction process comprising:

determining one or more walls based on the vertical points;

determining intersections of the one or more walls; and

22. The method of claim 21 wherein said classifying the points of the point cloud into at least horizontal points and vertical points comprises: for each point of the point cloud,

estimating a normal of the point from the neighbors thereof;

23. The method of claim 22 wherein said classifying the points of the point cloud into at least horizontal points and vertical points further comprises: for each point of the point cloud,

24. The method of any one of claims 21 to 23 wherein said determining the one or more ceilings based on the horizontal points comprises:

detecting one or more planes based on the horizontal points;

for each detected plane, calculating the area thereof;

25. The method of claim 24 wherein said detecting the one or more planes based on the horizontal points comprises:

26. The method of any one of claims 21 to 25 wherein said determining the one or more walls based on the vertical points comprises:

detecting one or more planes based on the vertical points;

27. The method of claim 26 wherein said detecting the one or more planes based on the vertical points comprises:

28. The method of claim 26 or 27 wherein said for each detected plane, calculating the projection-density and the connected-length thereof comprises calculating the projection-density of the plane by projecting points of the plane onto a predefined horizontal plane, and counting projected points in a local area; and

wherein the density -threshold is:

dc^■ h₀

dens_th = — ,

Psi

29. The method of any one of claims 26 to 28 wherein said for each detected plane, calculating the projection-density and the connected-length thereof comprises:

finding a maximal connective part in the plane with a predefined radius; and

30. The method of any one of claims 21 to 29 wherein said determining the intersections of the one or more walls comprises:

(1) converting points of the one or more walls into voxels with a predefined size;

(2) determining the connectivity of walls by voxel analysis;

31. The method of any one of claims 18 to 30 wherein said aligning the local map with the reference map comprises:

converting the local map to the reference map.

32. One or more non-transitory computer-readable storage media comprising computer- executable instructions, the instructions, when executed, causing a processor to perform actions comprising:

obtaining a local measurement of at least a portion of the site;

determining the location of one or more anchor devices in the local map;

aligning the local map with the reference map;

determining a geometric transformation from the local map to the reference map;

merging the local map with the reference map.

33. The one or more non-transitory computer-readable storage media of claim 32 wherein the reference map further comprises geometric structures of the site; and wherein the geometric structures comprises geometric features extracted from three-dimensional (3D) point cloud of the site.

34. The one or more non-transitory computer-readable storage media of claim 33 wherein the geometric features comprises ceiling and wall models and intersection graphs of the ceiling and wall models.

35. The one or more non-transitory computer-readable storage media of claim 34 wherein the processing structure is configured for executing a map-construction process for constructing a map using a point cloud, the map-construction process comprising: classifying points of the point cloud into at least horizontal points and vertical points; determining one or more ceilings based on the horizontal points;

determining one or more walls based on the vertical points;

determining intersections of the one or more walls; and

36. The one or more non-transitory computer-readable storage media of claim 35 wherein said classifying the points of the point cloud into at least horizontal points and vertical points comprises: for each point of the point cloud,

estimating a normal of the point from the neighbors thereof;

37. The one or more non-transitory computer-readable storage media of claim 36 wherein said classifying the points of the point cloud into at least horizontal points and vertical points further comprises: for each point of the point cloud,

38. The one or more non-transitory computer-readable storage media of any one of claims 35 to 37 wherein said determining the one or more ceilings based on the horizontal points comprises: detecting one or more planes based on the horizontal points;

for each detected plane, calculating the area thereof;

39. The one or more non-transitory computer-readable storage media of claim 38 wherein said detecting the one or more planes based on the horizontal points comprises:

40. The one or more non-transitory computer-readable storage media of any one of claims 35 to 39 wherein said determining the one or more walls based on the vertical points comprises: detecting one or more planes based on the vertical points;

41. The one or more non-transitory computer-readable storage media of claim 40 wherein said detecting the one or more planes based on the vertical points comprises:

42. The one or more non-transitory computer-readable storage media of claim 40 or 41 wherein said for each detected plane, calculating the projection-density and the connected-length thereof comprises calculating the projection-density of the plane by projecting points of the plane onto a predefined horizontal plane, and counting projected points in a local area; and

wherein the density -threshold is:

dc^■ h₀

dens_th = — ,

v .

43. The one or more non-transitory computer-readable storage media of any one of claims 40 to 42 wherein said for each detected plane, calculating the projection-density and the connected- length thereof comprises:

finding a maximal connective part in the plane with a predefined radius; and

44. The one or more non-transitory computer-readable storage media of any one of claims 35 to 43 wherein said determining the intersections of the one or more walls comprises:

(2) determining the connectivity of walls by voxel analysis;

45. The one or more non-transitory computer-readable storage media of any one of claims 32 to 44 wherein said aligning the local map with the reference map comprises:

converting the local map to the reference map.