WO2024103708A1 - Positioning method, terminal device, server, and storage medium - Google Patents

Positioning method, terminal device, server, and storage medium

Info

Publication number
WO2024103708A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
cloud map
terminal device
local point
server
Prior art date
Application number
PCT/CN2023/100387
Other languages
French (fr)
Chinese (zh)
Inventor
施文哲
陆平
乔秀全
黄亚坤
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2024103708A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/05: Geographic models
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/70: Determining position or orientation of objects or cameras

Definitions

  • AR: augmented reality
  • a positioning method comprising:
  • the pose information corresponding to the current video frame image is determined based on the current video frame image and the target local point cloud map.
  • a positioning method comprising:
  • the target local point cloud map is sent to the terminal device.
  • a positioning device comprising:
  • the communication unit is used to obtain a target local point cloud map from a server, where the target local point cloud map is a local point cloud map that is closest to an initial position of a terminal device among multiple local point cloud maps.
  • the processing unit is used to determine the posture information corresponding to the current video frame image based on the current video frame image and the target local point cloud map after moving into the coverage range of the target local point cloud map.
  • a positioning device comprising:
  • the processing unit is used to determine the initial position of the terminal device.
  • the processing unit is also used to determine, based on the initial position of the terminal device, from multiple local point cloud maps a target local point cloud map that is closest to the initial position of the terminal device.
  • the communication unit is used to send the target local point cloud map to the terminal device.
  • a terminal device comprising: a processor and a memory; the memory stores instructions executable by the processor; the processor is configured to execute the instructions so that the terminal device implements the method provided in the first aspect above.
  • a server comprising: a processor and a memory; the memory stores instructions executable by the processor; the processor is configured to execute the instructions so that the server implements the method provided in the second aspect above.
  • a computer-readable storage medium which stores computer instructions.
  • When the computer instructions are executed on a computer, the computer executes the method provided in the first aspect, or executes the method provided in the second aspect.
  • a computer program product comprising computer instructions.
  • When the computer instructions are executed on a computer, the computer executes the method provided in the first aspect, or executes the method provided in the second aspect.
  • FIG1 is a schematic diagram of the composition of a positioning system according to some embodiments.
  • FIG2 is a functional schematic diagram of a terminal device and a server according to some embodiments.
  • FIG3 is a schematic flow chart of a positioning method according to some embodiments.
  • FIG4 is a schematic diagram of the composition of a positioning device according to some embodiments.
  • FIG5 is a schematic diagram of another positioning device according to some embodiments.
  • FIG6 is a schematic diagram of the structure of a terminal device according to some embodiments.
  • FIG. 7 is a schematic diagram of the structure of a server according to some embodiments.
  • The terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the features. In the description of the present disclosure, unless otherwise specified, "plurality" means two or more.
  • The term "connection" should be understood in a broad sense; for example, it can be a fixed connection, a detachable connection, or an integral connection.
  • For those of ordinary skill in the art, the meanings of the above terms in the present disclosure can be understood according to the specific circumstances.
  • When describing a pipeline, the terms "connected" and "connection" used in the present disclosure have the meaning of conduction. The specific meaning needs to be understood in conjunction with the context.
  • words such as “exemplarily” or “for example” are used to indicate examples, illustrations or descriptions. Any embodiment or design described as “exemplarily” or “for example” in the embodiments of the present disclosure should not be interpreted as being more preferred or more advantageous than other embodiments or designs. Specifically, the use of words such as “exemplarily” or “for example” is intended to present related concepts in a specific way.
  • Web real-time communication (WebRTC) is a real-time communication technology that allows web applications or sites to establish peer-to-peer connections between browsers without an intermediary, enabling the transmission of video streams, audio streams, or any other data. WebRTC therefore enables users to create peer-to-peer data sharing and teleconferencing without installing any plug-ins or third-party software.
  • Structure from motion (SFM) is an algorithm for 3D reconstruction.
  • SFM extracts key points from multiple 2D images and performs image matching to calculate the image pose of the 2D image. It then performs 3D reconstruction based on the image pose and the 2D coordinates of the key points to obtain the corresponding 3D coordinate points.
  • 3D reconstruction technology refers to an image processing technology that constructs the 3D structure of an object or scene in an image through multiple frames of 2D images.
  • 3D reconstruction technology is usually used in augmented reality (AR), mixed reality (MR), visual positioning, autonomous driving, etc.
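  • As an illustrative aside (not part of the original disclosure), the SFM pipeline described above, key-point extraction, matching, relative-pose estimation, and triangulation, can be sketched with standard OpenCV calls. The snippet below is a minimal two-view sketch under assumed image paths and an assumed intrinsics matrix K, not the method of this disclosure.

```python
# Minimal two-view structure-from-motion sketch (assumes OpenCV and NumPy).
# Paths, intrinsics K, and variable names are illustrative only.
import cv2
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])                        # assumed camera intrinsics

img1 = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)          # key points + descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [p[0] for p in matches
        if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]   # Lowe ratio test

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# Relative camera pose from the essential matrix.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# Triangulate matched key points into 3D coordinates.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
pts3d = (pts4d[:3] / pts4d[3]).T                       # N x 3 reconstructed points
```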
  • SIFT: scale-invariant feature transform.
  • SIFT is a local feature descriptor; the description is scale-invariant and can be used to detect key points in an image.
  • SIFT features are based on some local appearance points of interest on the object and are independent of the size and rotation of the image. SIFT features have a high tolerance for changes in lighting, noise, and slight changes in viewpoint. The detection rate for partially occluded objects using the SIFT feature description is also quite high, and even as few as three SIFT features are sufficient to calculate the position and orientation of an object.
  • AR technology is a technology that integrates virtual information with real information. It achieves "enhancement" of the real world by simulating virtual information such as computer-generated text and images and applying them to the real world.
  • AR navigation based on terminal devices is an important application of AR technology.
  • AR navigation locates the user's current position and determines the navigation path.
  • the navigation path is rendered into the real environment through AR, and real-time navigation information is added to the real road conditions to more intuitively guide the user forward, so that the user can get an immersive navigation experience.
  • In related technologies, the positioning of a terminal device is completed by the server based on the current video frame image and a point cloud map; that is, the terminal device relies on an external device for its own positioning.
  • Existing positioning methods for AR navigation result in poor positioning accuracy of the terminal device in certain scenarios, or in high positioning costs, which limits the scope of application of AR navigation based on terminal devices.
  • For example, the positioning method based on the global positioning system (GPS) has high positioning accuracy when performing AR navigation outdoors.
  • GPS: global positioning system.
  • Indoors, however, buildings weaken the GPS signal strength and reduce positioning accuracy; therefore, the GPS-based positioning method cannot be applied to indoor AR navigation.
  • Another example is the positioning method based on multi-sensor fusion, which performs positioning by fusing data from sensors such as inertial sensors, lidar, and Bluetooth.
  • Although this positioning method also has high positioning accuracy, indoor buildings affect the detection of the sensors during indoor AR navigation, so indoor positioning accuracy is poor; moreover, adding more sensors makes the positioning cost too high.
  • the vision-based positioning method is based on the Simultaneous Localization and Mapping (SLAM) framework, and performs positioning through feature extraction, feature matching, pose solving and other main steps.
  • The vision-based positioning method also has high positioning accuracy, but it has high requirements for computing power, while the computing power of a terminal device is limited; it therefore cannot run on the terminal device, that is, it cannot be applied to AR navigation based on the terminal device.
  • the present disclosure provides a positioning method, which obtains the target local point cloud map closest to the initial position of the terminal device from the server, and then the terminal device can complete the positioning of the terminal device based on the target local point cloud map and the current video frame image.
  • A local point cloud map is small and has low computing power requirements, so a terminal device with limited computing power can complete its own positioning based on the target local point cloud map and the current video frame image, realizing high-precision positioning within the terminal device's limited computing power.
  • FIG1 is a schematic diagram of the composition of a positioning system according to some embodiments.
  • the positioning system may include a terminal device 100 and a server 200.
  • the terminal device 100 and the server 200 may be connected via a wired network or a wireless network.
  • the terminal device 100 may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an AR/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), etc.
  • VR: virtual reality
  • UMPC: ultra-mobile personal computer
  • PDA: personal digital assistant
  • the terminal device 100 may be installed with clients of various application services, such as a browser, an instant messaging tool, etc.
  • the user may operate the client of the application service, and the terminal device 100 interacts with the server 200 through a wired network or a wireless network in response to the user's operation to receive or send information.
  • The browser may be a web browser that supports WebRTC functions and WebAssembly (WASM), which is portable, small, fast-loading, and highly compatible.
  • WASM: WebAssembly
  • Server 200 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), as well as big data and artificial intelligence platforms.
  • the server 200 can cooperate with the terminal device 100 to enable the user to use various application services.
  • the server 200 can analyze and process the information sent by the terminal device 100 and feed back the processing results to the terminal device 100.
  • the server 200 is a server running a Web browser server
  • the terminal device 100 is a terminal device running a Web browser client
  • the server 200 is a server with a display, and the display of the server is used to display a control interface of the server.
  • the wired network or wireless network may include a router, a switch, a base station, or other devices that facilitate communication between the terminal device 100 and the server 200, which is not limited in the embodiments of the present disclosure.
  • the number of devices included in the positioning system shown in Figure 1 is not limited, for example, the number of terminal devices 100 is not limited, and the number of servers is not limited.
  • the positioning system shown in Figure 1 may also include other devices, which is not limited in this disclosure.
  • FIG. 2 is a functional schematic diagram of a terminal device and a server according to some embodiments. As shown in FIG. 2 , in some embodiments of the present disclosure, the terminal device 100 and the server 200 are respectively used to implement the following functions:
  • After acquiring the multiple frames of environmental images, the server 200 restores their scale and then stores them in a database of the server 200, which may be a database.db database.
  • the server 200 is also used to extract features from multiple frames of environmental images, generate a point cloud map, and store the point cloud map in a 3D point plus descriptor format.
  • the server 200 is also used to implement functions such as outlier removal, point cloud division, point cloud storage structure, point cloud compression, point cloud prediction, and point cloud map storage for the point cloud map.
  • point cloud division is to divide the point cloud map into multiple local point cloud maps based on density or grid.
  • the number of three-dimensional (3D) points and the number of feature descriptors are stored in the point cloud storage structure, wherein the correspondence between 3D points and feature descriptors is one-to-many.
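  • As an illustrative aside (not part of the original disclosure), the following Python sketch shows one possible "3D point plus descriptor" storage structure with the one-to-many relationship described above; the class and field names are hypothetical.

```python
# Hypothetical storage structure for a point cloud map, mirroring the
# "3D point plus descriptor" layout described above (names are illustrative).
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class PointCloudMap:
    points: np.ndarray                                            # (N, 3) float32 3D coordinates
    # One 3D point may have been observed in several environment images,
    # so it can map to several descriptors (one-to-many).
    descriptors: List[np.ndarray] = field(default_factory=list)   # N entries, each (k_i, 32) uint8 ORB

    @property
    def num_points(self) -> int:
        return self.points.shape[0]

    @property
    def num_descriptors(self) -> int:
        return sum(d.shape[0] for d in self.descriptors)
```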
  • Point cloud compression is to remove ceiling and ground points from the point cloud map.
  • Point cloud prediction is to determine the local point cloud map closest to the initial position of the terminal device 100 based on the initial position of the terminal device 100.
  • the terminal device 100 is used to obtain the current video frame image, and send the current video frame image to the server 200 through the Web browser client to obtain the local point cloud map closest to the initial position of the terminal device 100 from the server 200, and store the obtained local point cloud map in the database.
  • Server 200 is a server running a Web browser server, so server 200 is also used for server positioning, that is, the Web browser server completes the positioning of the current position of the terminal device 100 according to the current video frame image sent by the terminal device 100, and stores the initial position of the terminal device 100 in the database.db database.
  • the terminal device 100 is also used to track the real-time position of the user during AR navigation based on the three-axis attitude angle (or angular velocity) and acceleration measured by the inertial measurement unit (IMU) in combination with the current video frame image, the previous video frame image and the local point cloud map.
  • IMU: inertial measurement unit
  • FIG3 is a schematic flow chart of a positioning method according to some embodiments. As shown in FIG3 , the present disclosure provides a positioning method, which includes S101 to S105 .
  • the server 200 determines an initial location of a terminal device.
  • the server 200 receives a first request message from the terminal device 100.
  • the first request message is used to request a local point cloud map, and the first request message includes a video frame image used to locate the initial position of the terminal device 100.
  • the server 200 determines the initial position of the terminal device 100 according to the video frame image in the received first request message.
  • When the user needs to use AR navigation, the user can open the AR navigation function of the terminal device by entering a link in a web browser installed on the terminal device 100. After receiving the user's input instruction, the terminal device 100 responds to the input instruction and issues a prompt message prompting the user to scan the surrounding environment with the terminal device 100.
  • After receiving the user's scanning operation, the terminal device 100 responds to the scanning operation and captures the surrounding environment to obtain a video frame image of the surrounding environment. After obtaining the video frame image, the terminal device 100 sends the first request message to the server 200. The server 200 receives the first request message from the terminal device 100 and determines the initial position of the terminal device 100 according to the video frame image in the received first request message.
  • the server 200 determines, based on the initial position of the terminal device 100 , from a plurality of local point cloud maps, a target local point cloud map that is closest to the initial position of the terminal device 100 .
  • a plurality of local point cloud maps are pre-stored in the memory of the server 200, and each local point cloud map corresponds to a different coverage range.
  • the server 200 can compare the distance between the initial position of the terminal device 100 and the center point of each local point cloud map, and determine the target local point cloud map whose center point is closest to the initial position of the terminal device 100 from the multiple local point cloud maps.
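  • A hedged sketch of this nearest-map selection ("point cloud prediction") is shown below; the map representation and field names are assumptions made only for illustration.

```python
# Sketch of "point cloud prediction": pick the local map whose centre point
# is closest to the terminal's initial position (all names assumed).
import numpy as np

def select_target_local_map(initial_position, local_maps):
    """local_maps: list of dicts, each with a 'center' key holding an (x, y, z) array."""
    centers = np.array([m["center"] for m in local_maps])
    distances = np.linalg.norm(centers - np.asarray(initial_position), axis=1)
    return local_maps[int(np.argmin(distances))]
```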
  • the multiple local point cloud maps are obtained by optimizing the multiple initial local point cloud maps, and the multiple initial local point cloud maps are obtained by segmenting the point cloud maps.
  • Image acquisition personnel can use a camera device to collect the multiple frames of environment images needed to build AR navigation, and after collecting them, upload them to the server 200.
  • In some embodiments, the camera device may be the terminal device 100 equipped with a panoramic camera; that is, the terminal device 100 and the panoramic camera are an integrated device.
  • the terminal device 100 can use the panoramic camera to collect images of the environment through which the AR navigation passes.
  • The image acquisition personnel can hold the terminal device 100 to collect images of the environment through which the AR navigation passes, and transmit the collected multi-frame environmental images to the server 200 in real time through the terminal device 100.
  • the camera device may be a panoramic camera, that is, the terminal device 100 and the panoramic camera are two independent physical devices.
  • the terminal device 100 and the panoramic camera are connected by wired connection or wireless connection.
  • the terminal device 100 obtains multiple frames of environmental images taken by the panoramic camera by wired connection or wireless connection, and then transmits the obtained multiple frames of environmental images to the server 200 by wired connection or wireless connection.
  • the wireless connection may be a Bluetooth connection or a wireless network connection, and the present disclosure does not limit the connection method between the terminal device 100 and the panoramic camera.
  • the server 200 can perform three-dimensional reconstruction on the multiple frames of environment images to obtain a point cloud map.
  • the server 200 processes multiple frames of environmental images using a three-dimensional reconstruction method based on a first feature extraction method to determine the pose information and first feature descriptor of each of the multiple three-dimensional points. Afterwards, the server 200 processes the multiple frames of environmental images using a second feature extraction method to determine the second feature descriptor of each of the multiple three-dimensional points. Furthermore, the server 200 constructs a point cloud map based on the pose information and second feature descriptor of each of the multiple three-dimensional points. Among them, the data volume of the second feature descriptor of each three-dimensional point is less than the data volume of the first feature descriptor of the same three-dimensional point.
  • the pose information of a three-dimensional point includes the position and posture of the three-dimensional point in a specified coordinate system.
  • the feature descriptor is a representation of an image or an image block, which simplifies the image by extracting useful information and discarding redundant information.
  • A feature descriptor converts an image of size width × height × 3 (number of channels) into a feature vector/array of length n.
  • The server 200 constructs the point cloud map from the pose information and the second feature descriptor of each of the multiple three-dimensional points, which enables fast feature extraction and matching on the feature descriptors of each three-dimensional point, thereby improving the construction speed of the point cloud map and meeting the real-time service requirements of AR navigation based on the terminal device 100.
  • After constructing the point cloud map, the server 200 stores the point cloud map in a three-dimensional point plus descriptor format.
  • the first feature extraction method is the SIFT feature extraction method
  • the second feature extraction method is the oriented FAST and rotated BRIEF (ORB) feature extraction method.
  • the first feature descriptor may be a SIFT feature descriptor or an AKAZE feature descriptor
  • the second feature descriptor may be an ORB feature descriptor or a SUPERPOINT feature descriptor.
  • the server 200 performs the first 3D reconstruction based on the SIFT features of the multi-frame environment image to obtain the pose information and SIFT feature descriptor of each of the multiple 3D points. Then, the server 200 performs the second 3D reconstruction based on the ORB features of the multi-frame image to obtain the ORB feature descriptor of each of the multiple 3D points. Furthermore, the server 200 constructs a point cloud map based on the pose information and ORB feature descriptor of each of the multiple 3D points.
  • The server 200 takes too long to perform feature extraction and feature matching on the SIFT feature descriptors of three-dimensional points, and cannot meet the real-time service requirements of AR navigation based on the terminal device 100.
  • the data volume of the ORB feature descriptor of a three-dimensional point is smaller than the data volume of the SIFT feature descriptor of the same three-dimensional point.
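  • The size difference is easy to verify from OpenCV's standard descriptor shapes: a SIFT descriptor is 128 float32 values (512 bytes) per key point, while an ORB descriptor is 32 bytes per key point. The sketch below only illustrates this comparison; the image path is a placeholder.

```python
# Illustration of the descriptor-size difference mentioned above
# (standard OpenCV descriptor shapes; the image path is a placeholder).
import cv2

img = cv2.imread("environment_frame.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
orb = cv2.ORB_create()

_, sift_des = sift.detectAndCompute(img, None)   # (n, 128) float32 -> 512 bytes per point
_, orb_des = orb.detectAndCompute(img, None)     # (m, 32)  uint8   -> 32 bytes per point

print(sift_des.dtype, sift_des.shape[1] * sift_des.itemsize, "bytes per SIFT descriptor")
print(orb_des.dtype, orb_des.shape[1] * orb_des.itemsize, "bytes per ORB descriptor")
```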
  • the server 200 constructs a point cloud map based on the pose information of each three-dimensional point in the multiple three-dimensional points and the ORB feature descriptor, which improves the feature extraction speed and feature matching speed of the feature descriptors of the three-dimensional points, and improves the construction speed of the point cloud map, so that the real-time service requirements of the AR navigation based on the terminal device 100 can be met.
  • the server 200 performs three-dimensional reconstruction based on the SFM method to obtain a point cloud map.
  • the server 200 can also perform three-dimensional reconstruction based on the SLAM technology to obtain a point cloud map.
  • the server 200 can perform three-dimensional reconstruction based on the ORB-SLAM technology to obtain a point cloud map.
  • the server 200 can segment the point cloud map to obtain multiple initial local point cloud maps.
  • the server 200 segments the point cloud map to obtain multiple initial local point cloud maps, which can be implemented as follows: after generating the point cloud map, the server 200 displays the point cloud map on a display to realize visualization of the point cloud map, so that the user can set map points of interest for the point cloud map according to the point cloud map displayed on the display of the server 200. After receiving the user's operation of setting map points of interest for the point cloud map, the server 200 responds to the operation and constructs a geographic fence of a certain size based on each map point of interest, and segments the point cloud map to obtain multiple initial local point cloud maps. Among them, the map points of interest are also the key points in the AR navigation process.
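  • The geo-fence based segmentation described above could, for example, be approximated by carving a fixed-radius neighbourhood out of the global point cloud around each user-selected map point of interest, as in the following sketch; the radius value and data layout are assumptions, not the actual segmentation rule of the disclosure.

```python
# Sketch of geo-fence based segmentation: carve an initial local map out of the
# global point cloud around each point of interest (names and radius assumed).
import numpy as np

def segment_point_cloud(points, descriptors, poi_list, fence_radius=10.0):
    """points: (N, 3); descriptors: length-N list; poi_list: list of (x, y, z)."""
    local_maps = []
    for poi in poi_list:
        dist = np.linalg.norm(points - np.asarray(poi), axis=1)
        mask = dist <= fence_radius                     # simple circular geo-fence
        local_maps.append({
            "center": np.asarray(poi),
            "points": points[mask],
            "descriptors": [d for d, keep in zip(descriptors, mask) if keep],
        })
    return local_maps
```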
  • the optimization process includes at least one of the following: removing outlier three-dimensional points in the initial local point cloud map; or converting a one-to-many relationship between three-dimensional points and feature descriptors in the initial local point cloud map into a one-to-one relationship.
  • the server 200 can remove the outlier 3D points in the initial local point cloud map.
  • the initial local point cloud map is obtained by processing multiple frames of environmental images. Different frames of environmental images are obtained by shooting the same environment under different conditions. The different conditions include different angles, different times, and different locations. Therefore, a 3D point in the initial local point cloud map can have a corresponding relationship with multiple feature descriptors. However, if a 3D point in the initial local point cloud map has a corresponding relationship with multiple feature descriptors, the feature matching time will be too long, which cannot meet the real-time service requirements of AR navigation based on terminal devices.
  • converting the one-to-many relationship between the three-dimensional points and the feature descriptors in the initial local point cloud map into a one-to-one relationship can be understood as deduplication of the feature descriptors of the initial local point cloud map.
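  • One possible way to realize this de-duplication, sketched below under assumed data layouts, is to keep a single representative (medoid) ORB descriptor per 3D point; the disclosure does not prescribe a specific reduction rule.

```python
# Sketch of descriptor de-duplication: collapse the one-to-many relation between
# a 3D point and its ORB descriptors into a single representative descriptor.
import numpy as np

def deduplicate_descriptors(descriptor_sets):
    """descriptor_sets: list where entry i is a (k_i, 32) uint8 array for 3D point i."""
    representatives = []
    for des in descriptor_sets:
        # Keep the descriptor closest (in Hamming distance) to all the others,
        # i.e. the medoid of the set.
        bits = np.unpackbits(des, axis=1).astype(np.int32)            # (k_i, 256)
        pairwise = np.abs(bits[:, None, :] - bits[None, :, :]).sum(axis=2)
        representatives.append(des[int(np.argmin(pairwise.sum(axis=1)))])
    return np.stack(representatives)                                   # (N, 32) uint8
```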
  • the server 200 sends the target local point cloud map to the terminal device 100.
  • the server 200 determines the target local point cloud map that is closest to the initial position of the terminal device 100 based on the initial position of the terminal device 100, it sends the target local point cloud map to the terminal device 100 so that the terminal device 100 can implement the AR navigation function based on the target local point cloud map.
  • The server 200 sends the target local point cloud map in binary file (BIN) format to the terminal device 100 as a byte stream.
  • The target local point cloud map in BIN format occupies less storage space on the terminal device 100, which saves storage space and enables the terminal device 100 to store more local point cloud maps, so that the terminal device 100 can use the stored local point cloud maps as offline maps when the mobile network is turned off, ensuring the user experience.
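  • A simple illustration of such a BIN byte-stream layout is given below; the header format and field order are assumptions, not the actual file format of the disclosure.

```python
# Sketch of packing a local point cloud map into a compact binary (BIN) byte
# stream for transfer to the terminal device; the layout here is assumed.
import struct
import numpy as np

def to_bin(points: np.ndarray, descriptors: np.ndarray) -> bytes:
    """points: (N, 3) float32; descriptors: (N, 32) uint8 (one per 3D point)."""
    header = struct.pack("<I", points.shape[0])          # little-endian point count
    return header + points.astype(np.float32).tobytes() + descriptors.tobytes()

def from_bin(buf: bytes):
    n = struct.unpack_from("<I", buf, 0)[0]
    points = np.frombuffer(buf, dtype=np.float32, count=n * 3, offset=4).reshape(n, 3)
    descriptors = np.frombuffer(buf, dtype=np.uint8, offset=4 + n * 12).reshape(n, 32)
    return points, descriptors
```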
  • the terminal device 100 obtains the target local point cloud map from the server 200.
  • the target local point cloud map is the local point cloud map that is closest to the initial position of the terminal device 100 among the multiple local point cloud maps.
  • After receiving the user's instruction to turn on the AR navigation function, the terminal device 100 sends a first request message to the server 200 in response to the instruction.
  • the first request message is used to request a target local point cloud map, and the first request message includes a video frame image for locating the initial position.
  • the terminal device 100 receives a first response message from the server 200, and the first response message includes a target local point cloud map.
  • After the terminal device 100 receives the target local point cloud map in BIN format from the server 200, the terminal device 100 reads the target local point cloud map in BIN format, determines the geo-fence information of the target local point cloud map, sets the center point according to the geo-fence information, loads the center point into the Web browser client, and stores it in the JavaScript data structure of the Web browser client.
  • the terminal device 100 can perform real-time positioning based on the video stream of the surrounding environment captured in real time, and determine whether to move into the coverage of the target local point cloud map in combination with the coverage of the target local point cloud map.
  • Scenario 1: The server 200 determines whether the terminal device 100 has moved into the coverage of the target local point cloud map.
  • the terminal device 100 can intercept in real time based on WebRTC technology the video stream of the user's surrounding environment shot by the terminal device 100 to obtain a real-time video frame image of the user's surrounding environment, and then send the real-time video frame image of the user's surrounding environment to the server 200, which locates the video frame image of the user's surrounding environment to obtain the current position of the terminal device 100.
  • The server 200 calculates the distance between the current position of the terminal device 100 and the center point of the target local point cloud map in real time. If it detects that this distance is less than or equal to the inner radius of the target local point cloud map, it determines that the terminal device 100 has moved into the coverage range of the target local point cloud map. After this determination, the terminal device 100 receives a second response message from the server 200, and the second response message is used to indicate that the terminal device 100 has moved into the coverage range of the target local point cloud map.
  • Scenario 2: The terminal device 100 determines whether it has moved into the coverage of the target local point cloud map.
  • The terminal device 100 can intercept, in real time and based on WebRTC technology, the video stream of the user's surrounding environment shot by the terminal device 100 to obtain a real-time video frame image of the surrounding environment, locate this video frame image to obtain the current position of the terminal device 100, and then calculate in real time the distance between the current position of the terminal device 100 and the center point of the target local point cloud map. If it detects that this distance is less than or equal to the inner radius of the target local point cloud map, it determines that the terminal device 100 has moved into the coverage range of the target local point cloud map.
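  • In both scenarios the coverage test itself reduces to a distance comparison against the inner radius of the target local point cloud map, as in this small sketch (names assumed).

```python
# Sketch of the coverage check used in both scenarios above: the terminal is
# inside the target local map once its position is within the inner radius.
import numpy as np

def inside_coverage(current_position, map_center, inner_radius) -> bool:
    return np.linalg.norm(np.asarray(current_position) - np.asarray(map_center)) <= inner_radius
```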
  • the terminal device 100 can determine the posture information corresponding to the current video frame image based on the current video frame image and the target local point cloud map.
  • the terminal device 100 determines the posture information corresponding to the current video frame image based on the current video frame image and the target local point cloud map, which can be implemented as X1 to X3.
  • the terminal device 100 may perform ORB feature extraction on the current video frame image to obtain an ORB feature descriptor of the current video frame image.
  • the terminal device 100 may use JsFeat technology to accelerate when performing ORB feature extraction on the current video frame image.
  • the terminal device 100 can perform feature matching on the feature descriptors of the current video frame image and the feature descriptors of the three-dimensional points in the local point cloud map based on the WebAssembly component to determine the matched three-dimensional points.
  • After the terminal device 100 determines the matched three-dimensional points, it can solve the pose of the matched three-dimensional points to obtain the pose information corresponding to the current video frame image.
  • The pose information corresponding to the current video frame image is the pose of the terminal device 100 relative to the real world (world coordinate system) when the current video frame image is shot, and it includes the position and posture of the terminal device 100 in the world coordinate system.
  • the terminal device 100 can also perform posture solution on the posture information of the matched three-dimensional points based on the WebAssembly component to obtain the posture information corresponding to the current video frame image.
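  • Steps X1 to X3 correspond closely to a standard feature-based relocalization pipeline. The following Python sketch uses OpenCV equivalents (ORB extraction, Hamming-distance matching, PnP pose solving); the camera intrinsics K and the map data layout are assumptions, and the implementation in the disclosure runs in the browser via JsFeat/WebAssembly rather than OpenCV.

```python
# Sketch of steps X1-X3 on the terminal side: ORB extraction, descriptor matching
# against the local map, and pose solving via PnP (OpenCV names; K is assumed).
import cv2
import numpy as np

def localize(frame_gray, map_points_3d, map_descriptors, K, dist_coeffs=None):
    """map_points_3d: (N, 3) float32; map_descriptors: (N, 32) uint8 ORB (one per point)."""
    orb = cv2.ORB_create()
    keypoints, frame_des = orb.detectAndCompute(frame_gray, None)      # X1: feature extraction

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(frame_des, map_descriptors)                # X2: feature matching
    matches = sorted(matches, key=lambda m: m.distance)[:100]

    img_pts = np.float32([keypoints[m.queryIdx].pt for m in matches])
    obj_pts = np.float32([map_points_3d[m.trainIdx] for m in matches])

    # X3: solve the camera pose (rotation + translation) in the map frame.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)
    return ok, R, tvec
```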
  • the terminal device 100 completes the positioning of the terminal device 100 according to the target local point cloud map and the current video frame image obtained from the server 200.
  • the point cloud map is larger than the local point cloud map, and has higher requirements for computing power, and is not suitable for the terminal device 100 with limited computing power.
  • the local point cloud map is smaller than the point cloud map, and has lower requirements for computing power.
  • the terminal device 100 can complete the positioning of the terminal device 100 according to the target local point cloud map and the current video frame image, and realizes the positioning of the terminal device 100 with higher precision using the limited computing power of the terminal device 100.
  • the terminal device 100 uses the local point cloud map and the current video frame image to locate the terminal device 100, and is less affected by indoor buildings, so that the AR navigation based on the terminal device 100 can be applied to indoor AR navigation, which improves the scope of application of the AR navigation based on the terminal device 100.
  • a positioning method provided by an embodiment of the present disclosure can, after determining the posture information corresponding to the current video frame image, generate a navigation trajectory based on the posture information of the video frame images at multiple moments.
  • the navigation trajectory generation process can include A1 to A3.
  • the terminal device 100 obtains a posture sequence.
  • the pose sequence includes pose information corresponding to video frame images at multiple moments.
  • When the terminal device 100 turns on the AR navigation function, the terminal device 100 can intercept the video stream of the user's surrounding environment shot by the terminal device 100 over a period of time based on WebRTC technology, obtain video frame images of the surrounding environment at multiple moments, and then perform the processing of S105 on the video frame images at multiple moments to obtain the pose information of the video frame images at multiple moments, that is, to obtain a pose sequence.
  • the terminal device 100 performs interpolation processing and filtering processing on the posture sequence to obtain a posture sequence after interpolation processing and filtering processing.
  • the pose sequence can be interpolated and filtered so that the pose sequence after interpolation and filtering has a good degree of smoothness.
  • the terminal device 100 displays the navigation trajectory based on the posture sequence after interpolation and filtering.
  • After the terminal device 100 obtains the pose sequence after interpolation and filtering, the terminal device 100 generates a navigation trajectory based on that pose sequence and the target local point cloud map, and displays the navigation trajectory.
  • the terminal device 100 displays the navigation trajectory based on the posture sequence after interpolation and filtering, which can be implemented as follows: the terminal device 100 displays the navigation trajectory through a web browser client based on the posture sequence after interpolation and filtering.
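  • A minimal sketch of the interpolation and filtering in A2 is shown below: positions from the pose sequence are resampled to a fixed rate and smoothed with a moving-average filter. The target rate and window size are assumptions; the disclosure does not specify the exact interpolation or filter.

```python
# Sketch of A2: densify the pose sequence by linear interpolation of positions
# and smooth it with a small moving-average filter (rate and window assumed).
import numpy as np

def smooth_trajectory(timestamps, positions, target_hz=30.0, window=5):
    """timestamps: (M,) seconds, increasing; positions: (M, 3) translations from the pose sequence."""
    t_new = np.arange(timestamps[0], timestamps[-1], 1.0 / target_hz)
    dense = np.stack([np.interp(t_new, timestamps, positions[:, i]) for i in range(3)], axis=1)

    kernel = np.ones(window) / window
    smoothed = np.stack([np.convolve(dense[:, i], kernel, mode="same") for i in range(3)], axis=1)
    return t_new, smoothed
```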
  • While the terminal device 100 is displaying the navigation trajectory, it updates its current position in real time based on the current video frame image in the video stream of the user's surrounding environment captured by the terminal device 100, and updates the navigation trajectory in real time in combination with the local point cloud map and the previous video frame image before the current video frame image, to achieve a tracking effect.
  • After obtaining the pose sequence, the terminal device 100 performs interpolation and filtering on it, so that the navigation trajectory obtained from the processed pose sequence is smoother.
  • the terminal device 100 updates the navigation trajectory in real time according to the previous video frame image and the local point cloud map, thereby achieving a tracking effect for the user, which helps to improve the user experience.
  • the positioning device includes a hardware structure and/or software module corresponding to the execution of each function.
  • the present disclosure can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed in the form of hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to exceed the scope of the present disclosure.
  • FIG4 is a schematic diagram of the composition of a positioning device according to some embodiments.
  • the present disclosure provides a positioning device 2000.
  • the positioning device 2000 may be the terminal device 100, or a functional module in the terminal device 100, or any electronic device connected to the terminal device 100, etc., which is not limited by the present disclosure.
  • the positioning device 2000 includes a communication unit 2001 and a processing unit 2002. In some embodiments, the positioning device 2000 may also include a storage unit 2003.
  • the communication unit 2001 is used to obtain a target local point cloud map from the server 200 , where the target local point cloud map is a local point cloud map that is closest to an initial position of the terminal device 100 among multiple local point cloud maps.
  • the processing unit 2002 is used to determine the posture information corresponding to the current video frame image based on the current video frame image and the target local point cloud map after moving into the coverage area of the target local point cloud map.
  • each local point cloud map is obtained by optimizing an initial local point cloud map; the optimization process includes at least one of the following: removing outlier three-dimensional points in the initial local point cloud map; or, converting a one-to-many relationship between three-dimensional points and feature descriptors in the initial local point cloud map into a one-to-one relationship.
  • The initial local point cloud map is segmented from the point cloud map, and the point cloud map is constructed by: processing multiple frames of environmental images with a three-dimensional reconstruction method based on a first feature extraction method to determine the pose information and the first feature descriptor of each of multiple three-dimensional points; processing the multiple frames of environmental images with a second feature extraction method to determine the second feature descriptor of each of the multiple three-dimensional points; and constructing the point cloud map based on the pose information and the second feature descriptor of each of the multiple three-dimensional points.
  • the communication unit is further used to: send a first request message to the server 200, the first request message is used to request a target local point cloud map, and the first request message includes a video frame image for locating an initial position.
  • the communication unit is used to receive a first response message from the server 200, where the first response message includes a target local point cloud map.
  • the storage unit 2003 is used to store a local point cloud map.
  • the communication unit 2001 is used to receive the target local point cloud map in a binary file format sent by the server 200.
  • the processing unit is used to: extract features from the current video frame image to obtain a feature descriptor of the current video frame image; perform feature matching between the feature descriptor of the current video frame image and the feature descriptor of the three-dimensional points in the local point cloud map to determine the matched three-dimensional points; and obtain the pose information corresponding to the current video frame image based on the pose information of the matched three-dimensional points.
  • the communication unit 2001 is further used to obtain a posture sequence, where the posture sequence includes posture information corresponding to video frame images at multiple moments.
  • the processing unit 2002 is further used to: perform interpolation processing and filtering processing on the posture sequence to obtain the posture sequence after interpolation processing and filtering processing; and display the navigation trajectory based on the posture sequence after interpolation processing and filtering processing.
  • The processing unit 2002 is used to display the navigation trajectory through a Web browser client based on the pose sequence after interpolation processing and filtering processing.
  • FIG5 is a schematic diagram of another positioning device according to some embodiments.
  • the present disclosure provides a positioning device 3000.
  • the positioning device 3000 may be the server 200, or a functional module in the server 200, or any electronic device connected to the server 200, etc., which is not limited by the present disclosure.
  • the positioning device 3000 includes a communication unit 3001 and a processing unit 3002. In some embodiments, the positioning device 3000 may also include a storage unit 3003.
  • the processing unit 3002 is used to determine an initial position of the terminal device 100 .
  • the processing unit 3002 is further used to determine, based on the initial position of the terminal device 100 , from a plurality of local point cloud maps, a target local point cloud map that is closest to the initial position of the terminal device 100 .
  • the communication unit 3001 is used to send the target local point cloud map to the terminal device 100.
  • the processing unit 3002 is also used to: segment the point cloud map to obtain multiple initial local point cloud maps; optimize the multiple initial local point cloud maps respectively to obtain multiple local point cloud maps; wherein the optimization processing includes at least one of the following: removing outlier three-dimensional points in the initial local point cloud map; or, converting a one-to-many relationship between the three-dimensional points in the initial local point cloud map and the feature descriptors into a one-to-one relationship.
  • The processing unit 3002 is further used to: process the multiple frames of environmental images using a 3D reconstruction method based on a first feature extraction method to determine the pose information and the first feature descriptor of each of the multiple 3D points; and process the multiple frames of environmental images using a second feature extraction method to determine the second feature descriptor of each of the multiple 3D points; wherein the data volume of the second feature descriptor of a 3D point is less than the data volume of the first feature descriptor of the same 3D point.
  • a point cloud map is constructed based on the pose information and the second feature descriptor of each 3D point in the multiple 3D points.
  • the first feature descriptor is a scale-invariant feature transform (SIFT) feature descriptor
  • the second feature descriptor is an ORB feature descriptor
  • the communication unit 3001 is used to receive first request information from the terminal device 100 , where the first request information is used to request a local point cloud map, and the first request information includes a video frame image for locating an initial position of the terminal device 100 .
  • the processing unit 3002 is used to determine the initial position of the terminal device 100 according to the video frame image used to locate the initial position of the terminal device 100.
  • the communication unit 3001 is used to send a target local point cloud map in a binary file format to the terminal device 100 .
  • the above method is applicable to a server running a Web browser server, and there is a communication connection between the Web browser server and a Web browser client running on the terminal device 100.
  • the storage unit 3003 is used to store the point cloud map.
  • the units in FIG. 4 and FIG. 5 may also be referred to as modules.
  • a processing unit may be referred to as a processing module.
  • If the units in FIG. 4 and FIG. 5 are implemented in the form of software function modules and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • In essence, the technical solution of the embodiments of the present disclosure, or the part that contributes over the related technology, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium, including several instructions to enable a computer device (which can be a personal computer, server, or network device, etc.) or a processor (processor) to perform all or part of the steps of the method described in each embodiment of the present disclosure.
  • The storage medium for storing computer software products includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, etc.
  • FIG6 is a schematic diagram of the structure of a terminal device according to some embodiments.
  • the present disclosure provides a terminal device, terminal device 4000, which may be the above-mentioned positioning device 2000.
  • the terminal device 4000 includes: a processor 4002, a communication interface 4003, and a bus 4004.
  • the terminal device 4000 may also include a memory 4001.
  • FIG7 is a schematic diagram of the structure of a server according to some embodiments.
  • the server 5000 may be the above-mentioned positioning device 3000.
  • the server 5000 includes: a processor 5002, a communication interface 5003, and a bus 5004.
  • the server 5000 may also include a memory 5001.
  • processor 4002 and processor 5002 can both implement or execute various exemplary logic blocks, modules and circuits described in conjunction with the present disclosure.
  • Processor 4002 and processor 5002 can both be central processing units, general-purpose processors, digital signal processors, application-specific integrated circuits, field programmable gate arrays or other programmable logic devices, transistor logic devices, hardware components or any combination thereof.
  • the processor 5002 can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of DSPs and microprocessors, etc.
  • Communication interface 4003 and communication interface 5003 are both used to connect to other devices through a communication network.
  • The communication network may be Ethernet, a wireless access network, a wireless local area network (WLAN), etc.
  • Memory 4001 and memory 5001 may be read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, or electrically erasable programmable read-only memory (EEPROM), disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but are not limited to these.
  • ROM: read-only memory
  • RAM: random access memory
  • EEPROM: electrically erasable programmable read-only memory
  • the memory 4001 may exist independently of the processor 4002, and the memory 4001 may be connected to the processor 4002 via a bus 4004 for storing instructions or program codes.
  • When the processor 4002 calls and executes the instructions or program codes stored in the memory 4001, the positioning method provided in the embodiments of the present disclosure can be implemented.
  • the memory 5001 may exist independently of the processor 5002, and the memory 5001 may be connected to the processor 5002 via a bus 5004 for storing instructions or program codes.
  • the processor 5002 calls and executes the instructions or program codes stored in the memory 5001, the positioning method provided in the embodiment of the present disclosure can be implemented.
  • the memory 4001 may be integrated with the processor 4002
  • the memory 5001 may be integrated with the processor 5002 .
  • Bus 4004 and bus 5004 may be extended industry standard architecture (EISA) buses, etc.
  • Bus 4004 and bus 5004 may be divided into address buses, data buses, control buses, etc.
  • EISA: extended industry standard architecture
  • Only one thick line is used in both FIG6 and FIG7 , but this does not mean that there is only one bus or one type of bus.
  • the disclosed embodiment also provides a computer-readable storage medium. All or part of the processes in the above method embodiments can be completed by computer instructions to instruct the relevant hardware, and the program can be stored in the above computer-readable storage medium. When the program is executed, it can include the processes of the above method embodiments.
  • The computer-readable storage medium can be the memory in any of the above embodiments.
  • the above computer-readable storage medium can also be an external storage device of the above positioning device, such as a plug-in hard disk, a smart memory card (smart media card, SMC), a secure digital (secure digital, SD) card, a flash card (flash card), etc. equipped on the above positioning device.
  • the above computer-readable storage medium can also include both the internal storage unit of the above positioning device and an external storage device.
  • the above computer-readable storage medium is used to store the above computer program and other programs and data required by the above positioning device.
  • the above computer-readable storage medium can also be used to temporarily store data that has been output or is to be output.
  • the embodiments of the present disclosure also provide a computer program product, which includes a computer program.
  • When the computer program product is run on a computer, the computer is enabled to execute any one of the positioning methods provided in the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Provided is a positioning method. The method comprises: obtaining a target local point cloud map from a server, wherein the target local point cloud map is a local point cloud map closest to an initial position of a terminal device in a plurality of local point cloud maps; and after moving into the coverage of the target local point cloud map, on the basis of a current video frame image and the target local point cloud map, determining pose information corresponding to the current video frame image.

Description

定位方法、终端设备、服务器及存储介质Positioning method, terminal device, server and storage medium
本公开要求申请号为202211429870.1、2022年11月15日提交的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This disclosure claims priority to Chinese patent application No. 202211429870.1, filed on November 15, 2022, the entire contents of which are incorporated by reference into this application.
技术领域Technical Field
本公开涉及通信技术领域,尤其涉及一种定位方法、终端设备、服务器及存储介质。The present disclosure relates to the field of communication technology, and in particular to a positioning method, terminal equipment, server and storage medium.
背景技术Background technique
近年来,随着计算机技术飞速发展,运行在终端设备上的增强现实(augmented reality,AR)技术应运而生,AR技术具有跨平台、普适性强、易传播的优势。In recent years, with the rapid development of computer technology, augmented reality (AR) technology running on terminal devices has emerged. AR technology has the advantages of cross-platform, strong universality and easy dissemination.
发明内容Summary of the invention
In a first aspect, a positioning method is provided. The method includes:
obtaining a target local point cloud map from a server, the target local point cloud map being the local point cloud map that is closest to an initial position of a terminal device among a plurality of local point cloud maps; and
after moving into the coverage range of the target local point cloud map, determining pose information corresponding to a current video frame image based on the current video frame image and the target local point cloud map.
In a second aspect, a positioning method is provided. The method includes:
determining an initial position of a terminal device;
determining, according to the initial position of the terminal device, a target local point cloud map that is closest to the initial position of the terminal device from a plurality of local point cloud maps; and
sending the target local point cloud map to the terminal device.
In a third aspect, a positioning device is provided. The device includes:
a communication unit, configured to obtain a target local point cloud map from a server, the target local point cloud map being the local point cloud map that is closest to an initial position of a terminal device among a plurality of local point cloud maps; and
a processing unit, configured to determine, after the terminal device moves into the coverage range of the target local point cloud map, pose information corresponding to a current video frame image based on the current video frame image and the target local point cloud map.
In a fourth aspect, a positioning device is provided. The device includes:
a processing unit, configured to determine an initial position of a terminal device;
the processing unit being further configured to determine, according to the initial position of the terminal device, a target local point cloud map that is closest to the initial position of the terminal device from a plurality of local point cloud maps; and
a communication unit, configured to send the target local point cloud map to the terminal device.
In a fifth aspect, a terminal device is provided, including a processor and a memory, wherein the memory stores instructions executable by the processor, and the processor is configured to execute the instructions so that the terminal device implements the method provided in the first aspect.
In a sixth aspect, a server is provided, including a processor and a memory, wherein the memory stores instructions executable by the processor, and the processor is configured to execute the instructions so that the server implements the method provided in the second aspect.
In a seventh aspect, a computer-readable storage medium is provided, which stores computer instructions. When the computer instructions are run on a computer, the computer is caused to execute the method provided in the first aspect or the method provided in the second aspect.
In an eighth aspect, a computer program product including computer instructions is provided. When the computer instructions are run on a computer, the computer is caused to execute the method provided in the first aspect or the method provided in the second aspect.
Brief Description of the Drawings
The accompanying drawings are used to provide a further understanding of the technical solutions of the present disclosure and constitute a part of the specification. Together with the embodiments of the present disclosure, they are used to explain the technical solutions of the present disclosure and do not constitute a limitation on the technical solutions of the present disclosure.
FIG. 1 is a schematic diagram of the composition of a positioning system according to some embodiments;
FIG. 2 is a functional schematic diagram of a terminal device and a server according to some embodiments;
FIG. 3 is a schematic flowchart of a positioning method according to some embodiments;
FIG. 4 is a schematic diagram of the composition of a positioning device according to some embodiments;
FIG. 5 is a schematic diagram of the composition of another positioning device according to some embodiments;
FIG. 6 is a schematic structural diagram of a terminal device according to some embodiments;
FIG. 7 is a schematic structural diagram of a server according to some embodiments.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the embodiments of the present disclosure, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present disclosure.
It should be noted that all directional indications in the embodiments of the present disclosure (such as up, down, left, right, front and back) are only used to explain the relative positional relationship, movement and the like between components in a specific posture (as shown in the accompanying drawings). If the specific posture changes, the directional indication changes accordingly.
The terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the present disclosure, unless otherwise specified, "a plurality of" means two or more.
In the description of the present disclosure, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. The term "and/or" includes any and all combinations of one or more of the associated listed items. For example, A and/or B includes only A, only B, and both A and B.
In the description of the present disclosure, it should be noted that, unless otherwise expressly specified and limited, the terms "connected" and "connection" should be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection or an integral connection. Those of ordinary skill in the art can understand the meanings of the above terms in the present disclosure according to the specific circumstances. In addition, when a pipeline is described, "connected" and "connection" as used in the present disclosure mean that the pipeline is conducted. The specific meaning should be understood in conjunction with the context.
In the embodiments of the present disclosure, words such as "exemplarily" or "for example" are used to indicate an example, illustration or description. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present disclosure should not be construed as being more preferred or more advantageous than other embodiments or designs. Rather, the use of words such as "exemplarily" or "for example" is intended to present related concepts in a specific manner.
In order to facilitate understanding of the technical solutions of the present disclosure by those skilled in the art, the terms involved in the present disclosure are briefly introduced below.
(1) Web real-time communication
Web real-time communications (WebRTC) is a real-time communication technology. WebRTC allows a web application or site to establish a peer-to-peer connection between browsers without relying on an intermediary, so as to transmit video streams and/or audio streams or any other data. Therefore, WebRTC makes it possible for users to create peer-to-peer data sharing and teleconferencing without installing any plug-in or third-party software.
(2) Structure from motion
Structure from motion (SFM) is an algorithm for three-dimensional reconstruction. SFM extracts key points from multiple frames of two-dimensional images and performs image matching to calculate the image poses of the two-dimensional images, and then performs three-dimensional reconstruction according to the image poses and the two-dimensional coordinates of the key points to obtain the corresponding three-dimensional coordinate points.
Three-dimensional reconstruction technology refers to an image processing technology that constructs the three-dimensional structure of an object or scene in images from multiple frames of two-dimensional images. Three-dimensional reconstruction technology is usually applied in augmented reality (AR), mixed reality (MR), visual positioning, autonomous driving and the like.
(3) Scale-invariant feature transform
Scale-invariant feature transform (SIFT) is a description used in the field of image processing. This description is scale-invariant and can detect key points in an image; it is a local feature descriptor. SIFT features are based on points of interest in the local appearance of an object and are independent of the size and rotation of the image. SIFT features have a high tolerance for changes in illumination, noise and small changes in viewing angle. The detection rate for partially occluded objects using the SIFT feature description is also quite high; even three or more SIFT object features are sufficient to calculate the position and orientation of the object.
(4) AR technology
AR technology is a technology that fuses virtual information with real information. Virtual information such as computer-generated text and images is simulated and then applied to the real world, thereby "augmenting" the real world.
AR navigation based on terminal devices is an important application of AR technology. AR navigation locates the user's current position and determines a navigation path, renders the navigation path into the real environment through AR, and adds real-time navigation information to the real road conditions to guide the user forward more intuitively, so that the user obtains an immersive navigation experience. Because AR navigation is computationally intensive and the computing power of terminal devices is limited, in the related art the positioning of the terminal device is completed by a server based on the current video frame image and a point cloud map, that is, the terminal device relies on an external device for its own positioning. The existing positioning methods for AR navigation all have certain limitations, which lead either to poor positioning accuracy of the terminal device when performing AR navigation in certain scenarios or to high positioning costs, thereby limiting the scope of application of AR navigation based on terminal devices.
For example, the positioning method based on the global positioning system (GPS) has high positioning accuracy when AR navigation is performed outdoors. However, when AR navigation is performed indoors, indoor buildings affect the GPS signal strength and the positioning accuracy decreases. Therefore, the GPS-based positioning method cannot be applied to indoor AR navigation.
As another example, in the positioning method based on multi-sensor fusion, positioning is performed by fusing data from sensors such as inertial sensors, lidar and Bluetooth. Although this positioning method also has high positioning accuracy, indoor buildings likewise affect the detection of the sensors during indoor AR navigation, so that the positioning accuracy during indoor AR navigation is poor; moreover, because more sensors are added, the positioning cost is excessively high.
As yet another example, the vision-based positioning method takes a simultaneous localization and mapping (SLAM) framework as a basis and performs positioning through main steps such as feature extraction, feature matching and pose solving. The vision-based positioning method also has high positioning accuracy, but it requires high computing power, while the computing power of terminal devices is limited. Therefore, this positioning method is not applicable to terminal devices, that is, it is not applicable to AR navigation based on terminal devices.
In summary, how to achieve relatively high-precision positioning of a terminal device with the limited computing power of the terminal device is an urgent problem to be solved.
To solve the above problems, the present disclosure provides a positioning method. A target local point cloud map that is closest to the initial position of the terminal device is obtained from a server, and the terminal device can then complete its own positioning according to the target local point cloud map and the current video frame image. It can be understood that a local point cloud map is small and requires little computing power, so a terminal device with limited computing power can complete its own positioning based on the target local point cloud map and the current video frame image, thereby achieving relatively high-precision positioning of the terminal device with the limited computing power of the terminal device.
The technical solutions of the embodiments of the present disclosure will be described clearly and completely below in conjunction with the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present disclosure.
FIG. 1 is a schematic diagram of the composition of a positioning system according to some embodiments. As shown in FIG. 1, the present disclosure provides a positioning system. The positioning system may include a terminal device 100 and a server 200, and the terminal device 100 and the server 200 may be connected via a wired network or a wireless network.
In some embodiments, the terminal device 100 may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an AR/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like. The embodiments of the present disclosure do not limit the type of the terminal device 100.
In some embodiments, the terminal device 100 may be installed with clients of various application services, for example a browser, a client of an instant messaging tool, and the like. A user may operate the client of an application service, and in response to the user's operation the terminal device 100 interacts with the server 200 through a wired network or a wireless network to receive or send information. The browser may be a web browser that supports WebRTC and WebAssembly (WASM) components, which are portable, small in size, fast to load and highly compatible.
In some embodiments, the server 200 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
The server 200 may cooperate with the terminal device 100 to enable the user to use various application services. For example, the server 200 may analyze and process information sent by the terminal device 100 and feed the processing result back to the terminal device 100.
In some embodiments, the server 200 is a server running a web browser server side, the terminal device 100 is a terminal device running a web browser client, and there is a communication connection between the web browser server side and the web browser client.
In some embodiments, the server 200 is a server with a display, and the display of the server is used to display a control interface of the server.
In some embodiments, the wired network or wireless network may include a router, a switch, a base station, or other devices that facilitate communication between the terminal device 100 and the server 200, which is not limited in the embodiments of the present disclosure.
It should be understood that the number of devices included in the positioning system shown in FIG. 1 is not limited; for example, the number of terminal devices 100 is not limited, and the number of servers is not limited. In addition to the devices shown in FIG. 1, the positioning system shown in FIG. 1 may also include other devices, which is not limited in the present disclosure.
FIG. 2 is a functional schematic diagram of a terminal device and a server according to some embodiments. As shown in FIG. 2, in some embodiments of the present disclosure, the terminal device 100 and the server 200 are respectively used to implement the following functions.
The server 200 is used to perform scale recovery on multiple frames of environment images after the multiple frames of environment images are acquired, and then store them in a database of the server 200, which may be a database.db database.
The server 200 is also used to perform feature extraction on the multiple frames of environment images, generate a point cloud map, and store the point cloud map in a 3D-point-plus-descriptor format.
The server 200 is also used to implement functions such as outlier removal, point cloud division, point cloud storage structure, point cloud compression, point cloud prediction and point cloud map storage for the point cloud map. Point cloud division means dividing the point cloud map into multiple local point cloud maps based on density or based on a grid. The point cloud storage structure stores the number of three-dimensional (3D) points and the number of feature descriptors, where the correspondence between 3D points and feature descriptors is one-to-many. Point cloud compression means removing ceiling and ground points from the point cloud map. Point cloud prediction means determining, according to the initial position of the terminal device 100, the local point cloud map closest to the initial position of the terminal device 100. The terminal device 100 is used to obtain the current video frame image and send the current video frame image to the server 200 through the web browser client, so as to obtain from the server 200 the local point cloud map closest to the initial position of the terminal device 100, and to store the obtained local point cloud map in a database.
The server 200 is a server running the web browser server side, so the server 200 is also used for server-side positioning, that is, the web browser server side completes the positioning of the current position of the terminal device 100 according to the current video frame image sent by the terminal device 100, and stores the initial position of the terminal device 100 in the database.db database.
The terminal device 100 is also used to track the real-time position of the user during AR navigation according to the three-axis attitude angles (or angular rates) and accelerations measured by an inertial measurement unit (IMU), in combination with the current video frame image, the previous video frame image and the local point cloud map.
FIG. 3 is a schematic flowchart of a positioning method according to some embodiments. As shown in FIG. 3, the present disclosure provides a positioning method, which includes S101 to S105.
S101. The server 200 determines an initial position of the terminal device.
In some embodiments, the server 200 receives first request information from the terminal device 100. The first request information is used to request a local point cloud map and includes a video frame image used for locating the initial position of the terminal device 100. The server 200 determines the initial position of the terminal device 100 according to the video frame image in the received first request information.
Exemplarily, when the user needs to use AR navigation, the user may enter a link in a web browser installed on the terminal device 100 to enable the AR navigation function of the terminal device. After receiving the user's input instruction, the terminal device 100 responds to the input instruction and issues prompt information for prompting the user to use the terminal device 100 to scan the user's surrounding environment.
After receiving the user's scanning operation, the terminal device 100 responds to the scanning operation and captures the user's surrounding environment to obtain a video frame image of the user's surrounding environment. After obtaining the video frame image of the user's surrounding environment, the terminal device 100 sends the first request information to the server 200. The server 200 receives the first request information from the terminal device 100 and determines the initial position of the terminal device 100 according to the video frame image in the received first request information.
S102. The server 200 determines, according to the initial position of the terminal device 100, a target local point cloud map that is closest to the initial position of the terminal device 100 from a plurality of local point cloud maps.
In some embodiments, a plurality of local point cloud maps are pre-stored in the memory of the server 200, and each local point cloud map corresponds to a different coverage range. After determining the initial position of the terminal device 100, the server 200 may compare the distances between the initial position of the terminal device 100 and the center points of the respective local point cloud maps, and determine, from the plurality of local point cloud maps, the target local point cloud map whose center point is closest to the initial position of the terminal device 100.
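Exemplarily, the nearest-map selection in S102 may be sketched as follows. This is a minimal illustration only: the map structure with a precomputed "center" field and the use of a plain Euclidean distance are assumptions of the example and are not details specified by the present disclosure.

```python
import math

def select_target_local_map(initial_position, local_maps):
    """Return the local point cloud map whose center point is closest to the
    terminal device's initial position. Each map is assumed (hypothetically)
    to be a dict with a 'center' coordinate tuple and its point payload."""
    def distance(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return min(local_maps, key=lambda m: distance(initial_position, m["center"]))
```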
In some embodiments, the plurality of local point cloud maps are obtained by optimizing a plurality of initial local point cloud maps, and the plurality of initial local point cloud maps are obtained by segmenting a point cloud map.
The construction process of the point cloud map and the plurality of local point cloud maps is shown in S1 to S4.
S1. Image acquisition.
An image acquisition person may use a camera device to acquire multiple frames of environment images of the area where AR navigation is to be built, and after acquiring these multiple frames of environment images, upload them to the server 200.
In some embodiments, the camera device may be the terminal device 100, and the terminal device 100 is equipped with (or carries, or is configured with) a panoramic camera, that is, the terminal device 100 and the panoramic camera are an integrated device. The terminal device 100 can use the panoramic camera to acquire images of the environment through which the AR navigation passes. The image acquisition person may hold the terminal device 100 to acquire images of the environment through which the AR navigation passes, and transmit the acquired multiple frames of environment images to the server 200 in real time through the terminal device 100.
In other embodiments, the camera device may be a panoramic camera, that is, the terminal device 100 and the panoramic camera are two independent physical devices. The terminal device 100 and the panoramic camera are connected by a wired connection or a wireless connection. The terminal device 100 obtains the multiple frames of environment images captured by the panoramic camera through the wired or wireless connection, and then transmits the obtained multiple frames of environment images to the server 200 through a wired or wireless connection. The wireless connection may be a Bluetooth connection or a wireless network connection; the present disclosure does not limit the connection manner between the terminal device 100 and the panoramic camera.
S2. Point cloud map construction.
In some embodiments, after obtaining the multiple frames of environment images of the area where AR navigation is to be built, the server 200 may perform three-dimensional reconstruction on the multiple frames of environment images to obtain a point cloud map.
In some embodiments, the server 200 processes the multiple frames of environment images with a three-dimensional reconstruction method based on a first feature extraction manner, and determines pose information and a first feature descriptor of each of a plurality of three-dimensional points. Then, the server 200 processes the multiple frames of environment images with a second feature extraction manner, and determines a second feature descriptor of each of the plurality of three-dimensional points. Furthermore, the server 200 constructs the point cloud map based on the pose information and the second feature descriptor of each of the plurality of three-dimensional points. The data amount of the second feature descriptor of each three-dimensional point is smaller than the data amount of the first feature descriptor of the same three-dimensional point. The pose information of a three-dimensional point includes the position and orientation of the three-dimensional point in a specified coordinate system.
It should be noted that a feature descriptor is a representation of an image or an image block, which simplifies the image by extracting useful information and discarding redundant information. Typically, a feature descriptor converts an image of size width × height × 3 (number of channels) into a feature vector/array of length n.
It can be understood that, since the data amount of the second feature descriptor of each three-dimensional point is smaller than that of the first feature descriptor of the same three-dimensional point, the server 200 constructs the point cloud map with the pose information and the second feature descriptors of the plurality of three-dimensional points, so that feature extraction and feature matching on the feature descriptors of the three-dimensional points can be performed quickly. This improves the construction speed of the point cloud map, so that the real-time service requirements of AR navigation based on the terminal device 100 can be met.
In some embodiments, after constructing the point cloud map, the server 200 stores the point cloud map in a three-dimensional-point-plus-descriptor format.
In some embodiments, the first feature extraction manner is SIFT feature extraction, and the second feature extraction manner is oriented FAST and rotated BRIEF (ORB) feature extraction.
In some embodiments, the first feature descriptor may be a SIFT feature descriptor or an AKAZE feature descriptor, and the second feature descriptor may be an ORB feature descriptor or a SUPERPOINT feature descriptor.
Exemplarily, taking the first feature descriptor being a SIFT feature descriptor and the second feature descriptor being an ORB feature descriptor as an example, the server 200 performs a first three-dimensional reconstruction based on the SIFT features of the multiple frames of environment images to obtain the pose information and the SIFT feature descriptor of each of the plurality of three-dimensional points. Then, the server 200 performs a second three-dimensional reconstruction based on the ORB features of the multiple frames of images to obtain the ORB feature descriptor of each of the plurality of three-dimensional points. Furthermore, the server 200 constructs the point cloud map based on the pose information and the ORB feature descriptor of each of the plurality of three-dimensional points.
It can be understood that, although the SIFT feature descriptors of the three-dimensional points have strong descriptive ability and high accuracy, feature extraction and feature matching on SIFT feature descriptors take too long on the server 200 to meet the real-time service requirements of AR navigation based on the terminal device 100, whereas the data amount of the ORB feature descriptor of a three-dimensional point is smaller than that of the SIFT feature descriptor of the same three-dimensional point. Therefore, the server 200 constructs the point cloud map based on the pose information and the ORB feature descriptors of the plurality of three-dimensional points, which increases the feature extraction speed and feature matching speed for the feature descriptors of the three-dimensional points and improves the construction speed of the point cloud map, so that the real-time service requirements of AR navigation based on the terminal device 100 can be met.
The above embodiments describe the server 200 obtaining the point cloud map by performing three-dimensional reconstruction based on SFM. Exemplarily, the server 200 may also obtain the point cloud map by performing three-dimensional reconstruction based on SLAM technology. For example, the server 200 may perform three-dimensional reconstruction based on ORB-SLAM technology to obtain the point cloud map.
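As an illustrative sketch of the two-pass feature extraction described in S2 (SIFT keypoints driving the reconstruction, compact ORB descriptors stored in the map), the snippet below computes ORB descriptors at the locations of SIFT keypoints. Re-describing the SIFT keypoints in place, and the OpenCV calls themselves, are assumptions of this illustration; the disclosure only states that a second, ORB-based pass determines the compact descriptors of the reconstructed three-dimensional points.

```python
import cv2

def keypoints_with_compact_descriptors(image_gray):
    """Two-pass feature extraction sketch: SIFT keypoints feed the (omitted)
    SFM reconstruction, while lightweight ORB descriptors are computed at the
    same locations so that each resulting 3D point can be stored with a
    compact binary descriptor instead of a 128-dimensional SIFT vector."""
    sift = cv2.SIFT_create()
    orb = cv2.ORB_create()
    keypoints = sift.detect(image_gray, None)                  # first pass: SIFT keypoints
    keypoints, orb_desc = orb.compute(image_gray, keypoints)   # second pass: ORB descriptors
    return keypoints, orb_desc                                 # 32-byte descriptor per keypoint
```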
S3. Segment the point cloud map to obtain a plurality of initial local point cloud maps.
In some embodiments, after the point cloud map is constructed, the server 200 may segment the point cloud map to obtain a plurality of initial local point cloud maps.
In some embodiments, the segmentation of the point cloud map by the server 200 into a plurality of initial local point cloud maps may be implemented as follows: after generating the point cloud map, the server 200 displays the point cloud map on a display to visualize the point cloud map, so that the user can set map points of interest for the point cloud map according to the point cloud map displayed on the display of the server 200. After receiving the user's operation of setting map points of interest for the point cloud map, the server 200 responds to the operation and, based on each map point of interest, constructs a geofence of a certain size and segments the point cloud map to obtain a plurality of initial local point cloud maps. The map points of interest are the key points in the AR navigation process.
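Exemplarily, the geofence-based segmentation described above may be sketched as follows. Circular fences of a fixed radius and a horizontal-plane distance are assumptions of this illustration; the disclosure does not limit the size or shape of the geofence.

```python
import numpy as np

def split_by_geofence(points_xyz, poi_centers, radius):
    """Split a global point cloud into initial local maps, one per map point of
    interest (POI): every 3D point whose horizontal distance to a POI is within
    `radius` is assigned to that POI's local map. Overlap between fences is allowed."""
    local_maps = []
    for center in poi_centers:
        d = np.linalg.norm(points_xyz[:, :2] - np.asarray(center)[:2], axis=1)
        local_maps.append(points_xyz[d <= radius])
    return local_maps
```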
S4. Optimize the plurality of initial local point cloud maps to obtain a plurality of local point cloud maps.
The optimization includes at least one of the following: removing outlier three-dimensional points from the initial local point cloud map; or converting the one-to-many relationship between three-dimensional points and feature descriptors in the initial local point cloud map into a one-to-one relationship.
In some embodiments, the server 200 may use a statistical filtering method to remove outlier three-dimensional points from the initial local point cloud map.
It can be understood that the presence of outlier three-dimensional points in the initial local point cloud map affects the positioning accuracy. In order to avoid erroneous positioning caused by outlier three-dimensional points in the initial local point cloud map, the server 200 may remove the outlier three-dimensional points from the initial local point cloud map.
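Exemplarily, the statistical filtering mentioned above may be illustrated with the following sketch, which uses Open3D's statistical outlier removal purely as one possible implementation; the disclosure only specifies that a statistical filtering method is used, and the parameter values here are assumptions.

```python
import open3d as o3d

def remove_outliers(points_xyz, nb_neighbors=20, std_ratio=2.0):
    """Statistical outlier removal: points whose mean distance to their
    neighbors deviates too far from the global average are discarded."""
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points_xyz))
    _filtered, kept_indices = pcd.remove_statistical_outlier(nb_neighbors, std_ratio)
    return kept_indices  # indices of the 3D points that survive filtering
```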
The initial local point cloud map is obtained by processing multiple frames of environment images, and environment images of different frames are obtained by photographing the same environment under different conditions, including different angles, different times and different positions. Therefore, one three-dimensional point in the initial local point cloud map may correspond to multiple feature descriptors. However, when one three-dimensional point in the initial local point cloud map corresponds to multiple feature descriptors, feature matching takes too long to meet the real-time service requirements of AR navigation based on terminal devices.
On this basis, the server 200 may convert the one-to-many relationship between three-dimensional points and feature descriptors in the initial local point cloud map into a one-to-one relationship, so as to increase the feature matching speed and meet the real-time service requirements of web-based AR navigation.
It should be noted that converting the one-to-many relationship between three-dimensional points and feature descriptors in the initial local point cloud map into a one-to-one relationship can be understood as performing descriptor deduplication on the feature descriptors of the initial local point cloud map.
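Exemplarily, the descriptor deduplication may be sketched as follows: for each three-dimensional point, a single representative binary descriptor is kept. Choosing the medoid under the Hamming distance is an assumption of this illustration; any representative-selection rule satisfies the one-to-one relationship described above.

```python
import numpy as np

def deduplicate_descriptors(descriptors_per_point):
    """Convert the one-to-many relation between a 3D point and its binary
    descriptors into a one-to-one relation by keeping, for each point, the
    descriptor closest in Hamming distance to all the others (the medoid)."""
    deduped = {}
    for point_id, descs in descriptors_per_point.items():
        descs = np.asarray(descs, dtype=np.uint8)
        bits = np.unpackbits(descs, axis=1).astype(np.int32)
        # pairwise Hamming distances between all descriptors of this point
        dists = (bits[:, None, :] != bits[None, :, :]).sum(axis=2)
        deduped[point_id] = descs[dists.sum(axis=1).argmin()]
    return deduped
```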
S103. The server 200 sends the target local point cloud map to the terminal device 100.
In some embodiments, after determining, according to the initial position of the terminal device 100, the target local point cloud map closest to the initial position of the terminal device 100, the server 200 sends the target local point cloud map to the terminal device 100, so that the terminal device 100 can implement the AR navigation function according to the target local point cloud map.
Exemplarily, the server 200 sends the target local point cloud map in binary file (BIN) format to the terminal device 100 through a byte stream. In this way, on the one hand, the data transmission speed can be increased while data communication is ensured, which helps meet the real-time requirements of web-based AR navigation; on the other hand, the target local point cloud map in BIN format occupies less storage space on the terminal device 100, which saves the storage space of the terminal device 100 and enables the terminal device 100 to store more local point cloud maps, so that when the mobile network is turned off, the terminal device 100 can use the stored local point cloud maps as offline maps, ensuring the user experience.
In some embodiments, after obtaining the plurality of local point cloud maps, the server 200 may convert each local point cloud map into BIN format for storage, so that when the server 200 subsequently receives the first request information sent by the terminal device 100 for requesting a local point cloud map, the server 200 does not need to convert the local point cloud map into BIN format in real time and can immediately send the local point cloud map in BIN format to the terminal device 100 through a byte stream. This improves the response speed of the server 200 to the request information of the terminal device 100 and helps meet the real-time requirements of web-based AR navigation.
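Exemplarily, the BIN-format storage may be sketched as a simple binary layout such as the one below. The exact layout (a small header followed by float32 coordinates and uint8 descriptors) is an assumption of this illustration; the disclosure only states that the local point cloud maps are stored and transmitted as binary files.

```python
import struct
import numpy as np

def pack_local_map(points_xyz, descriptors):
    """Serialize a local point cloud map into a compact binary buffer:
    a header with the point count and descriptor width, followed by
    float32 coordinates and uint8 descriptors (one descriptor per point)."""
    pts = np.asarray(points_xyz, dtype=np.float32)
    desc = np.asarray(descriptors, dtype=np.uint8)
    header = struct.pack("<II", pts.shape[0], desc.shape[1])
    return header + pts.tobytes() + desc.tobytes()
```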
S104. The terminal device 100 obtains the target local point cloud map from the server 200.
The target local point cloud map is the local point cloud map that is closest to the initial position of the terminal device 100 among the plurality of local point cloud maps.
In some embodiments, after receiving the user's instruction for enabling the AR navigation function, the terminal device 100 responds to the instruction and sends first request information to the server 200. The first request information is used to request the target local point cloud map and includes a video frame image used for locating the initial position. The terminal device 100 then receives first response information from the server 200, and the first response information includes the target local point cloud map.
In some embodiments, after the terminal device 100 receives the target local point cloud map in BIN format from the server 200, the terminal device 100 reads the target local point cloud map in BIN format, determines the geofence information of the target local point cloud map, sets a center point according to the geofence information of the target local point cloud map, loads the center point into the web browser client, and stores it in the JavaScript data structure of the web browser client.
S105. After moving into the coverage range of the target local point cloud map, determine pose information corresponding to the current video frame image based on the current video frame image and the target local point cloud map.
In some embodiments, after the terminal device 100 obtains the target local point cloud map, the terminal device 100 can perform positioning in real time according to the video stream of the surrounding environment captured in real time, and determine, in combination with the coverage range of the target local point cloud map, whether it has moved into the coverage range of the target local point cloud map.
Determining whether the terminal device has moved into the coverage range of the target local point cloud map may include the following cases.
Case 1: The server 200 determines whether the terminal device has moved into the coverage range of the target local point cloud map.
In some embodiments, the terminal device 100 may capture, in real time and based on WebRTC technology, the video stream of the user's surrounding environment photographed by the terminal device 100 to obtain real-time video frame images of the user's surrounding environment, and then send these real-time video frame images to the server 200. The server 200 performs positioning on the video frame images of the user's surrounding environment to obtain the current position of the terminal device 100.
Then, the server 200 calculates in real time the distance between the current position of the terminal device 100 and the center point of the target local point cloud map. If it is detected that the distance between the current position of the terminal device 100 and the center point of the target local point cloud map is less than or equal to the inner-circle radius of the local point cloud map, it is determined that the terminal device 100 has moved into the coverage range of the local point cloud map. After it is determined that the terminal device 100 has moved into the coverage range of the local point cloud map, the terminal device 100 receives second response information from the server 200, and the second response information is used to indicate that the terminal device 100 has moved into the coverage range of the target local point cloud map.
Case 2: The terminal device 100 determines whether it has moved into the coverage range of the target local point cloud map.
In some embodiments, the terminal device 100 may capture, in real time and based on WebRTC technology, the video stream of the user's surrounding environment photographed by the terminal device 100 to obtain real-time video frame images of the user's surrounding environment, then perform positioning on these real-time video frame images to obtain the current position of the terminal device 100, and then calculate in real time the distance between the current position of the terminal device 100 and the center point of the target local point cloud map. If it is detected that the distance between the current position of the terminal device 100 and the center point of the target local point cloud map is less than or equal to the inner-circle radius of the local point cloud map, it is determined that the terminal device 100 has moved into the coverage range of the local point cloud map.
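Exemplarily, the coverage test used in both Case 1 and Case 2 may be sketched as follows; treating the distance as a two-dimensional horizontal distance is an assumption of this illustration.

```python
import math

def in_coverage(current_position, map_center, inner_radius):
    """The device is considered to have moved into the target local map when
    its current position lies within the map's inner-circle radius of the
    map's center point."""
    dx = current_position[0] - map_center[0]
    dy = current_position[1] - map_center[1]
    return math.hypot(dx, dy) <= inner_radius
```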
After determining that it has moved into the coverage range of the target local point cloud map, the terminal device 100 may determine the pose information corresponding to the current video frame image based on the current video frame image and the target local point cloud map.
In some embodiments, the determination by the terminal device 100 of the pose information corresponding to the current video frame image based on the current video frame image and the target local point cloud map may be implemented as X1 to X3.
X1. Perform feature extraction on the current video frame image to obtain a feature descriptor of the current video frame image.
Exemplarily, the terminal device 100 may perform ORB feature extraction on the current video frame image to obtain an ORB feature descriptor of the current video frame image.
In some embodiments, in order to increase the speed of feature extraction on the current video frame image and meet the real-time requirements of navigation, the terminal device 100 may use the JsFeat technology for acceleration when performing ORB feature extraction on the current video frame image.
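Exemplarily, the ORB feature extraction of X1 may be illustrated as follows. In the disclosure this step runs in the web browser client (accelerated with JsFeat); the OpenCV-based snippet below only illustrates the equivalent operation, and its parameters are assumptions.

```python
import cv2

def extract_frame_descriptors(frame_bgr, n_features=1000):
    """ORB feature extraction for the current video frame (X1)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors
```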
X2. Perform feature matching between the feature descriptor of the current video frame image and the feature descriptors of the three-dimensional points in the target local point cloud map, and determine the matched three-dimensional points.
In some embodiments, in order to improve the quality of the matched three-dimensional points, the terminal device 100 may perform, based on a WebAssembly component, feature matching between the feature descriptor of the current video frame image and the feature descriptors of the three-dimensional points in the local point cloud map, and determine the matched three-dimensional points.
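Exemplarily, the feature matching of X2 may be sketched as a brute-force Hamming match between the frame descriptors and the descriptors of the three-dimensional points in the target local point cloud map. Lowe's ratio test is an assumption added for illustration; in the disclosure the matching is performed in a WebAssembly component.

```python
import cv2

def match_to_local_map(frame_desc, map_desc, ratio=0.75):
    """Match the frame's ORB descriptors against the descriptors of the 3D
    points in the target local map (X2), keeping only unambiguous matches."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(frame_desc, map_desc, k=2)
    return [pair[0] for pair in knn
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
```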
X3. Obtain the pose information corresponding to the current video frame image based on the pose information of the matched three-dimensional points.
In some embodiments, after determining the matched three-dimensional points, the terminal device 100 may perform pose solving on the pose information of the matched three-dimensional points to obtain the pose information corresponding to the current video frame image. The pose information corresponding to the current video frame image is the pose of the terminal device 100 relative to the real object (the world coordinate system) when the current video frame image is captured, and the pose information includes the position and orientation of the terminal device 100 in the world coordinate system.
In some embodiments, in order to improve the accuracy of the obtained pose information corresponding to the current video frame image, the terminal device 100 may also perform, based on a WebAssembly component, pose solving on the pose information of the matched three-dimensional points to obtain the pose information corresponding to the current video frame image.
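Exemplarily, the pose solving of X3 may be illustrated with a perspective-n-point (PnP) solution over the matched three-dimensional points and their two-dimensional image locations. PnP with RANSAC, the OpenCV calls and the assumption of known camera intrinsics are choices made for this illustration only; the disclosure refers simply to pose solving, which in practice runs in a WebAssembly component.

```python
import cv2
import numpy as np

def solve_frame_pose(object_points, image_points, camera_matrix):
    """Recover the camera pose of the current frame from matched 3D map points
    and their 2D image locations (at least 4 correspondences are required).
    `camera_matrix` holds the assumed-known intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(object_points, dtype=np.float32),
        np.asarray(image_points, dtype=np.float32),
        camera_matrix, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # rotation matrix of the solved camera pose
    return R, tvec, inliers
```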
In the embodiments of the present disclosure, the terminal device 100 completes its own positioning according to the target local point cloud map obtained from the server 200 and the current video frame image. It can be understood that the full point cloud map is larger than a local point cloud map and requires more computing power, and is therefore not suitable for the terminal device 100 with limited computing power. A local point cloud map is smaller than the full point cloud map and requires less computing power, so the terminal device 100 can complete its own positioning according to the target local point cloud map and the current video frame image, thereby achieving relatively high-precision positioning of the terminal device 100 with the limited computing power of the terminal device 100. Moreover, since the terminal device 100 positions itself using the local point cloud map and the current video frame image, it is less affected by indoor buildings, so that AR navigation based on the terminal device 100 is applicable to indoor AR navigation, which expands the scope of application of AR navigation based on the terminal device 100.
In some embodiments, in the positioning method provided by the embodiments of the present disclosure, after the pose information corresponding to the current video frame image is determined, a navigation trajectory may also be generated according to the pose information of video frame images at multiple moments. Exemplarily, the generation process of the navigation trajectory may include A1 to A3.
A1. The terminal device 100 obtains a pose sequence.
The pose sequence includes the pose information corresponding to video frame images at multiple moments.
In some embodiments, while the AR navigation function of the terminal device 100 is enabled, the terminal device 100 may capture, based on WebRTC technology, the video stream of the user's surrounding environment photographed by the terminal device 100 over a period of time to obtain video frame images of the user's surrounding environment at multiple moments, and then perform the processing of S105 on the video frame images at the multiple moments to obtain the pose information of the video frame images at the multiple moments, that is, to obtain the pose sequence.
A2. The terminal device 100 performs interpolation processing and filtering processing on the pose sequence to obtain an interpolated and filtered pose sequence.
It can be understood that while the terminal device 100 photographs the user's surrounding environment, that is, during image acquisition, frames may be missed or acquired incorrectly, so that the obtained pose sequence is not smooth enough. In order to reduce the impact of missed or incorrect acquisition on the smoothness of the pose sequence, interpolation processing and filtering processing may be performed on the pose sequence, so that the interpolated and filtered pose sequence has good smoothness.
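Exemplarily, the interpolation processing and filtering processing of A2 may be sketched as follows for the position part of the pose sequence. Uniform-timeline resampling and a moving-average filter are assumptions of this illustration; the disclosure does not fix the interpolation or filtering method.

```python
import numpy as np

def smooth_pose_positions(timestamps, positions, window=5):
    """Resample the position samples of the pose sequence onto a uniform
    timeline, then smooth them with a moving-average filter."""
    t = np.asarray(timestamps, dtype=float)
    p = np.asarray(positions, dtype=float)            # shape (n, 3)
    t_uniform = np.linspace(t[0], t[-1], len(t))
    p_interp = np.column_stack(
        [np.interp(t_uniform, t, p[:, i]) for i in range(p.shape[1])])
    kernel = np.ones(window) / window
    p_smooth = np.column_stack(
        [np.convolve(p_interp[:, i], kernel, mode="same") for i in range(p.shape[1])])
    return t_uniform, p_smooth
```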
A3、终端设备100基于经过插值处理和滤波处理后的位姿序列,显示导航轨迹。A3. The terminal device 100 displays the navigation trajectory based on the posture sequence after interpolation and filtering.
在一些实施例中,在终端设备100得到经过插值处理和滤波处理后的位姿序列后,终端设备100基于经过插值处理和滤波处理后的位姿序列和目标局部点云地图,生成导航轨迹,并显示导航轨迹。In some embodiments, after the terminal device 100 obtains the pose sequence after interpolation and filtering, the terminal device 100 generates a navigation trajectory based on the pose sequence after interpolation and filtering and the target local point cloud map, and displays the navigation trajectory.
在一些实施例中,终端设备100基于经过插值处理和滤波处理后的位姿序列,显示导航轨迹,可以实现为:终端设备100基于经过插值处理和滤波处理后的位姿序列,通过Web浏览器客户端显示导航轨迹。In some embodiments, the terminal device 100 displays the navigation trajectory based on the posture sequence after interpolation and filtering, which can be implemented as follows: the terminal device 100 displays the navigation trajectory through a web browser client based on the posture sequence after interpolation and filtering.
In some embodiments, while displaying the navigation trajectory, the terminal device 100 updates its current position in real time based on the current video frame image in the captured video stream of the user's surroundings, and updates the navigation trajectory in real time in combination with the local point cloud map and the video frame image preceding the current video frame image, thereby achieving a tracking effect.
In the embodiments of the present disclosure, after obtaining the pose sequence, the terminal device 100 interpolates and filters it so that the navigation trajectory derived from the processed sequence is smoother. While displaying the navigation trajectory, the terminal device 100 updates it in real time according to the previous video frame image and the local point cloud map, thereby tracking the user and helping to improve the user experience.
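The disclosure does not prescribe a particular interpolation or filtering algorithm for step A2. The Python sketch below is one minimal possibility, assuming linear interpolation onto a uniform time grid followed by a moving-average filter applied to the translation part of each pose; the function name, sampling rate, and window size are illustrative only.

```python
import numpy as np

def smooth_pose_sequence(timestamps, positions, target_hz=30.0, window=5):
    """Resample the translation part of a pose sequence onto a uniform time
    grid (interpolation), then smooth it with a moving average (filtering)."""
    t = np.asarray(timestamps, dtype=float)   # shape (N,)
    p = np.asarray(positions, dtype=float)    # shape (N, 3)

    # Interpolation: fill gaps caused by missed or erroneous frame acquisition.
    t_uniform = np.arange(t[0], t[-1], 1.0 / target_hz)
    interp = np.stack(
        [np.interp(t_uniform, t, p[:, k]) for k in range(3)], axis=1)

    # Filtering: a moving average suppresses jitter in the per-frame estimates.
    kernel = np.ones(window) / window
    smooth = np.stack(
        [np.convolve(interp[:, k], kernel, mode="same") for k in range(3)], axis=1)
    return t_uniform, smooth
```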
The foregoing describes the solutions provided by the embodiments of the present disclosure mainly from the perspective of the methods. To implement the above functions, the positioning device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present disclosure can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be considered beyond the scope of the present disclosure.
FIG. 4 is a schematic diagram of the composition of a positioning device according to some embodiments. As shown in FIG. 4, the present disclosure provides a positioning device 2000. The positioning device 2000 may be the terminal device 100, a functional module in the terminal device 100, or any electronic device connected to the terminal device 100, which is not limited in the present disclosure. The positioning device 2000 includes a communication unit 2001 and a processing unit 2002. In some embodiments, the positioning device 2000 may further include a storage unit 2003.
In some embodiments, the communication unit 2001 is used to obtain a target local point cloud map from the server 200, where the target local point cloud map is a local point cloud map that is closest to an initial position of the terminal device 100 among multiple local point cloud maps.
The processing unit 2002 is used to determine, after the terminal device 100 moves into the coverage of the target local point cloud map, the pose information corresponding to the current video frame image based on the current video frame image and the target local point cloud map.
In some embodiments, each local point cloud map is obtained by performing optimization processing on an initial local point cloud map; the optimization processing includes at least one of the following: removing outlier three-dimensional points from the initial local point cloud map; or converting a one-to-many relationship between three-dimensional points and feature descriptors in the initial local point cloud map into a one-to-one relationship.
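As a rough illustration of these two optimization operations (neither of which is tied to a specific algorithm by the disclosure), the sketch below removes outlier 3D points with a statistical nearest-neighbour distance test and collapses the one-to-many point-to-descriptor relationship by averaging each point's observed descriptors; both concrete choices, and all names, are assumptions made for the example.

```python
import numpy as np
from scipy.spatial import cKDTree

def optimize_local_map(points, descriptors_per_point, k=8, std_ratio=2.0):
    """points: (N, 3) array of 3D points; descriptors_per_point: list of
    (M_i, D) arrays, one entry per point (the one-to-many relationship)."""
    pts = np.asarray(points, dtype=float)

    # Outlier removal: a point whose mean distance to its k nearest
    # neighbours is far above the global average is treated as an outlier.
    dists, _ = cKDTree(pts).query(pts, k=k + 1)   # column 0 is the point itself
    mean_d = dists[:, 1:].mean(axis=1)
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()

    # One-to-one conversion: keep a single merged descriptor per surviving point.
    merged = np.asarray([np.mean(descriptors_per_point[i], axis=0)
                         for i in np.flatnonzero(keep)])
    return pts[keep], merged
```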
In some embodiments, the initial local point cloud map is obtained by segmenting a point cloud map, and the point cloud map is constructed in the following manner: processing multiple frames of environment images with a three-dimensional reconstruction method based on a first feature extraction manner to determine pose information and a first feature descriptor of each of multiple three-dimensional points; processing the multiple frames of environment images in a second feature extraction manner to determine a second feature descriptor of each of the multiple three-dimensional points, where the data volume of the second feature descriptor of a three-dimensional point is smaller than the data volume of the first feature descriptor of the same three-dimensional point; and constructing the point cloud map based on the pose information and the second feature descriptor of each of the multiple three-dimensional points.
In some embodiments, the communication unit is further used to send first request information to the server 200, where the first request information is used to request the target local point cloud map and includes a video frame image used to locate the initial position.
The communication unit is used to receive first response information from the server 200, where the first response information includes the target local point cloud map.
In some embodiments, the storage unit 2003 is used to store a local point cloud map.
In some embodiments, the communication unit 2001 is used to receive the target local point cloud map in a binary file format sent by the server 200.
In some embodiments, the processing unit is used to: perform feature extraction on the current video frame image to obtain a feature descriptor of the current video frame image; perform feature matching between the feature descriptor of the current video frame image and the feature descriptors of the three-dimensional points in the local point cloud map to determine matched three-dimensional points; and obtain the pose information corresponding to the current video frame image based on the pose information of the matched three-dimensional points.
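A minimal sketch of this matching-and-pose step is given below. It assumes the map stores ORB descriptors, uses a brute-force Hamming matcher, and recovers the pose with PnP plus RANSAC from the resulting 2D-3D correspondences; the function name, the camera intrinsic matrix K, and the data layout are all assumptions for illustration.

```python
import cv2
import numpy as np

def localize_frame(frame_gray, map_points_3d, map_descriptors, K):
    """frame_gray: current video frame (grayscale); map_points_3d: (M, 3) array;
    map_descriptors: (M, 32) uint8 ORB descriptors, one per 3D point."""
    orb = cv2.ORB_create()
    keypoints, frame_desc = orb.detectAndCompute(frame_gray, None)

    # Match the frame's descriptors against the per-point descriptors of the map.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(frame_desc, map_descriptors)

    pts_2d = np.float32([keypoints[m.queryIdx].pt for m in matches])
    pts_3d = np.float32([map_points_3d[m.trainIdx] for m in matches])

    # Recover the camera pose from the 2D-3D correspondences.
    ok, rvec, tvec, _ = cv2.solvePnPRansac(pts_3d, pts_2d, K, None)
    return (rvec, tvec) if ok else None
```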
In some embodiments, the communication unit 2001 is further used to obtain a pose sequence, where the pose sequence includes pose information corresponding to video frame images at multiple moments.
The processing unit 2002 is further used to: perform interpolation and filtering on the pose sequence to obtain an interpolated and filtered pose sequence; and display a navigation trajectory based on the interpolated and filtered pose sequence.
In some embodiments, the processing unit 2002 is used to display the navigation trajectory through a World Wide Web (Web) browser client based on the interpolated and filtered pose sequence.
FIG. 5 is a schematic diagram of the composition of another positioning device according to some embodiments. As shown in FIG. 5, the present disclosure provides a positioning device 3000. The positioning device 3000 may be the server 200, a functional module in the server 200, or any electronic device connected to the server 200, which is not limited in the present disclosure. The positioning device 3000 includes a communication unit 3001 and a processing unit 3002. In some embodiments, the positioning device 3000 may further include a storage unit 3003.
In some embodiments, the processing unit 3002 is used to determine the initial position of the terminal device 100.
The processing unit 3002 is further used to determine, based on the initial position of the terminal device 100, a target local point cloud map that is closest to the initial position of the terminal device 100 from multiple local point cloud maps.
The communication unit 3001 is used to send the target local point cloud map to the terminal device 100.
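A bare-bones sketch of the selection performed by the processing unit 3002, assuming each stored local point cloud map carries a reference point such as the centroid of its 3D points; the dictionary layout and field names are illustrative only.

```python
import numpy as np

def select_target_map(initial_position, local_maps):
    """local_maps: list of dicts such as {"id": ..., "center": (x, y, z), ...}."""
    centers = np.asarray([m["center"] for m in local_maps], dtype=float)
    dists = np.linalg.norm(centers - np.asarray(initial_position, dtype=float), axis=1)
    return local_maps[int(np.argmin(dists))]   # the map closest to the initial position
```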
In some embodiments, the processing unit 3002 is further used to: segment a point cloud map to obtain multiple initial local point cloud maps; and perform optimization processing on the multiple initial local point cloud maps respectively to obtain the multiple local point cloud maps; where the optimization processing includes at least one of the following: removing outlier three-dimensional points from the initial local point cloud map; or converting a one-to-many relationship between three-dimensional points and feature descriptors in the initial local point cloud map into a one-to-one relationship.
In some embodiments, the processing unit 3002 is further used to: process multiple frames of environment images with a three-dimensional reconstruction method based on a first feature extraction manner to determine pose information and a first feature descriptor of each of multiple three-dimensional points; process the multiple frames of environment images in a second feature extraction manner to determine a second feature descriptor of each of the multiple three-dimensional points, where the data volume of the second feature descriptor of a three-dimensional point is smaller than the data volume of the first feature descriptor of the same three-dimensional point; and construct the point cloud map based on the pose information and the second feature descriptor of each of the multiple three-dimensional points.
In some embodiments, the first feature descriptor is a scale-invariant feature transform (SIFT) feature descriptor, and the second feature descriptor is an ORB feature descriptor.
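To make the descriptor-size contrast concrete: in OpenCV, for example, a SIFT descriptor occupies 128 float32 values per keypoint, whereas an ORB descriptor occupies 32 bytes per keypoint. The sketch below only shows the two extraction passes on a single environment image; the full reconstruction pipeline that produces the 3D points and their poses is assumed and not shown.

```python
import cv2

def extract_dual_descriptors(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

    sift_kp, sift_desc = cv2.SIFT_create().detectAndCompute(img, None)  # used for 3D reconstruction
    orb_kp, orb_desc = cv2.ORB_create().detectAndCompute(img, None)     # stored in the point cloud map

    # sift_desc: float32, 128 values (512 bytes) per keypoint;
    # orb_desc:  uint8, 32 bytes per keypoint.
    return (sift_kp, sift_desc), (orb_kp, orb_desc)
```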
In some embodiments, the communication unit 3001 is used to receive first request information from the terminal device 100, where the first request information is used to request a local point cloud map and includes a video frame image used to locate the initial position of the terminal device 100.
The processing unit 3002 is used to determine the initial position of the terminal device 100 according to the video frame image used to locate the initial position of the terminal device 100.
In some embodiments, the communication unit 3001 is used to send the target local point cloud map in a binary file format to the terminal device 100.
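The disclosure calls for a binary file but does not dictate its internal layout. Purely for illustration, the sketch below packs a local map's 3D points and descriptors into a compressed NumPy archive held in memory on the server side and unpacks it again on the terminal side; the container format is an assumption.

```python
import io
import numpy as np

def map_to_binary(points_3d, descriptors):
    buf = io.BytesIO()
    np.savez_compressed(buf, points=points_3d, descriptors=descriptors)
    return buf.getvalue()                      # bytes payload sent to the terminal device

def binary_to_map(payload):
    data = np.load(io.BytesIO(payload))
    return data["points"], data["descriptors"]
```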
In some embodiments, the above method is applied to a server running a Web browser server side, where a communication connection exists between the Web browser server side and a Web browser client running on the terminal device 100.
In some embodiments, the storage unit 3003 is used to store the point cloud map.
The units in FIG. 4 and FIG. 5 may also be referred to as modules; for example, a processing unit may be referred to as a processing module.
If the units in FIG. 4 and FIG. 5 are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present disclosure essentially, or the part contributing to the related art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The storage medium storing the computer software product includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
FIG. 6 is a schematic structural diagram of a terminal device according to some embodiments. As shown in FIG. 6, when the functions of the above integrated modules are implemented in the form of hardware, the present disclosure provides a terminal device 4000, which may be the positioning device 2000 described above. The terminal device 4000 includes a processor 4002, a communication interface 4003, and a bus 4004. In some embodiments, the terminal device 4000 may further include a memory 4001.
FIG. 7 is a schematic structural diagram of a server according to some embodiments. As shown in FIG. 7, when the functions of the above integrated modules are implemented in the form of hardware, the present disclosure provides a server 5000, which may be the positioning device 3000 described above. The server 5000 includes a processor 5002, a communication interface 5003, and a bus 5004. In some embodiments, the server 5000 may further include a memory 5001.
It should be noted that the processor 4002 and the processor 5002 may each implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the content of the present disclosure. The processor 4002 and the processor 5002 may each be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor 5002 may also be a combination implementing a computing function, for example, a combination including one or more microprocessors or a combination of a DSP and a microprocessor.
The communication interface 4003 and the communication interface 5003 are each used to connect to other devices through a communication network. The communication network may be an Ethernet, a radio access network, a wireless local area network (WLAN), or the like.
The memory 4001 and the memory 5001 may each be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
In some embodiments, the memory 4001 may exist independently of the processor 4002 and may be connected to the processor 4002 through the bus 4004 to store instructions or program code. When the processor 4002 invokes and executes the instructions or program code stored in the memory 4001, the positioning method provided in the embodiments of the present disclosure can be implemented. Likewise, the memory 5001 may exist independently of the processor 5002 and may be connected to the processor 5002 through the bus 5004 to store instructions or program code. When the processor 5002 invokes and executes the instructions or program code stored in the memory 5001, the positioning method provided in the embodiments of the present disclosure can be implemented.
In other embodiments, the memory 4001 may be integrated with the processor 4002, and the memory 5001 may be integrated with the processor 5002.
The bus 4004 and the bus 5004 may each be an extended industry standard architecture (EISA) bus or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one bold line is used in each of FIG. 6 and FIG. 7, but this does not mean that there is only one bus or only one type of bus.
From the description of the above implementations, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the positioning device may be divided into different functional modules to complete all or some of the functions described above.
The embodiments of the present disclosure further provide a computer-readable storage medium. All or some of the procedures in the above method embodiments may be completed by computer instructions instructing relevant hardware, and the program may be stored in the computer-readable storage medium; when the program is executed, the procedures of the above method embodiments may be included. The computer-readable storage medium may be the memory described in any of the foregoing embodiments. The computer-readable storage medium may also be an external storage device of the positioning device, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the positioning device. Further, the computer-readable storage medium may include both the internal storage unit of the positioning device and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the positioning device, and may also be used to temporarily store data that has been output or is to be output.
The embodiments of the present disclosure further provide a computer program product containing a computer program. When the computer program product runs on a computer, the computer is caused to perform any one of the positioning methods provided in the above embodiments.
Although the present disclosure is described herein in conjunction with the embodiments, in the process of implementing the claimed disclosure, those skilled in the art can understand and achieve other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or another unit may fulfil several functions recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Although the present disclosure has been described in conjunction with specific features and embodiments thereof, it is apparent that various modifications and combinations may be made without departing from the spirit and scope of the present disclosure. Accordingly, this specification and the drawings are merely exemplary illustrations of the present disclosure as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations, or equivalents within the scope of the present disclosure. Obviously, those skilled in the art may make various changes and variations to the present disclosure without departing from its spirit and scope. Thus, if these modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their equivalent technologies, the present disclosure is also intended to encompass them.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto; any changes or substitutions within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (19)

1. A positioning method, comprising:
    acquiring a target local point cloud map from a server, wherein the target local point cloud map is a local point cloud map that is closest to an initial position of a terminal device among a plurality of local point cloud maps; and
    after moving into coverage of the target local point cloud map, determining, based on a current video frame image and the target local point cloud map, pose information corresponding to the current video frame image.
2. The method according to claim 1, wherein each of the local point cloud maps is obtained by performing optimization processing on an initial local point cloud map, and the optimization processing comprises one or more of the following:
    removing outlier three-dimensional points from the initial local point cloud map; or
    converting a one-to-many relationship between the three-dimensional points and feature descriptors in the initial local point cloud map into a one-to-one relationship.
3. The method according to claim 2, wherein the initial local point cloud map is obtained by segmenting a point cloud map, and the point cloud map is constructed by:
    processing a plurality of frames of environment images with a three-dimensional reconstruction method based on a first feature extraction manner to determine pose information and a first feature descriptor of each of a plurality of three-dimensional points;
    processing the plurality of frames of environment images in a second feature extraction manner to determine a second feature descriptor of each of the plurality of three-dimensional points, wherein a data volume of the second feature descriptor of a three-dimensional point is smaller than a data volume of the first feature descriptor of the same three-dimensional point; and
    constructing the point cloud map based on the pose information and the second feature descriptor of each of the plurality of three-dimensional points.
4. The method according to claim 3, wherein the first feature descriptor is a scale-invariant feature transform (SIFT) feature descriptor, and the second feature descriptor is an ORB feature descriptor.
5. The method according to any one of claims 1 to 4, further comprising:
    sending first request information to the server, wherein the first request information is used to request the target local point cloud map and comprises a video frame image used to locate the initial position;
    wherein acquiring the target local point cloud map from the server comprises:
    receiving first response information from the server, wherein the first response information comprises the target local point cloud map.
6. The method according to any one of claims 1 to 4, wherein acquiring the target local point cloud map from the server comprises:
    receiving the target local point cloud map in a binary file format sent by the server.
7. The method according to any one of claims 1 to 4, wherein determining, based on the current video frame image and the target local point cloud map, the pose information corresponding to the current video frame image comprises:
    performing feature extraction on the current video frame image to obtain a feature descriptor of the current video frame image;
    performing feature matching between the feature descriptor of the current video frame image and feature descriptors of the three-dimensional points in the target local point cloud map to determine matched three-dimensional points; and
    obtaining, based on pose information of the matched three-dimensional points, the pose information corresponding to the current video frame image.
8. The method according to any one of claims 1 to 4, further comprising:
    acquiring a pose sequence, wherein the pose sequence comprises pose information corresponding to video frame images at a plurality of moments;
    performing interpolation and filtering on the pose sequence to obtain an interpolated and filtered pose sequence; and
    displaying a navigation trajectory based on the interpolated and filtered pose sequence.
9. The method according to claim 8, wherein displaying the navigation trajectory based on the interpolated and filtered pose sequence comprises:
    displaying the navigation trajectory through a World Wide Web (Web) browser client based on the interpolated and filtered pose sequence.
10. A positioning method, comprising:
    determining an initial position of a terminal device;
    determining, according to the initial position of the terminal device, a target local point cloud map that is closest to the initial position of the terminal device from a plurality of local point cloud maps; and
    sending the target local point cloud map to the terminal device.
11. The method according to claim 10, further comprising:
    segmenting a point cloud map to obtain a plurality of initial local point cloud maps; and
    performing optimization processing on the plurality of initial local point cloud maps respectively to obtain the plurality of local point cloud maps;
    wherein the optimization processing comprises one or more of the following:
    removing outlier three-dimensional points from the initial local point cloud map; or
    converting a one-to-many relationship between the three-dimensional points and feature descriptors in the initial local point cloud map into a one-to-one relationship.
12. The method according to claim 11, further comprising:
    processing a plurality of frames of environment images with a three-dimensional reconstruction method based on a first feature extraction manner to determine pose information and a first feature descriptor of each of a plurality of three-dimensional points;
    processing the plurality of frames of environment images in a second feature extraction manner to determine a second feature descriptor of each of the plurality of three-dimensional points, wherein a data volume of the second feature descriptor of a three-dimensional point is smaller than a data volume of the first feature descriptor of the same three-dimensional point; and
    constructing the point cloud map based on the pose information and the second feature descriptor of each of the plurality of three-dimensional points.
13. The method according to claim 12, wherein the first feature descriptor is a scale-invariant feature transform (SIFT) feature descriptor, and the second feature descriptor is an ORB feature descriptor.
14. The method according to any one of claims 10 to 13, wherein determining the initial position of the terminal device comprises:
    receiving first request information from the terminal device, wherein the first request information is used to request a local point cloud map and comprises a video frame image used to locate the initial position of the terminal device; and
    determining the initial position of the terminal device according to the video frame image used to locate the initial position of the terminal device.
15. The method according to any one of claims 10 to 13, wherein sending the target local point cloud map to the terminal device comprises:
    sending the target local point cloud map in a binary file format to the terminal device.
16. The method according to any one of claims 10 to 13, wherein the method is applied to a server running a Web browser server side, and a communication connection exists between the Web browser server side and a Web browser client running on the terminal device.
17. A terminal device, comprising a processor and a memory;
    wherein the memory stores instructions executable by the processor; and
    the processor is configured to execute the instructions to cause the terminal device to implement the method according to any one of claims 1 to 9.
18. A server, comprising a processor and a memory;
    wherein the memory stores instructions executable by the processor; and
    the processor is configured to execute the instructions to cause the server to implement the method according to any one of claims 10 to 16.
19. A computer-readable storage medium, comprising computer instructions, wherein when the computer instructions are run on a computer, the computer is caused to perform the method according to any one of claims 1 to 9, or the method according to any one of claims 10 to 16.
PCT/CN2023/100387 2022-11-15 2023-06-15 Positioning method, terminal device, server, and storage medium WO2024103708A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211429870.1 2022-11-15
CN202211429870.1A CN118052867A (en) 2022-11-15 2022-11-15 Positioning method, terminal equipment, server and storage medium

Publications (1)

Publication Number Publication Date
WO2024103708A1

Family

ID=91050815

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/100387 WO2024103708A1 (en) 2022-11-15 2023-06-15 Positioning method, terminal device, server, and storage medium

Country Status (2)

Country Link
CN (1) CN118052867A (en)
WO (1) WO2024103708A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110561423A (en) * 2019-08-16 2019-12-13 深圳优地科技有限公司 pose transformation method, robot and storage medium
CN112445929A (en) * 2019-08-30 2021-03-05 浙江商汤科技开发有限公司 Visual positioning method and related device
CN113470089A (en) * 2021-07-21 2021-10-01 中国人民解放军国防科技大学 Cross-domain cooperative positioning and mapping method and system based on three-dimensional point cloud
WO2021253430A1 (en) * 2020-06-19 2021-12-23 深圳市大疆创新科技有限公司 Absolute pose determination method, electronic device and mobile platform
WO2022002039A1 (en) * 2020-06-30 2022-01-06 杭州海康机器人技术有限公司 Visual positioning method and device based on visual map
CN115290097A (en) * 2022-09-30 2022-11-04 安徽建筑大学 BIM-based real-time accurate map construction method, terminal and storage medium

Also Published As

Publication number Publication date
CN118052867A (en) 2024-05-17
