WO2021244114A1 - Visual positioning method and device - Google Patents

Visual positioning method and device

Info

Publication number
WO2021244114A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
pose
terminal device
candidate
poses
Prior art date
Application number
PCT/CN2021/084070
Other languages
English (en)
French (fr)
Inventor
冯文森
张欢
曹军
葛建阁
唐忠伟
李江伟
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21817168.4A (EP4148379A4)
Publication of WO2021244114A1
Priority to US18/070,862 (US20230089845A1)

Classifications

    • G01C21/20 - Instruments for performing navigational calculations
    • G06T7/74 - Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G01C21/28 - Navigation specially adapted for navigation in a road network, with correlation of data from several navigational instruments
    • G01C21/005 - Navigation with correlation of navigation data from several sources, e.g. map or contour matching
    • G01C21/3602 - Input other than that of destination using image analysis, e.g. detection of road signs, lanes, buildings, real preceding vehicles using a camera
    • G01C21/3852 - Creation or updating of map data characterised by the source of data: data derived from aerial or satellite images
    • G06T11/00 - 2D [Two Dimensional] image generation
    • G06T7/12 - Edge-based segmentation
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T2200/24 - Indexing scheme for image data processing or generation involving graphical user interfaces [GUIs]
    • G06T2207/10004 - Still image; Photographic image
    • G06T2207/10032 - Satellite or aerial image; Remote sensing
    • G06T2207/20021 - Dividing image into blocks, subimages or windows
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G06T2207/30181 - Earth observation
    • G06T2207/30184 - Infrastructure
    • G06T2207/30244 - Camera pose
    • H04N23/64 - Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image

Definitions

  • This application relates to an intelligent perception technology, in particular to a visual positioning method and device.
  • Visual positioning uses the images or video captured by a camera to accurately determine the position and attitude of the camera in the real world. It has been an active topic in computer vision in recent years and is of great significance in many fields such as augmented reality, interactive virtual reality, robot visual navigation, public scene monitoring, and intelligent transportation.
  • Visual positioning technology includes a satellite map-based visual positioning method (geo-localization). A satellite map is obtained by white-model reconstruction of the scene from satellite imagery. The satellite map-based visual positioning method uses the satellite map to locate the image or video captured by the camera and obtains the 6-degree-of-freedom (DoF) pose of the camera coordinate system in the satellite map. This type of visual positioning technology can cope with the visual positioning of large-scale scenes.
  • However, the aforementioned satellite map-based visual positioning method suffers from a low positioning success rate and low positioning accuracy.
  • This application provides a visual positioning method and device to improve the positioning success rate and positioning accuracy.
  • In a first aspect, an embodiment of the present application provides a visual positioning method. The method may include: acquiring an image collected by a terminal device; obtaining two-dimensional line feature information of the image, where the two-dimensional line feature information includes at least one of boundary line information between a building and a non-building, or boundary line information between a non-building and another non-building; and determining the positioning pose of the terminal device according to the position information and magnetometer angle deflection information of the terminal device, the satellite map, and the two-dimensional line feature information.
  • two-dimensional line feature information is used for visual positioning.
  • the two-dimensional line feature information may include at least the boundary line information between the building and the non-building or the boundary line information between the non-building and the non-building.
  • acquiring the two-dimensional line feature information of the image may include: semantically segmenting the image, and extracting the two-dimensional line feature information of the image.
  • the two-dimensional line feature information of the image is extracted by means of semantic segmentation, so as to perform visual positioning based on the two-dimensional line feature information, which can improve the success rate and accuracy of visual positioning.
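  • As an illustrative sketch only (not part of the claimed method), the snippet below shows one way such boundary lines could be derived from a per-pixel semantic segmentation mask; the class IDs, the seg_mask array, and the neighbor-comparison rule are assumptions made for this example.

```python
import numpy as np

# Hypothetical class IDs for the segmentation mask (assumed for illustration).
BUILDING, SKY, VEGETATION, GROUND, WATER = 0, 1, 2, 3, 4

def extract_boundary_masks(seg_mask: np.ndarray):
    """Split class-change edges of a semantic mask into the two boundary
    categories described above.

    seg_mask: (H, W) integer array of per-pixel semantic classes.
    Returns (building/non-building mask, non-building/non-building mask).
    """
    bldg = seg_mask == BUILDING
    bn = np.zeros_like(bldg)   # building / non-building boundary pixels
    nn = np.zeros_like(bldg)   # non-building / non-building boundary pixels

    # Vertical neighbours (each pixel vs. the pixel below it).
    diff = seg_mask[:-1, :] != seg_mask[1:, :]
    one_building = bldg[:-1, :] ^ bldg[1:, :]      # exactly one side is a building
    bn[:-1, :] |= diff & one_building
    nn[:-1, :] |= diff & ~bldg[:-1, :] & ~bldg[1:, :]

    # Horizontal neighbours (each pixel vs. the pixel to its right).
    diff = seg_mask[:, :-1] != seg_mask[:, 1:]
    one_building = bldg[:, :-1] ^ bldg[:, 1:]
    bn[:, :-1] |= diff & one_building
    nn[:, :-1] |= diff & ~bldg[:, :-1] & ~bldg[:, 1:]
    return bn, nn
```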
  • In a possible implementation, determining the positioning pose of the terminal device according to the position information and magnetometer angle deflection information of the terminal device corresponding to the image, the satellite map, and the two-dimensional line feature information may include: determining a candidate pose set according to the position information and magnetometer angle deflection information of the terminal device; determining N optimized poses according to the candidate pose set, the two-dimensional line feature information, and the satellite map; and determining the positioning pose of the terminal device according to the N optimized poses, where N is an integer greater than 1.
  • the candidate pose set includes M sets of candidate poses, and each set of candidate poses includes candidate position information and a set of candidate yaw angles.
  • The candidate position information falls within a first threshold range, which is determined based on the position information of the terminal device; the candidate yaw angle set belongs to a second threshold range, which is an angle set determined based on the magnetometer angle deflection information of the terminal device; and M is an integer greater than 1.
  • Determining the N optimized poses according to the candidate pose set, the two-dimensional line feature information, and the satellite map may include: selecting part of the candidate poses from the candidate pose set, determining the panoramic line feature information of those candidate poses using the candidate poses and the satellite map, matching the panoramic line feature information with the two-dimensional line feature information to determine multiple initial poses, and optimizing the multiple initial poses to determine the N optimized poses.
  • the positioning time can be reduced and the positioning accuracy can be improved.
  • In a possible implementation, a search method and an iterative method are used to determine the N optimized poses, as follows (an illustrative code sketch of this loop is given after the pose-composition notes below):
  • Step 1: Select K1 groups of candidate poses from the M groups of candidate poses, and obtain the panoramic line feature information of each group according to the candidate position information of each of the K1 groups and the satellite map.
  • Step 2: Match the panoramic line feature information of each group of candidate poses with the two-dimensional line feature information to determine the candidate yaw angle information of each group; the candidate yaw angle information of a group is the angle in that group's candidate yaw angle set that best matches the two-dimensional line feature information.
  • Step 3: Obtain K1 initial poses according to the candidate yaw angle information of the K1 groups of candidate poses, where each initial pose includes the candidate position information and candidate yaw angle information of one group of candidate poses.
  • Step 4: Optimize the K1 initial poses using the iterative method to obtain K1 optimized poses, and obtain the closest-point loss corresponding to each optimized pose.
  • Step 5: According to the closest-point loss of each optimized pose, select from the K1 optimized poses the pose with the smallest closest-point loss as one of the N optimized poses.
  • The center of the K(1+n) groups of candidate poses is the optimized pose determined by performing steps 1 to 5 above on the K(n) groups of candidate poses.
  • In a possible implementation, each initial pose also includes preset altitude information, pitch angle information, and roll angle information.
  • Each optimized pose includes position information, altitude information, yaw angle information, pitch angle information, and roll angle information.
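  • The five steps above can be read as a coarse-to-fine loop. The sketch below, in Python with hypothetical helper functions (sample_groups, render_panoramic_lines, match_yaw, icp_refine, and make_pose are placeholders invented for this illustration, not interfaces defined by the application), shows the control flow under those assumptions.

```python
def search_and_refine(candidate_groups, line_feat_2d, satellite_map,
                      group_sizes, n_rounds):
    """Coarse-to-fine determination of N optimized poses (illustrative only).

    candidate_groups: the M groups, each with a candidate position and a yaw set
    group_sizes: [K1, K2, ...], the number of groups sampled in each round
    """
    optimized = []
    center = None
    for n in range(n_rounds):
        # Step 1: pick K_n candidate groups; after the first round the groups
        # are drawn around the previous round's best optimized pose (the "center").
        groups = sample_groups(candidate_groups, group_sizes[n], center)
        initial_poses = []
        for g in groups:
            pano = render_panoramic_lines(g.position, satellite_map)   # Step 1
            yaw = match_yaw(pano, line_feat_2d, g.yaw_set)              # Step 2
            # Step 3: an initial pose = candidate position + best-matching yaw
            # (preset altitude, pitch and roll are attached as noted above).
            initial_poses.append(make_pose(g.position, yaw))
        # Step 4: refine each initial pose with the iterative method.
        refined = [icp_refine(p, line_feat_2d, satellite_map) for p in initial_poses]
        # Step 5: keep the refined pose with the smallest closest-point loss.
        best = min(refined, key=lambda p: p.icp_loss)
        optimized.append(best)
        center = best   # the next round's groups are centered on this pose
    return optimized    # the N optimized poses
```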
  • the matching includes multimodal robust matching or two-dimensional contour matching, where the multimodal robust matching includes multiple semantic information matching or maximum value suppression matching.
  • This implementation method can assist in improving the positioning effect through multi-modal robust matching or two-dimensional contour matching.
  • In a possible implementation, determining the positioning pose of the terminal device according to the N optimized poses includes: selecting, among the N optimized poses, the one with the smallest loss as the positioning pose of the terminal device.
  • The loss is the weighted sum of the closest-point loss of each optimized pose and the difference corresponding to that optimized pose, where the difference is the difference between the position information of the optimized pose and the position information of the terminal device.
  • the method may further include: judging whether the positioning pose of the terminal device is reliable according to at least one of the interior point rate, interior point error, or heat map corresponding to the positioning pose of the terminal device.
  • If the positioning pose of the terminal device is reliable, the positioning pose of the terminal device is output.
  • If the positioning pose of the terminal device is unreliable, it is determined that positioning has failed.
  • the heat map is used to represent the distribution of the part of the candidate poses.
  • At least one of the interior point rate, interior point error, or heat map corresponding to the positioning pose of the terminal device is used to determine whether the positioning pose of the terminal device is reliable, and the credibility of the positioning result can be improved.
  • Judging whether the positioning pose of the terminal device is reliable according to at least one of the interior point rate, interior point error, or heat map corresponding to the positioning pose includes: judging whether the positioning pose of the terminal device satisfies at least one of the following conditions: the interior point rate corresponding to the positioning pose of the terminal device is greater than a first threshold; or, the interior point error corresponding to the positioning pose of the terminal device is less than a second threshold; or, the concentration of the distribution of the selected candidate poses in the heat map corresponding to the positioning pose of the terminal device is greater than a third threshold.
  • the method further includes: determining the virtual object description information according to the positioning pose of the terminal device. Send the virtual object description information to the terminal device, where the virtual object description information is used to display the corresponding virtual object on the terminal device.
  • In a second aspect, an embodiment of the present application provides a visual positioning method.
  • the method may include: a terminal device collects an image, and displays the image on the user interface of the terminal device.
  • the image includes captured non-buildings.
  • Receive the virtual object description information sent by the server; the virtual object description information is determined according to the positioning pose of the terminal device that collects the image, the positioning pose is determined based on at least the two-dimensional line feature information of the image and the position information of the terminal device, and the two-dimensional line feature information includes at least one of the boundary line information between a building and a non-building, or between a non-building and another non-building.
  • the virtual object corresponding to the description information of the virtual object is superimposed and displayed on the user interface.
  • In a possible implementation, before the image is collected, the method further includes: displaying prompt information on the user interface, where the prompt information is used to prompt the user to photograph at least one of the dividing line between a building and a non-building, or the dividing line between a non-building and another non-building.
  • the method may further include judging whether the image is suitable for visual positioning through the end-side model.
  • For example, the terminal device performs semantic segmentation on the query image based on an end-side semantic segmentation model and extracts two-dimensional line features, including the dividing line between buildings and non-buildings and the dividing line between different non-buildings, and then judges the richness of the two-dimensional line features: if the two-dimensional line features are rich, that is, the length of the two-dimensional line features is greater than a certain threshold, the image is suitable for visual positioning.
  • In other words, it is judged whether the dividing line between a building and a non-building, or between a non-building and another non-building, corresponding to the two-dimensional line feature information is rich; if it is rich, the image is determined to be suitable for visual positioning, and if it is not rich, the image is determined to be unsuitable for visual positioning.
  • Richness here can mean that the length of the aforementioned dividing line is greater than a threshold.
  • the dividing line includes at least one of a dividing line between a building and a non-building corresponding to the two-dimensional line feature information, or a dividing line between a non-building and a non-building.
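  • Purely for illustration, a richness check of this kind could be as simple as thresholding the total boundary-line length; the sketch below reuses the hypothetical boundary masks from the earlier segmentation sketch, and the threshold value is an assumption rather than a value given by the application.

```python
def image_suitable_for_positioning(bn_mask, nn_mask, min_total_length_px=200):
    """Decide on-device whether the query image is worth sending to the server.

    bn_mask / nn_mask: boolean masks of building/non-building and
    non-building/non-building boundary pixels (see the earlier sketch).
    min_total_length_px: assumed richness threshold, in boundary pixels.
    """
    total_length = int(bn_mask.sum()) + int(nn_mask.sum())
    return total_length > min_total_length_px
```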
  • the image can be sent to the server so that the server can visually locate the terminal device based on the image.
  • the accuracy of the two-dimensional line feature information of the image in this implementation is different from the accuracy of the two-dimensional line feature information of the image used in the positioning pose determination described above.
  • the two-dimensional line feature information of the image used for the positioning pose determination is obtained by the server after semantic segmentation of the image, and its accuracy is higher than the two-dimensional line feature information of the image in this implementation manner.
  • In this way, only images suitable for visual positioning are sent to the server for further precise visual positioning, which avoids sending unsuitable images to the server and wasting transmission resources and server-side computing resources.
  • An embodiment of the present application further provides a visual positioning device, which can be used as a server or an internal chip of the server, and is used to implement the visual positioning method in the above first aspect or any possible implementation manner of the first aspect.
  • the visual positioning device may include a module or unit for executing the visual positioning method in the first aspect or any possible implementation of the first aspect, for example, a transceiver module or unit, and a processing module or unit.
  • an embodiment of the present application provides a visual positioning device that can be used as a server or an internal chip of the server.
  • the visual positioning device includes a memory and a processor.
  • the memory is used to store instructions
  • The processor is used to execute the instructions stored in the memory, and executing those instructions causes the processor to perform the visual positioning method in the above first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method in the first aspect or any possible implementation manner of the first aspect is implemented .
  • An embodiment of the present application further provides a visual positioning device, which can be used as a terminal device or an internal chip of a terminal device, and is used to perform the visual positioning method in the above second aspect or any possible implementation manner of the second aspect.
  • The visual positioning device may include a module or unit for executing the visual positioning method in the second aspect or any possible implementation of the second aspect, for example, a transceiver module or unit, and a processing module or unit.
  • an embodiment of the present application provides a visual positioning device.
  • the communication device can be used as a terminal device or an internal chip of the terminal device.
  • the visual positioning device includes a memory and a processor.
  • The memory is used for storing instructions, and the processor is used to execute the instructions stored in the memory; executing those instructions causes the processor to perform the visual positioning method in the above second aspect or any possible implementation manner of the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method in the second aspect or any possible implementation manner of the second aspect is implemented .
  • The embodiments of the present application further provide a computer program product. The computer program product includes a computer program which, when executed by a computer or a processor, is used to perform the method in the first aspect or any possible implementation manner of the first aspect.
  • In the embodiments of the present application, the server obtains the two-dimensional line feature information of the image collected by the terminal device, where the two-dimensional line feature information may include at least one of the boundary line information between a building and a non-building, or between a non-building and another non-building, and determines the positioning pose of the terminal device according to the position information and magnetometer angle deflection information of the terminal device, the satellite map, and the two-dimensional line feature information.
  • Using two-dimensional line feature information for visual positioning can solve the problem of positioning failure or low positioning accuracy in scenes where the skyline in the field of view is short or insufficient, improving the success rate, accuracy, and robustness of visual positioning.
  • FIG. 1 is a schematic diagram of a satellite map provided by an embodiment of this application;
  • FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of the application;
  • FIG. 3A is a schematic diagram of a user interface displayed on a screen of a terminal device according to an embodiment of the application;
  • FIG. 3B is a schematic diagram of a user interface displayed on a screen of a terminal device according to an embodiment of the application;
  • FIG. 3C is a schematic diagram of a user interface displayed on a screen of a terminal device according to an embodiment of the application;
  • FIG. 4 is a flowchart of a visual positioning method provided by an embodiment of the application;
  • FIG. 5 is a schematic diagram of two-dimensional line feature information of an image provided by an embodiment of this application;
  • FIG. 6 is a flowchart of a visual positioning method provided by an embodiment of the application;
  • FIG. 7A is a schematic diagram of a heat map provided by an embodiment of the application;
  • FIG. 7B is a schematic diagram of determining whether the positioning pose of a terminal device is reliable or credible according to an embodiment of the application;
  • FIG. 8A is a flowchart of a robust satellite map-based visual positioning (geo-localization) method provided by an embodiment of the application;
  • FIG. 8B is a schematic diagram illustrating an example of two-dimensional contour matching provided by an embodiment of the application;
  • FIG. 8C is a schematic diagram illustrating an example of local two-dimensional contour matching provided by an embodiment of the application;
  • FIG. 8D is a schematic diagram of positioning results of different matching methods provided by an embodiment of the application;
  • FIG. 8E is a schematic diagram of multiple semantic information matching provided by an embodiment of this application;
  • FIG. 8F is a schematic diagram of a comparison between the positioning pose obtained by ICP optimization and the ground truth according to an embodiment of the application;
  • FIG. 8G is a schematic diagram of the positioning duration of the visual positioning method provided by an embodiment of the application;
  • FIG. 8H is a schematic diagram of the positioning accuracy of the visual positioning method provided by an embodiment of the application;
  • FIG. 9A is a schematic diagram of a processing process of a visual positioning method provided by an embodiment of this application;
  • FIG. 9B is a schematic diagram of a processing process of a visual positioning method provided by an embodiment of this application;
  • FIG. 10 is a schematic diagram of a user interface provided by an embodiment of the application;
  • FIG. 11 is a schematic structural diagram of a visual positioning device provided by an embodiment of the application;
  • FIG. 12 is a schematic structural diagram of another visual positioning device provided by an embodiment of the application;
  • FIG. 13 is a schematic structural diagram of another visual positioning device provided by an embodiment of the application;
  • FIG. 14 is a schematic structural diagram of another visual positioning device provided by an embodiment of the application.
  • At least one (item) refers to one or more, and “multiple” refers to two or more.
  • "And/or" describes an association relationship between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: only A, only B, or both A and B, where A and B can be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following items" or similar expressions refer to any combination of these items, including any combination of a single item or multiple items.
  • For example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
  • Visual localization: in order to seamlessly integrate the real world and the virtual world, the relative transformation between the camera coordinate system of the terminal device and the three-dimensional coordinate system of the real world is calculated through algorithms, so that virtual objects placed in the real world can be displayed in the terminal device.
  • The problem to be solved by visual positioning technology is how to use the image or video captured by the camera to accurately determine the position and attitude of the camera in the real world.
  • Query image: an RGB image or image sequence collected by the terminal device to achieve visual positioning.
  • Satellite map: a map obtained by white-model reconstruction of the scene (as shown in Figure 1(b)) from satellite pictures (as shown in Figure 1(a)).
  • Satellite map-based visual positioning (geo-localization): based on the satellite map, locating the pose of the terminal device's camera coordinate system in the satellite map.
  • Pose: it can include position and attitude. The position can include (x, y, z) coordinates, and the attitude can include the angular deflection around the three coordinate axes, namely the yaw angle, the pitch angle, and the roll angle. A pose including the (x, y, z) coordinates as well as the yaw, pitch, and roll angles can also be called a 6-degree-of-freedom (DoF) pose.
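  • Purely as a data-structure illustration, a 6-DoF pose as defined here could be represented as follows; the field names are chosen for this sketch and are not terminology from the application.

```python
from dataclasses import dataclass

@dataclass
class Pose6DoF:
    # Position in the satellite-map coordinate system.
    x: float
    y: float
    z: float      # altitude
    # Attitude: angular deflection around the three coordinate axes, in degrees.
    yaw: float
    pitch: float
    roll: float
```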
  • Panoramic line feature information of pose (for example, candidate pose, positioning pose, etc.): Based on the satellite map, extract the panoramic line feature information of the satellite map corresponding to the pose (for example, candidate pose, positioning pose, etc.).
  • The panoramic line feature information may include at least one of the boundary line information between a building and a non-building, or between a non-building and another non-building, corresponding to the pose (for example, a candidate pose or the positioning pose).
  • Terminal devices can be mobile phones, smart phones, tablet personal computers, media players, smart TVs, laptop computers, personal digital assistants (PDAs), personal computers, smart watches, wearable devices such as augmented reality (AR) glasses, in-vehicle devices, or Internet of things (IoT) devices, etc., which are not limited in the embodiments of the present application.
  • FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of the application.
  • the application scenario may include a terminal device 11 and a server 12.
  • The terminal device 11 and the server 12 may communicate with each other. The server 12 may provide the terminal device with a visual positioning service and, based on this service, push virtual object description information to the terminal device 11 so that the terminal device can present a corresponding virtual object, which may be a virtual road sign, a virtual person, and the like.
  • the embodiments of the present application provide a visual positioning method to improve the success rate and accuracy of visual positioning, so as to accurately push corresponding virtual object description information to the terminal device.
  • the visual positioning method of the embodiment of the present application can be applied to fields such as AR navigation, AR human-computer interaction, assisted driving, automatic driving, and the like that need to locate the position and posture of the camera of the terminal device.
  • visual navigation refers to guiding the user to a certain destination through interactive methods such as augmented reality.
  • the user can see the suggested walking direction, distance from the destination and other information on the screen of the terminal device in real time.
  • For example, as shown in FIG. 3A, the virtual object is the walking direction to meeting room J2-1-1B16 displayed on the screen, that is, the walking direction is shown to the user through augmented reality.
  • AR game interaction in a super large scene can fix AR content in a specific geographic location.
  • Corresponding virtual objects are displayed, for example, the virtual character shown in Figure 3B or the virtual animation shown in Figure 3C.
  • The user can interact with the virtual object by tapping or swiping the screen of the terminal device, which guides the virtual object to interact with the real world.
  • the terminal device 11 is usually provided with a camera, and the terminal device 11 can photograph the scene through the camera.
  • the foregoing server 12 takes one server as an example for illustration, and this application is not limited thereto. For example, it may also be a server cluster including multiple servers.
  • FIG. 4 is a flowchart of a visual positioning method provided by an embodiment of this application.
  • the method in this embodiment involves a terminal device and a server. As shown in FIG. 4, the method in this embodiment may include:
  • Step 101 The terminal device collects an image.
  • the terminal device collects an image through a camera, and the image may be a query image as described above.
  • the smart phone can start the shooting function and collect the image according to the trigger of the application program.
  • Images can be collected periodically, for example, every 2 seconds or every 30 seconds, or when a preset collection condition is met; the preset collection condition can be that the GPS data of the smartphone is within a preset range.
  • One or more images collected by the terminal device can all go through the following steps to achieve visual positioning.
  • When the terminal device collects an image, it can also collect the position information of the terminal device and the magnetometer angle deflection information.
  • For the position information of the terminal device and the magnetometer angle deflection information, please refer to the relevant explanation of step 104.
  • Step 102 The terminal device sends an image to the server.
  • the server receives the image sent by the terminal device.
  • the terminal device when sending the image, may also send the position information of the terminal device and the magnetometer angle deflection information to the server. In some embodiments, after sending the image, the terminal device may send the position information and the magnetometer angle deflection information of the terminal device corresponding to the image.
  • Step 103 The server obtains the two-dimensional line feature information of the image according to the image.
  • the two-dimensional line feature information may include at least one of the boundary line information between the building and the non-building, or the boundary line information between the non-building and the non-building.
  • the building may include a residence, office building, gymnasium, exhibition hall or hospital, etc.
  • the non-building may include vegetation, sky, water surface (for example, lake, river, or sea surface, etc.), or ground.
  • The boundary line information between a building and a non-building can be the boundary line information between a building and a tree, the boundary line information between a building and the ground (also called the lower edge information of the building), or the boundary line information between a building and the sky (also called the upper edge information of the building), etc.
  • The boundary line information between different non-buildings can be the information of the dividing lines between roads and rivers, between roads and vegetation, between roads and sidewalks, or between different roads.
  • The boundary line information between a non-building and a road can also be referred to as road boundary line information.
  • the image is the image shown on the left in FIG.
  • The content of the image may include buildings and/or non-buildings. For images with different content, the two-dimensional line feature information obtained by the server is different.
  • In some embodiments, the server can determine the category of the two-dimensional line feature information of the image according to the position information and magnetometer angle deflection information of the terminal device, and then obtain the two-dimensional line feature information of the corresponding category from the image.
  • For example, a user uses a terminal device to collect an image in an urban street and sends it to the server; the server can determine, according to the position information of the terminal device, that the categories of the two-dimensional line feature information include the boundary line information between buildings and non-buildings and the boundary line information between non-buildings and other non-buildings, and then obtain that boundary line information from the image.
  • For another example, a user uses a terminal device to collect an image on the Bund riverside and sends it to the server; the server can determine the categories of the two-dimensional line feature information according to the position information and magnetometer angle deflection information of the terminal device, where the categories include the boundary line information between buildings and non-buildings and between non-buildings and other non-buildings, and then obtain that boundary line information from the image.
  • the server may perform semantic segmentation on the image, and extract the two-dimensional line feature information of the image. For example, semantic segmentation of different categories (for example, vegetation, buildings, sky, water surface, ground, etc.) is performed on the image, and the two-dimensional line feature information of the image is extracted.
  • semantic segmentation is realized through a semantic segmentation model, and the two-dimensional line feature information of the image is output.
  • the semantic segmentation model can be any neural network model, such as a deep neural network (Deep Neural Network, DNN), a convolutional neural network (Convolutional Neural Networks, CNN), or a combination thereof.
  • the semantic segmentation model may also be any machine learning classifier, for example, a support vector machine (SVM) classifier.
  • the semantic segmentation model can perform semantic segmentation on the input image, distinguish the outline of the building, sky, vegetation, ground, or water surface, and then output the two-dimensional line feature information of the image.
  • This semantic segmentation can be a dense pixel-level classification task.
  • the semantic segmentation model may be obtained by training using training images and label values (used to represent the categories corresponding to pixels, such as buildings, sky, etc.).
  • The training strategy used during training may be a standard cross-entropy loss, which measures the gap between the predicted value of the semantic segmentation model and the label value; by minimizing the cross-entropy loss, the prediction performance of the semantic segmentation model is improved.
  • the semantic segmentation model finally trained can distinguish the dividing line between the building and the non-building in the image, and/or the dividing line between the non-building and the non-building, etc.
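  • As a minimal sketch of that training strategy (assuming a PyTorch-style setup; the model architecture, optimizer, and data pipeline are placeholders not specified by the application), one training step might look like the following.

```python
import torch.nn.functional as F

def train_step(model, optimizer, images, labels):
    """One optimization step with the standard cross-entropy loss.

    images: (B, 3, H, W) float tensor of training images
    labels: (B, H, W) long tensor of per-pixel class indices
            (e.g. building, sky, vegetation, ground, water surface)
    """
    model.train()
    optimizer.zero_grad()
    logits = model(images)                  # (B, num_classes, H, W)
    loss = F.cross_entropy(logits, labels)  # gap between prediction and labels
    loss.backward()                         # minimize the cross-entropy loss
    optimizer.step()
    return loss.item()
```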
  • Step 104 The server determines the positioning pose of the terminal device according to the position information of the terminal device and the magnetometer angle deflection information, the satellite map, and the two-dimensional line feature information.
  • the location information may be the satellite positioning information of the terminal device, for example, it may be the Global Positioning System (GPS) information of the terminal device, the BeiDou Navigation Satellite System (BDS) information of the terminal device, The GLONASS information of the terminal device or the Galileo satellite navigation system information of the terminal device.
  • the magnetometer angle deflection information may be a yaw angle.
  • the position information and the magnetometer angle deflection information may be the position information and the magnetometer angle deflection information when the terminal device collects the image, which may be obtained through the wireless communication module and the magnetometer of the terminal device.
  • The server can use the position information and magnetometer angle deflection information of the terminal device to determine multiple candidate poses, extract the panoramic line feature information of each candidate pose based on the satellite map, and determine the positioning pose of the terminal device according to the panoramic line feature information of each candidate pose and the two-dimensional line feature information. In this way, the two-dimensional line feature information of the image is combined with the panoramic line feature information of the satellite map to determine the positioning pose of the terminal device, thereby improving the positioning success rate and positioning accuracy.
  • the server may determine the candidate pose set according to the position information of the terminal device and the angle deflection information of the magnetometer.
  • the candidate pose set may include multiple sets of candidate poses, each set of candidate poses includes candidate position information and a set of candidate yaw angles.
  • the candidate position information is determined according to the position information of the terminal device.
  • the position information of the terminal device determines the candidate position range, and the candidate position information belongs to the candidate position range.
  • the candidate position range may be a circular area with a certain radius (for example, 30 meters) centered on the position information of the terminal device.
  • the server may determine a set of candidate yaw angles according to the magnetometer angle deflection information of the terminal device.
  • the set of candidate yaw angles may be within the range of plus or minus 90 degrees of the magnetometer angle deflection information of the terminal device.
  • the server may determine N optimized poses according to the candidate pose set, the two-dimensional line feature information and the satellite map. According to the N optimized poses, the positioning pose of the terminal device is determined. Wherein, N is an integer greater than 1.
  • the server can determine N optimized poses based on the candidate pose set, the two-dimensional line feature information and the satellite map, using a search method and an iterative method.
  • The search method is used to select part of the candidate poses from the candidate pose set, use those candidate poses and the satellite map to determine their panoramic line feature information, and match the panoramic line feature information with the two-dimensional line feature information to determine multiple initial poses; the iterative method is used to optimize the multiple initial poses to determine the N optimized poses.
  • the search method can select the panoramic line feature information of some candidate poses to match the two-dimensional line feature information, thereby reducing the time required to determine the positioning pose of the terminal device, that is, reducing the positioning time.
  • The search method may perform multiple searches in the candidate pose set to determine the N optimized poses; exemplarily, N searches are performed to determine the N optimized poses.
  • In the first search, the server can select the panoramic line feature information of some candidate poses from the candidate pose set to match against the two-dimensional line feature information, and use the iterative method to determine one optimized pose.
  • In the second search, the server selects from the candidate pose set some candidate poses near the optimized pose determined in the first search, matches their panoramic line feature information with the two-dimensional line feature information, and uses the iterative method to determine another optimized pose; the search method and the iterative method are repeated in this way until the N optimized poses are determined.
  • The iterative method may be the iterative closest point (ICP) algorithm. This iterative method can optimize the initial pose obtained by matching to obtain an optimized pose, thereby improving the accuracy of the final positioning pose.
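  • For readers unfamiliar with ICP, the toy sketch below shows the basic closest-point loop on two 2D point sets (nearest-neighbour association followed by a least-squares rigid update). It is a generic textbook version given only for intuition, not the specific optimization used in this application, which operates on line features and a 6-DoF pose.

```python
import numpy as np

def icp_2d(src, dst, iters=20):
    """Align 2D point set src to dst; returns (R, t, mean closest-point loss)."""
    R, t = np.eye(2), np.zeros(2)
    loss = np.inf
    for _ in range(iters):
        cur = src @ R.T + t
        # Nearest-neighbour correspondences (brute force, for clarity).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        nn = dst[d2.argmin(axis=1)]
        loss = np.sqrt(d2.min(axis=1)).mean()      # mean closest-point distance
        # Least-squares rigid transform between the matched sets (Kabsch).
        mu_s, mu_d = cur.mean(0), nn.mean(0)
        H = (cur - mu_s).T @ (nn - mu_d)
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:              # avoid reflections
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_d - R_step @ mu_s
        R, t = R_step @ R, R_step @ t + t_step
    return R, t, loss
```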
  • the server may select an optimized pose with the smallest loss among the N optimized poses as the positioning pose of the terminal device.
  • the loss includes the difference between the position information of each optimized pose and the position information of the terminal device, and the nearest point loss (ICP loss) of each optimized pose.
  • The loss of an optimized pose can be obtained by a weighted summation of the difference between the position information of the optimized pose and the position information of the terminal device, and the closest-point loss of the optimized pose. For example, the loss of the optimized pose equals a1 * (the difference between the position information of the optimized pose and the position information of the terminal device) + a2 * (the closest-point loss of the optimized pose).
  • the specific values of a1 and a2 can be flexibly set according to requirements.
  • the closest point loss corresponding to the optimized pose is the closest point loss obtained by matching the panoramic line feature information of the optimized pose extracted based on the satellite map and the two-dimensional line feature information.
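  • A hedged illustration of that selection rule is given below; the weights a1 and a2 and the use of Euclidean distance for the position difference are assumptions, since the application does not fix their values.

```python
import numpy as np

def select_positioning_pose(optimized_poses, device_position, a1=1.0, a2=1.0):
    """Pick the optimized pose with the smallest weighted loss.

    Each pose is assumed to carry a position (x, y) and an icp_loss, the
    closest-point loss from matching its panoramic line features against
    the two-dimensional line features of the image.
    """
    def loss(pose):
        pos_diff = np.linalg.norm(np.asarray(pose.position) - np.asarray(device_position))
        return a1 * pos_diff + a2 * pose.icp_loss
    return min(optimized_poses, key=loss)
```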
  • Step 105a The server determines the description information of the virtual object according to the positioning pose of the terminal device.
  • The server may determine the virtual object description information according to the positioning pose. The virtual object description information is used to display the corresponding virtual object on the terminal device, for example, the walking guide icon shown in FIG. 3A, which is displayed in the actual scene of the real world, on the street as shown in FIG. 3A.
  • Step 105b The server sends the description information of the virtual object to the terminal device.
  • Step 106 The terminal device displays the virtual object corresponding to the virtual object description information on the user interface.
  • the terminal device displays the virtual object corresponding to the description information of the virtual object on the user interface, the user interface displays the actual scene of the real world, and the virtual object can be displayed on the user interface in an augmented reality manner.
  • In this embodiment, the server obtains the two-dimensional line feature information of the image collected by the terminal device, where the two-dimensional line feature information may include at least one of the boundary line information between a building and a non-building, or between a non-building and another non-building, and determines the positioning pose of the terminal device according to the position information and magnetometer angle deflection information of the terminal device, the satellite map, and the two-dimensional line feature information.
  • Using two-dimensional line feature information for visual positioning can solve the problem of positioning failure or low positioning accuracy in scenes where the skyline in the field of view is short or insufficient, improving the success rate, accuracy, and robustness of visual positioning.
  • FIG. 6 is a flowchart of a visual positioning method provided by an embodiment of the application.
  • The method in this embodiment involves a terminal device and a server. Based on the embodiment shown in FIG. 4, after the positioning pose of the terminal device is determined, this embodiment further determines whether the positioning pose is reliable, so as to improve the credibility of the positioning result.
  • the method of this embodiment may include:
  • Step 201 The terminal device collects an image.
  • Step 202 The terminal device sends an image to the server.
  • Step 203 The server obtains the two-dimensional line feature information of the image according to the image.
  • steps 201 to 203 can refer to steps 101 to 103 in the embodiment shown in FIG. 4, and details are not described herein again.
  • Step 2041 the server determines the candidate pose set according to the position information of the terminal device and the magnetometer angle deflection information.
  • Step 2042 the server uses a search method and an iterative method to determine N optimized poses according to the candidate pose set, the two-dimensional line feature information and the satellite map.
  • the search method is used to select part of the candidate poses from the set of candidate poses to match the two-dimensional line feature information to determine multiple initial poses, and the iterative method is used to optimize the multiple initial poses , Determine the N optimized poses.
  • Step 2043 The server determines the positioning pose of the terminal device according to the N optimized poses.
  • steps 2041 to 2042 reference may be made to step 104 of the embodiment shown in FIG. 4, which will not be repeated here.
  • Step 205: The server determines whether the positioning pose of the terminal device is reliable according to at least one of the interior point rate, interior point error, or heat map corresponding to the positioning pose of the terminal device. If it is reliable, execute step 206a; if it is unreliable, go to step 208.
  • the heat map is used to represent the distribution of the position information of the N optimized poses in the candidate position set.
  • the interior point rate and the interior point error are used to describe the degree of matching between the panoramic line feature information of the positioning pose of the terminal device and the two-dimensional line feature information based on satellite positioning.
  • An interior point refers to a point at which the difference between the two-dimensional line feature information and the panoramic line feature information of the positioning pose of the terminal device is less than L1, where L1 can be any positive integer less than 10, for example 5 or 4.
  • The interior point rate refers to the proportion of the points whose difference is less than L1 among all points of the two-dimensional line feature information.
  • the interior point error refers to the mean value of the difference of points whose difference is less than L1.
  • An embodiment of the present application provides a schematic diagram of a heat map. As shown in FIG. 7A, the center of the heat map is the point where the position information of the terminal device is located, and each solid square dot in the figure represents a point processed by the above search method and iterative method.
  • the server may determine whether the positioning pose of the terminal device satisfies at least one of the following conditions: the interior point rate corresponding to the positioning pose of the terminal device is greater than a first threshold; or, the positioning pose of the terminal device The corresponding interior point error is less than the second threshold; or, the intensity of the distribution of the candidate pose in the heat map corresponding to the positioning pose of the terminal device is greater than the third threshold.
  • the values of the first threshold, the second threshold, and the third threshold can be any positive numbers, which can be flexibly set according to requirements.
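  • As an illustrative sketch only: given the per-point differences between the two-dimensional line features of the image and the panoramic line features rendered at the positioning pose, the interior-point-rate, interior-point-error, and heat-map-concentration checks could be combined as below. L1, the three thresholds, and the spread-based concentration measure are placeholders chosen for this example.

```python
import numpy as np

def pose_is_reliable(point_diffs, searched_positions,
                     l1=5.0, min_inlier_rate=0.6,
                     max_inlier_error=2.0, max_spread_m=10.0):
    """Decide whether a positioning pose is credible (illustrative thresholds).

    point_diffs: per-point distance between the 2D line features and the
                 panoramic line features of the positioning pose
    searched_positions: (K, 2) positions visited by the search and iteration,
                        i.e. the solid dots of the heat map
    """
    inliers = point_diffs < l1
    inlier_rate = inliers.mean()
    inlier_error = point_diffs[inliers].mean() if inliers.any() else np.inf
    # Concentration of the heat map, measured here (inversely) as the mean
    # distance of the searched positions from their centroid.
    spread = np.linalg.norm(
        searched_positions - searched_positions.mean(axis=0), axis=1).mean()
    return (inlier_rate > min_inlier_rate
            or inlier_error < max_inlier_error
            or spread < max_spread_m)
```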
  • If the interior point rate corresponding to the positioning pose of the terminal device is greater than the first threshold, or the interior point error corresponding to the positioning pose of the terminal device is less than the second threshold, this can indicate that the panoramic line feature information of the positioning pose and the two-dimensional line feature information are relatively similar and well matched.
  • If the concentration of the distribution of the candidate poses in the heat map corresponding to the positioning pose of the terminal device is greater than the third threshold, this can indicate that the candidate poses selected by the search method from the candidate pose set are relatively concentrated, so that the positioning pose finally determined by the multiple searches is more accurate. Taking the heat map shown in FIG. 7A as an example, it can be seen that the distribution of candidate poses in the heat map is relatively concentrated, so the positioning pose can be determined to be reliable.
  • In these cases, the positioning pose of the terminal device is reliable or credible; on the contrary, if the interior point rate of the positioning pose is low and/or the interior point error is large, the positioning pose of the terminal device is unreliable or not credible.
  • Similarly, if the distribution of the candidate poses in the heat map corresponding to the positioning pose of the terminal device is concentrated, the positioning pose is reliable or credible; otherwise, if the distribution of the candidate poses is scattered, the positioning pose of the terminal device is unreliable or not credible.
  • Taking as an example the server determining whether the positioning pose of the terminal device is reliable or credible through the interior point rate, the interior point error, and the heat map, refer to FIG. 7B.
  • the image shown on the left in Figure 7B is the image collected by the terminal device.
  • the image shown in the middle is the panoramic line feature information of the positioning pose of the terminal device and the two-dimensional line feature information of the image, that is, each dividing line.
  • Step 206a The server determines the description information of the virtual object according to the positioning pose of the terminal device.
  • After performing step 206a, step 206b and step 207 may be performed.
  • Step 206b The server sends the description information of the virtual object to the terminal device.
  • Step 207 The terminal device displays the virtual object corresponding to the description information of the virtual object on the user interface.
  • Step 208 The server sends a prompt message to the terminal device, where the prompt message is used to indicate a positioning failure.
  • the prompt message may also be used to instruct the terminal device to re-collect an image.
  • Step 209 The terminal device displays the positioning failure on the user interface.
  • the terminal device can also display information on the user interface that prompts the user to reacquire the image.
  • in this embodiment, the server obtains the two-dimensional line feature information of the image collected by the terminal device.
  • the two-dimensional line feature information may include at least one of the boundary line information between a building and a non-building or the boundary line information between a non-building and a non-building.
  • based on the position information and magnetometer angle deflection information of the terminal device, the satellite map, and the two-dimensional line feature information, the server determines the positioning pose of the terminal device, uses at least one of the interior point rate, the interior point error, or the heat map corresponding to the positioning pose of the terminal device to determine whether the positioning pose is reliable, outputs the positioning pose of the terminal device when it is reliable, and determines that positioning fails when it is unreliable.
  • using two-dimensional line feature information for visual positioning can solve the problem of positioning failure or low positioning accuracy in scenes where the skyline in the field of view is short or not rich enough, improving the success rate, accuracy, and robustness of visual positioning. Further, using at least one of the interior point rate, the interior point error, or the heat map corresponding to the positioning pose of the terminal device to determine whether the positioning pose is reliable can improve the credibility of the positioning result.
  • FIG. 8A is a flowchart of a robust satellite map-based visual positioning (Geo-localization) method provided by an embodiment of this application.
  • the execution subject of this embodiment may be a server or an internal chip of the server, as shown in FIG. 8A ,
  • the method of this embodiment may include:
  • Step 301 Determine M sets of candidate poses according to the position information of the terminal device corresponding to the image and the angle deflection information of the magnetometer.
  • Each group of candidate poses includes candidate position information and a set of candidate yaw angles.
  • the candidate position information falls within a first threshold range, the first threshold range is determined according to the position information of the terminal device, the candidate yaw angle set falls within a second threshold range, and the second threshold range is an angle set determined according to the magnetometer angle deflection information of the terminal device.
  • M is a positive integer greater than 1.
  • a candidate position set (T) and a candidate yaw angle set (Y) may be constructed separately according to the position information of the terminal device corresponding to the image and the magnetometer angle deflection information; the candidate position set (T) includes multiple pieces of candidate position information, the candidate yaw angle set (Y) includes multiple yaw angles, and one piece of candidate position information in T together with the candidate yaw angle set (Y) can form one group of candidate poses, so that multiple groups of candidate poses can be formed.
  • one possible way of constructing the candidate position set (T) is to select, within a region and at a first preset interval, position points as the candidate position information in the candidate position set (T).
  • the region may be a circular area whose center is the position information (x, y) of the terminal device corresponding to the image and whose radius is a fourth threshold. That is, the center value of the above-mentioned first threshold range is the position information of the terminal device.
  • the fourth threshold may be 30 meters, 35 meters, and so on.
  • the first preset interval may be 1 meter.
  • one possible way of constructing the candidate yaw angle set (Y) is to select, within an angle range and at a second preset interval, angles as the yaw angles in the candidate yaw angle set (Y); the angle range may be the yaw angle of the terminal device corresponding to the image plus or minus a fifth threshold. That is, the center value of the above-mentioned second threshold range is the magnetometer angle deflection information of the terminal device.
  • the fifth threshold may be 90 degrees, 85 degrees, and so on.
  • the second preset interval may be 0.1 degree.
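As a rough illustration of this construction, the sketch below builds the candidate position set T (a grid of points inside a 30 m circle at 1 m spacing) and the candidate yaw angle set Y (plus or minus 90 degrees around the magnetometer reading at 0.1 degree steps). The radius, the spacings, and the function name follow the example values mentioned above and are otherwise assumptions.

    import math

    def build_candidate_sets(x, y, yaw_deg, radius=30.0, pos_step=1.0,
                             yaw_half_range=90.0, yaw_step=0.1):
        """Build the candidate position set T and the candidate yaw angle set Y.

        (x, y): terminal-device position from satellite positioning (e.g. GPS).
        yaw_deg: magnetometer yaw reading of the terminal device.
        """
        T = []
        n = int(radius // pos_step)
        for i in range(-n, n + 1):
            for j in range(-n, n + 1):
                dx, dy = i * pos_step, j * pos_step
                if math.hypot(dx, dy) <= radius:      # keep only points inside the circle
                    T.append((x + dx, y + dy))
        steps = int(round(2 * yaw_half_range / yaw_step))
        Y = [yaw_deg - yaw_half_range + k * yaw_step for k in range(steps + 1)]
        # each candidate position, paired with the whole yaw set Y, forms one group of candidate poses
        return T, Y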
  • Step 302 Select K1 groups of candidate poses from the M groups of candidate poses, and obtain the panoramic line feature information of each group of candidate poses according to the candidate position information of each group of candidate poses in the K1 groups and the satellite map.
  • the embodiment of the present application may select K 1 sets of candidate poses from M sets of candidate poses for matching.
  • the selection method of the K 1 sets of candidate poses may be to select K 1 sets of candidate poses at intervals based on the candidate position information among the M sets of candidate poses. For example, the candidate position information of two adjacent candidate positions in the K 1 group of candidate poses is separated by 3 meters.
  • Step 303 Match the panoramic line feature information of each group of candidate poses with the two-dimensional line feature information, and determine the candidate yaw angle information of each group of candidate poses.
  • the candidate yaw angle information of each group of candidate poses is the angle with the highest degree of matching with the two-dimensional line feature information in the candidate yaw angle set of each group of candidate poses.
  • in this way, for each group of candidate poses, the candidate yaw angle information of that group is determined, that is, one yaw angle is determined for the group.
  • during matching, a sliding window may be used to traverse the panoramic line feature information of each group of candidate poses against the two-dimensional line feature information.
  • the matching may include multimodal robust matching or two-dimensional contour matching, where the multimodal robust matching includes multiple semantic information matching or maximum value suppression matching.
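A simplified version of this per-group matching step is sketched below: the panoramic line feature of one candidate position is represented as a 360-degree skyline-height profile sampled at the yaw resolution, the two-dimensional line feature of the image as a shorter profile covering the camera field of view, and a sliding window over the panorama selects the yaw with the smallest average height difference. The profile representation and the L1 matching cost are illustrative assumptions.

    def best_yaw(panorama, query, yaw_start_deg, yaw_step_deg):
        """Slide the query profile over the 360-degree panoramic profile and
        return the yaw angle (in degrees) with the lowest matching cost.

        panorama: list of line-feature heights for one candidate position,
                  one sample per yaw_step_deg, covering 360 degrees.
        query:    list of line-feature heights extracted from the image,
                  same angular sampling, covering the camera field of view.
        """
        n, m = len(panorama), len(query)
        best_cost, best_index = float("inf"), 0
        for start in range(n):                      # sliding-window traversal
            cost = 0.0
            for k in range(m):
                cost += abs(panorama[(start + k) % n] - query[k])
            cost /= m
            if cost < best_cost:
                best_cost, best_index = cost, start
        return yaw_start_deg + best_index * yaw_step_deg, best_cost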
  • an example of two-dimensional contour matching can be seen in FIG. 8B, where (a) in FIG. 8B is a schematic diagram of the matching process of one image, that is, the panoramic line feature information (the lighter and longer line in the figure) is matched against the two-dimensional line feature information of the image (the darker and shorter line in the figure); (b) in FIG. 8B is a schematic diagram of the matching process of another image, whose matching principle is the same and is not repeated here.
  • an example of local two-dimensional contour matching can be found in FIG. 8C. The matching diagram on the left in FIG. 8C shows a matching method in the prior art, in which the closest-point distance is taken as the vertical distance; this matching method has a large error.
  • the matching method of the embodiment of the present application is shown in the matching diagram on the right in FIG. 8C, in which the closest-point distance is taken as the horizontal distance; this matching method can make the visual positioning more accurate.
  • FIG. 8D shows the positioning results corresponding to the matching methods of FIG. 8C in an embodiment of this application.
  • the first row in FIG. 8D shows the positioning result obtained by processing the original image on the left in FIG. 8D with the matching method on the left in FIG. 8C.
  • the second row in FIG. 8D shows the positioning result obtained with the matching method on the right in FIG. 8C.
  • the embodiment of the present application combines the boundary line information between the building and the non-building and the boundary line information between the non-building and the non-building for matching, which can effectively improve the positioning discrimination.
  • for a schematic diagram of multiple semantic information matching, refer to FIG. 8E. As shown in FIG. 8E, the dividing line between the trees and the sky in the image must be higher than the dividing line between the building and the sky in the map encoding, and if the upper boundary of the building in the image exceeds the upper boundary of the image, it must be lower than the dividing line between the building and the sky in the map encoding.
  • for maximum value suppression matching, the optimization method is: if the error of a point of the two-dimensional line feature information exceeds a certain threshold, the error is suppressed (clamped) to that threshold.
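The two variants of multimodal robust matching mentioned above can be sketched as small modifications of the per-sample matching cost. The constants, the label names, and the exact way the semantic constraints are encoded are illustrative assumptions, not the encoding used by the embodiment.

    def clamped_error(err, max_err=20.0):
        """Maximum value suppression: errors above a threshold are clamped to it,
        so a few badly segmented samples cannot dominate the matching cost."""
        return min(abs(err), max_err)

    def semantic_cost(label, img_height, map_building_sky_height, penalty=50.0):
        """Multiple semantic information matching (assumed encoding, larger value = higher):
        - a tree-sky sample in the image must lie above the encoded building-sky line;
        - a building sample truncated by the top of the image must lie below it.
        Violations are penalized; consistent samples add no extra cost."""
        if label == "tree_sky" and img_height <= map_building_sky_height:
            return penalty
        if label == "building_truncated" and img_height >= map_building_sky_height:
            return penalty
        return 0.0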
  • Step 304 Obtain K1 initial poses according to the candidate yaw angle information of the K1 groups of candidate poses, where each initial pose includes the candidate position information and candidate yaw angle information of one group of candidate poses.
  • the candidate yaw angle information of a group of candidate poses determined in step 303 and the candidate position information of that group of candidate poses are combined to form one initial pose. If there are K1 groups of candidate poses, K1 initial poses can be obtained through the matching processing.
  • Step 305 Use an iterative method to optimize the K 1 initial poses to obtain K 1 optimized poses, and obtain the closest point loss corresponding to each optimized pose.
  • the iterative method optimization can be ICP as described above. That is, ICP optimization is used for each initial pose to obtain an optimized pose. For example, ICP is used to optimize the yaw angle of the initial pose.
  • each initial pose may also include preset height information, pitch angle information (pitch), and roll angle information (roll).
  • the preset height information may be 1.5m or the like.
  • the pitch angle information (pitch) and roll angle information (roll) can be given by a Simultaneous Localization and Mapping (SLAM) algorithm. Since the pitch angle and roll angle given by SLAM contain some error, using ICP can further optimize the pitch angle information and the roll angle information.
  • each optimized pose includes position information, height information, yaw angle information (the optimized yaw angle), pitch angle information (the optimized pitch angle), and roll angle information (the optimized roll angle).
  • the optimization method is: extract the points on the two-dimensional line features of the image (also called the query image) and the points on the line features in the coding library, map them onto the unit sphere, and treat them as two sets of point clouds; then use ICP to match the two point clouds.
  • the output of ICP is the three angles pitch, yaw, and roll; these three angles are taken as the final output angles (that is, the above-mentioned optimized pitch angle, optimized yaw angle, and optimized roll angle), instead of using the angles given by SLAM.
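The following sketch shows the flavor of this ICP refinement under simplifying assumptions: line-feature samples are turned into unit-sphere bearing vectors from their (yaw, pitch) angles, and a small SVD-based (Kabsch) ICP loop estimates the rotation between the query point cloud and the point cloud rendered from the satellite map; refined yaw, pitch, and roll can then be recovered from the resulting rotation (for example by a ZYX Euler decomposition). A production implementation would also handle correspondence rejection and the full camera model; everything below is a sketch, not the embodiment's exact algorithm.

    import numpy as np

    def to_unit_sphere(yaw_deg, pitch_deg):
        """Map (yaw, pitch) samples of a line feature to points on the unit sphere."""
        yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
        return np.stack([np.cos(pitch) * np.cos(yaw),
                         np.cos(pitch) * np.sin(yaw),
                         np.sin(pitch)], axis=-1)

    def icp_rotation(query_pts, map_pts, iters=20):
        """Tiny point-to-point ICP on the sphere: nearest-neighbour association plus
        an SVD (Kabsch) rotation fit, repeated for a few iterations.
        Returns the accumulated rotation and the last closest-point loss."""
        R = np.eye(3)
        src = query_pts.copy()
        loss = float("inf")
        for _ in range(iters):
            # brute-force nearest map point for every (rotated) query point
            d = np.linalg.norm(src[:, None, :] - map_pts[None, :, :], axis=2)
            tgt = map_pts[d.argmin(axis=1)]
            loss = float(d.min(axis=1).mean())   # closest-point loss at this step
            H = src.T @ tgt
            U, _, Vt = np.linalg.svd(H)
            R_step = Vt.T @ U.T
            if np.linalg.det(R_step) < 0:        # enforce a proper rotation
                Vt[-1, :] *= -1
                R_step = Vt.T @ U.T
            src = src @ R_step.T
            R = R_step @ R
        return R, loss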
  • see FIG. 8F, where (a) in FIG. 8F is the original image, that is, the image in the above embodiment; (b) in FIG. 8F shows the boundary line corresponding to the positioning pose obtained without ICP optimization together with the boundary line corresponding to the ground truth, and it can be seen from the boundary line between the sky and the building in (b) that the difference between the two is relatively large; (c) in FIG. 8F shows the boundary line corresponding to the positioning pose obtained with ICP optimization together with the boundary line corresponding to the ground truth, and it can be seen from the boundary line between the sky and the building in (c) that the difference between the two is small.
  • Step 306 According to the closest point loss of each optimized pose, determine one optimized pose among the K1 optimized poses as one of the N optimized poses; this optimized pose is the one with the smallest closest point loss among the K1 optimized poses.
  • the closest point loss may serve as the matching degree corresponding to the optimized pose in the loss described in the embodiment shown in FIG. 4.
  • Step 307 Determine whether N optimized poses have been determined; if not, replace K1 with K1+n and repeat steps 302 to 307; if yes, perform step 308.
  • the center of the K 1+n sets of candidate poses is an optimized pose determined by performing the above steps 302 to 307 on the K n sets of candidate poses. That is, the optimized pose after one search and optimization can be used to determine multiple sets of candidate poses for the next search and optimization. For example, the candidate poses around the optimized pose after one search and optimization are selected for the next search and optimization.
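Taken together, steps 302 to 307 amount to a coarse-to-fine search: the first pass matches a sparsely sampled subset of the candidate positions, and each later pass re-samples candidates around the best pose found so far. The sketch below only shows this control flow; evaluate_candidates stands in for the matching and ICP refinement of steps 302 to 306 and, like the sampling parameters, is an assumed helper rather than part of the embodiment.

    def coarse_to_fine_search(T, Y, evaluate_candidates, n_rounds, k_first, k_later):
        """Return the N (= n_rounds) optimized poses produced by repeated search.

        T, Y: candidate position set and candidate yaw angle set (step 301).
        evaluate_candidates(positions, Y) -> (best_pose, closest_point_loss)
            stands in for steps 302-306 (panorama extraction, matching, ICP).
        """
        optimized = []
        positions = T[::max(1, len(T) // k_first)]   # coarse, widely spaced subset (K1)
        for _ in range(n_rounds):
            best_pose, icp_loss = evaluate_candidates(positions, Y)
            optimized.append((best_pose, icp_loss))
            # re-center the next, smaller batch (K1+n) around the latest optimum
            bx, by = best_pose[0], best_pose[1]
            nearby = sorted(T, key=lambda p: (p[0] - bx) ** 2 + (p[1] - by) ** 2)
            positions = nearby[:k_later]
        return optimized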
  • Step 308 Determine the positioning pose of the terminal device according to the N optimized poses.
  • an optimized pose with the smallest loss is selected as the positioning pose of the terminal device.
  • the loss includes the difference between the position information of the optimized pose and the position information of the terminal device, and the matching degree (closest point loss) corresponding to the optimized pose; for example, the loss may be a weighted sum of the two.
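A minimal sketch of this final selection, assuming the loss is a weighted sum of the position difference and the closest-point (ICP) loss with illustrative weights a1 and a2:

    def select_positioning_pose(optimized, device_xy, a1=1.0, a2=1.0):
        """optimized: list of (pose, icp_loss), where pose = (x, y, z, yaw, pitch, roll).
        Returns the optimized pose with the smallest combined loss."""
        def loss(item):
            pose, icp_loss = item
            dx, dy = pose[0] - device_xy[0], pose[1] - device_xy[1]
            position_diff = (dx * dx + dy * dy) ** 0.5   # distance to the reported device position
            return a1 * position_diff + a2 * icp_loss
        return min(optimized, key=loss)[0]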
  • using two-dimensional line feature information in the process of visual positioning can solve the problem of positioning failure or low positioning accuracy in scenes where the skyline in the field of view is short or not rich enough, improve the success rate and accuracy of visual positioning, and improve the robustness of visual positioning.
  • using the search method and the iterative closest point method can reduce the positioning time and improve the positioning accuracy.
  • Figure 8G shows the visual positioning duration of the embodiment of the present application and the positioning duration of the prior art.
  • the visual positioning method of the application embodiment can reduce the positioning time.
  • Figure 8H shows the visual positioning accuracy of the embodiment of the present application and the positioning accuracy of the prior art.
  • under different positioning error thresholds, for example, 1 meter and 1 degree (1m 1°), 2 meters and 2 degrees (2m 2°), and so on, the positioning accuracy of the visual positioning method of the embodiment of the present application is higher than that of the prior art.
  • FIG. 9A is a schematic diagram of the processing procedure of a visual positioning method provided by an embodiment of this application.
  • the method of this embodiment may include: a terminal device collects an image, the position information of the terminal device, and the magnetometer angle deflection information (S501).
  • the server obtains the image and the position information of the terminal device and the angle deflection information of the magnetometer.
  • the server performs semantic segmentation on the image (S502), and extracts the two-dimensional line feature information of the image based on the result of the semantic segmentation (S503).
  • the server determines M sets of candidate poses based on the position information of the terminal device and the magnetometer angle deflection information (S504).
  • the server selects part of the candidate poses from the M sets of candidate poses through a search method for the subsequent processing steps (S505).
  • the server extracts the panoramic line feature information of each group of candidate poses from the satellite map according to part of the candidate poses (S506).
  • the panoramic line feature information of each group of candidate poses is matched with the two-dimensional line feature information, and the candidate yaw angle information of each group of candidate poses is determined to obtain multiple initial poses (S507).
  • the multiple initial poses are optimized by an iterative method to obtain multiple optimized poses (S508).
  • One optimized pose is determined among the multiple optimized poses as one of the N optimized poses. Repeat (S505) to (S508) to determine N optimized poses.
  • an optimized pose with the smallest loss is selected as the positioning pose of the terminal device (S509).
  • the server then makes a confidence determination (S510), that is, it determines, according to at least one of the interior point rate, the interior point error, or the heat map corresponding to the positioning pose of the terminal device, whether the positioning pose of the terminal device is reliable, and outputs the positioning pose when it is reliable.
  • the visual positioning method may further include the terminal device performing pre-detection processing (S511).
  • the dotted line in FIG. 9A indicates optional.
  • one possible implementation of the pre-detection processing is: before sending the image, the terminal device judges, through an end-side model, whether the image is suitable for visual positioning.
  • specifically, the terminal device performs semantic segmentation on the query image based on an end-side semantic segmentation model and extracts two-dimensional line features, including the dividing lines between buildings and non-buildings and the dividing lines between different non-buildings, and then judges the richness of the two-dimensional line features. If the two-dimensional line features are rich, that is, the length of the two-dimensional line features is greater than a certain threshold, the image is suitable for visual positioning.
  • FIG. 9B is a schematic diagram of the processing procedure of a visual positioning method provided by an embodiment of this application.
  • the execution subject of this embodiment may be the terminal device or the processor of the terminal device.
  • Step 601 The terminal device collects images, the position information of the terminal device, and the angle deflection information of the magnetometer.
  • Step 602 The terminal device judges, through the end-side model, whether the image is suitable for visual positioning; if yes, step 603 is performed, and if not, step 601 is performed again.
  • Step 603 The terminal device sends the image, the position information of the terminal device, and the magnetometer angle deflection information to the server.
  • specifically, the terminal device judges whether the dividing line between a building and a non-building, or the dividing line between a non-building and a non-building, corresponding to the two-dimensional line feature information is rich; if it is rich, it determines that the image is suitable for visual positioning, and if it is not rich, it determines that the image is not suitable for visual positioning.
  • being rich may mean that the length of the aforementioned dividing line is greater than a threshold.
  • the dividing line includes at least one of a dividing line between a building and a non-building corresponding to the two-dimensional line feature information, or a dividing line between a non-building and a non-building.
  • the image can be sent to the server so that the server can visually locate the terminal device based on the image.
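As an illustration of this on-device pre-detection, the sketch below measures the total length of the extracted dividing lines and keeps the image for positioning only when that length exceeds a threshold. The polyline representation and the 200-pixel threshold are assumptions; the embodiment only requires that "rich" mean the dividing-line length exceeds some threshold.

    import math

    def polyline_length(points):
        """Length of one dividing line given as a list of (x, y) pixel points."""
        return sum(math.hypot(x2 - x1, y2 - y1)
                   for (x1, y1), (x2, y2) in zip(points, points[1:]))

    def suitable_for_visual_positioning(dividing_lines, min_total_length=200.0):
        """dividing_lines: polylines extracted by the end-side segmentation model
        (building/non-building and non-building/non-building boundaries).
        The image is sent to the server only if the boundaries are rich enough."""
        total = sum(polyline_length(line) for line in dividing_lines)
        return total > min_total_length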
  • the end-side model in this embodiment is used to implement semantic segmentation and output the two-dimensional line feature information of the image.
  • the end-side model can be any neural network model, such as a deep neural network (Deep Neural Network, DNN), a convolutional neural network (Convolutional Neural Networks, CNN), or a combination thereof.
  • the semantic segmentation model may also be any machine learning classifier, for example, a support vector machine (SVM) classifier.
  • the accuracy of the two-dimensional line feature information of the image in the pre-detection process is different from the accuracy of the two-dimensional line feature information of the image used in the positioning pose determination described above.
  • the two-dimensional line feature information of the image used for the positioning pose determination is obtained by the server after semantic segmentation of the image, and its accuracy is higher than the two-dimensional line feature information of the image in the pre-detection process.
  • the semantic segmentation model of the server performs fine semantic segmentation of different categories (vegetation, buildings, sky, etc.) for query images.
  • the semantic segmentation model here is larger than the end-side model used in the pre-detection processing in the terminal device, and the segmentation accuracy is higher than that of the end-side model.
  • the server can determine the positioning pose of the terminal device through the steps of the above embodiment, and return the virtual object description information to the terminal device for display on the user interface of the terminal device The corresponding virtual object.
  • sending only images suitable for visual positioning to the server for further precise visual positioning avoids sending images that are unsuitable for visual positioning to the server and thereby wasting transmission resources and server-side computing resources.
  • FIG. 10 is a schematic diagram of a user interface provided by an embodiment of the application. As shown in FIG. 10, the user interface 901-the user interface 902 are included.
  • the terminal device may collect an image, and the image is presented in the user interface 901.
  • the user interface 901 may be a user interface of an application program.
  • the application program may be an application program used to provide AR navigation services.
  • the user may click the icon of the application program, and in response to the click operation, the terminal device may display the user interface 901, in which the image is displayed.
  • prompt information can also be displayed in the user interface 901.
  • the prompt information is used to prompt the user to photograph at least one of the dividing line between a building and a non-building or the dividing line between a non-building and a non-building.
  • the prompt message may be "please shoot as many scenes as possible: dividing lines between vegetation and buildings, dividing lines between roads and buildings, etc.”.
  • the image in the user interface 901 includes the dividing line between the building and the vegetation, the dividing line between the vegetation and the road, the dividing line between the building and the road, the dividing line between the building and the sky, and the dividing line between the vegetation and the sky, so it can meet the needs of visual positioning.
  • the terminal device can send the image to the server through the above step 102.
  • the server may determine the positioning pose of the terminal device through the above steps 103 to 104, and send the virtual object description information corresponding to the positioning pose to the terminal device through step 105.
  • the terminal device can display a user interface 902 according to the virtual object description information.
  • the user interface 902 presents a virtual object corresponding to the virtual object description information, for example, a guide icon of a cafe.
  • using two-dimensional line feature information in the process of visual positioning can solve the problem of positioning failure or low positioning accuracy in scenes where the skyline in the field of view is short or not rich enough, improve the success rate and accuracy of visual positioning, and improve the robustness of visual positioning.
  • using the search method and the iterative closest point method can reduce the positioning time and improve the positioning accuracy.
  • the virtual object description information is pushed to the terminal device, so that the terminal device presents the virtual object corresponding to the virtual object description information on the user interface. In this way, the visual positioning method of the embodiment of this application can be applied to AR navigation, AR human-computer interaction, assisted driving, autonomous driving, and other fields that need to locate the position and posture of the camera of the terminal device, enhancing the user experience.
  • the embodiment of the present application also provides a visual positioning device for executing the method steps executed by the server or the processor of the server in the above method embodiments.
  • the visual positioning device may include: a transceiver module 111 and a processing module 112.
  • the processing module 112 is configured to obtain the image collected by the terminal device through the transceiver module 111.
  • the processing module 112 is also used to obtain the two-dimensional line feature information of the image according to the image.
  • the two-dimensional line feature information includes at least one of the boundary line information between a building and a non-building or the boundary line information between a non-building and a non-building.
  • the processing module 112 is also used to determine the positioning pose of the terminal device according to the position information of the terminal device and the angle deflection information of the magnetometer, the satellite map, and the two-dimensional line feature information.
  • the processing module 112 is configured to: perform semantic segmentation on the image and extract the two-dimensional line feature information of the image.
  • the processing module 112 is configured to determine a candidate pose set according to the position information of the terminal device and the magnetometer angle deflection information. According to the candidate pose set, the two-dimensional line feature information and the satellite map, N optimized poses are determined. According to the N optimized poses, the positioning pose of the terminal device is determined. Wherein, N is an integer greater than 1.
  • the processing module 112 is configured to: select a part of the candidate poses from the candidate pose set, use the part of the candidate poses and the satellite map to determine the panoramic line feature information corresponding to the part of the candidate poses, and The panoramic line feature information is matched with the two-dimensional line feature information to determine multiple initial poses, and the iterative method is used to optimize the multiple initial poses and determine the N optimized poses.
  • the candidate pose set includes M sets of candidate poses, each set of candidate poses includes candidate position information and a candidate yaw angle set, the candidate position information falls within a first threshold range, and the first threshold range is determined according to the position information of the terminal device;
  • the candidate yaw angle set falls within a second threshold range, the second threshold range is an angle set determined according to the magnetometer angle deflection information of the terminal device, and the processing module is used for:
  • Step 1 Select K1 groups of candidate poses from the M groups of candidate poses, and obtain the panoramic line feature information of each group of candidate poses according to the candidate position information of each group of candidate poses in the K1 groups and the satellite map;
  • Step 2 Match the panoramic line feature information of each group of candidate poses with the two-dimensional line feature information, and determine the candidate yaw angle information of each group of candidate poses, where the candidate yaw angle information of each group of candidate poses is the angle, in the candidate yaw angle set of that group of candidate poses, with the highest degree of matching with the two-dimensional line feature information;
  • Step 3 According to the candidate yaw angle information of the K1 groups of candidate poses, obtain K1 initial poses, where each initial pose includes the candidate position information and candidate yaw angle information of one group of candidate poses;
  • Step 4 Use an iterative method to optimize the K1 initial poses to obtain K1 optimized poses, and obtain the closest point loss corresponding to each optimized pose;
  • Step 5 According to the closest point loss of each optimized pose, determine one optimized pose among the K1 optimized poses as one of the N optimized poses, where this optimized pose is the one with the smallest closest point loss among the K1 optimized poses;
  • Step 6 Replace K1 with K1+n and repeat steps 1 to 5 until the N optimized poses are determined.
  • the center of the K 1+n sets of candidate poses is an optimized pose determined by performing the above steps 1 to 5 on the K n sets of candidate poses.
  • each initial pose also includes preset height information, pitch angle information, and roll angle information;
  • each optimized pose includes position information, height information, yaw angle information, pitch angle information, and roll angle information.
  • the matching includes multimodal robust matching or two-dimensional contour matching, wherein the multimodal robust matching includes multiple semantic information matching or maximum value suppression matching.
  • the processing module 112 is configured to: among the N optimized poses, select an optimized pose with the least loss as the positioning pose of the terminal device.
  • the loss is a weighted sum of the difference value corresponding to each optimized pose and the closest point loss of each optimized pose;
  • the difference value is the difference between the position information of each optimized pose and the position information of the terminal device.
  • the processing module 112 is further configured to determine whether the positioning pose of the terminal device is reliable according to at least one of the interior point rate, interior point error, or heat map corresponding to the positioning pose of the terminal device.
  • the positioning pose of the terminal device is reliable, the positioning pose of the terminal device is output.
  • the positioning pose of the terminal device is unreliable, it is determined that the positioning has failed.
  • the heat map is used to represent the distribution of the part of the candidate poses.
  • the processing module 112 is used to determine whether the positioning pose of the terminal device satisfies at least one of the following conditions: the interior point rate corresponding to the positioning pose of the terminal device is greater than a first threshold; or, the interior point error corresponding to the positioning pose of the terminal device is less than a second threshold; or, the concentration of the distribution of the part of the candidate poses in the heat map corresponding to the positioning pose of the terminal device is greater than a third threshold.
  • the processing module 112 is further configured to determine the description information of the virtual object according to the positioning pose of the terminal device.
  • the virtual object description information is sent to the terminal device through the transceiver module 111, and the virtual object description information is used to display the corresponding virtual object on the terminal device.
  • the visual positioning device provided in the embodiment of the present application can be used to execute the above-mentioned visual positioning method, and its content and effects can be referred to the method part, which will not be repeated in the embodiment of the present application.
  • the embodiment of the present application also provides a visual positioning device.
  • the visual positioning device includes a processor 1201 and a transmission interface 1202, and the transmission interface 1202 is used to obtain an image collected by a terminal device.
  • the transmission interface 1202 may include a transmission interface and a reception interface.
  • the transmission interface 1202 may be any type of interface according to any proprietary or standardized interface protocol, such as a high definition multimedia interface (High Definition Multimedia Interface, HDMI), a Mobile Industry Processor Interface (MIPI), a MIPI-standardized Display Serial Interface (DSI), a Video Electronics Standards Association (VESA)-standardized Embedded Display Port (eDP), a Display Port (DP), or a V-By-One interface, where V-By-One is a digital interface standard developed for image transmission, as well as various wired or wireless interfaces, optical interfaces, and the like.
  • the processor 1201 is configured to call the program instructions stored in the memory to execute the visual positioning method as in the above method embodiment.
  • the device further includes a memory 1203.
  • the processor 1201 may be a single-core processor or a multi-core processor group.
  • the transmission interface 1202 is an interface for receiving or sending data
  • the data processed by the visual positioning device may include audio data, video data, or image data.
  • the visual positioning device may be a processor chip.
  • the embodiments of the present application further provide a computer storage medium.
  • the computer storage medium may include computer instructions.
  • when the computer instructions run on an electronic device, the electronic device is caused to execute each step performed by the server in the foregoing method embodiments.
  • the embodiments of the present application also provide a computer program product, which, when run on a computer, causes the computer to execute each step executed by the server in the foregoing method embodiments.
  • the embodiments of the present application also provide a device that has the function of realizing the server behavior in the foregoing method embodiments.
  • the functions can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions, for example, an acquisition unit or module, and a determination unit or module.
  • the embodiment of the present application also provides a visual positioning device, which is used to execute the method steps executed by the terminal device or the processor of the terminal device in the above method embodiments.
  • the visual positioning device may include: a processing module 131 and a transceiver module 132.
  • the processing module 131 is configured to collect an image and display the image on the user interface.
  • the image includes at least one of a captured dividing line between non-buildings or a dividing line between a building and a non-building.
  • the processing module 131 is also used to send the image to the server through the transceiver module 132.
  • the transceiver module 132 is also used to receive virtual object description information sent by the server.
  • the virtual object description information is determined according to the positioning pose of the terminal device that collects the image, and the positioning pose is determined at least according to the two-dimensional line feature information of the image and the position information of the terminal device, where the two-dimensional line feature information includes at least one of the information of the dividing line between a building and a non-building or the information of the dividing line between a non-building and a non-building.
  • the processing module 131 is also used to superimpose and display the virtual object corresponding to the virtual object description information on the user interface.
  • the processing module 131 is also used to display prompt information on the user interface before the image is collected.
  • the prompt information is used to prompt the user to photograph at least one of the dividing line between a building and a non-building or the dividing line between a non-building and a non-building.
  • the processing module 131 is further configured to determine whether the image is suitable for visual positioning through the end-side model before sending the image.
  • the visual positioning device provided in the embodiment of the present application can be used to execute the above-mentioned visual positioning method, and its content and effects can be referred to the method part, which will not be repeated in the embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a vision processing device according to an embodiment of the application.
  • the visual processing apparatus 1400 may be the terminal device involved in the foregoing embodiment.
  • the vision processing device 1400 includes a processor 1401 and a transceiver 1402.
  • the vision processing device 1400 further includes a memory 1403.
  • the processor 1401, the transceiver 1402, and the memory 1403 can communicate with each other through internal connection paths, and transfer control signals and/or data signals.
  • the memory 1403 is used to store computer programs.
  • the processor 1401 is configured to execute a computer program stored in the memory 1403, so as to implement various functions in the foregoing device embodiments.
  • the memory 1403 may also be integrated in the processor 1401 or independent of the processor 1401.
  • the vision processing device 1400 may further include an antenna 1404 for transmitting the signal output by the transceiver 1402.
  • the transceiver 1402 receives signals through an antenna.
  • the vision processing apparatus 1400 may further include a power supply 1405 for providing power to various devices or circuits in the terminal equipment.
  • the visual processing device 1400 may also include one or more of an input unit 1406, a display unit 1407 (which may also be regarded as an output unit), an audio circuit 1408, a camera 1409, a sensor 1410, and the like.
  • the audio circuit may also include a speaker 14081, a microphone 14082, etc., which will not be described in detail.
  • the embodiments of the present application also provide a computer storage medium.
  • the computer storage medium may include computer instructions.
  • when the computer instructions run on an electronic device, the electronic device is caused to execute each step performed by the terminal device in the foregoing method embodiments.
  • the embodiments of the present application also provide a computer program product, which, when run on a computer, causes the computer to execute each step performed by the terminal device in the foregoing method embodiments.
  • the embodiments of the present application also provide a device that has the function of realizing the behavior of the terminal device in the foregoing method embodiments.
  • the functions can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions, for example, an acquisition unit or module, a sending unit or module, and a display unit or module.
  • the processor mentioned in the above embodiments may be an integrated circuit chip with signal processing capability.
  • the steps of the foregoing method embodiments can be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • the processor can be a general-purpose processor, a digital signal processor (digital signal processor, DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (field programmable gate array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware encoding processor, or executed and completed by a combination of hardware and software modules in the encoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache. For example, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchronous connection dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of this application essentially, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to make a computer device (a personal computer, a server, a network device, or the like) execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Navigation (AREA)
  • Image Analysis (AREA)

Abstract

A visual positioning method and apparatus. The visual positioning method may include: obtaining an image collected by a terminal device; obtaining two-dimensional line feature information of the image, the two-dimensional line feature information including at least one of boundary line information between a building and a non-building or boundary line information between a non-building and a non-building; and determining a positioning pose of the terminal device according to position information and magnetometer angle deflection information of the terminal device, a satellite map, and the two-dimensional line feature information. The method can improve the success rate and accuracy of visual positioning.

Description

视觉定位方法和装置
本申请要求于2020年5月31日提交中国专利局、申请号为202010481150.4、申请名称为“视觉定位方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及一种智能感知技术,尤其涉及一种视觉定位方法和装置。
背景技术
视觉定位是使用相机所拍摄的图像或者视频来进行定位,精确定位出相机在真实世界中的位置和姿态。视觉定位是近些年来计算机视觉领域的热点问题,其在增强现实、交互虚拟现实、机器人视觉导航、公共场景监控、智能交通等诸多领域都具有十分重要的意义。
视觉定位技术包括基于卫星地图的视觉定位方法(Geo-localization)。卫星地图(Satellite Map)通过卫星对场景进行白模重建得到的。基于卫星地图的视觉定位方法,使用该卫星地图(Satellite Map)对相机所拍摄的图像或者视频进行定位,获取相机坐标系在卫星地图中的6个自由度(Degree of freedom,DoF)位姿(Pose)。该类视觉定位技术可以应对大规模场景的视觉定位。
上述基于卫星地图的视觉定位方法,存在定位成功率较低和定位精度不高的问题。
发明内容
本申请提供一种视觉定位方法和装置,以提升定位成功率和定位精度。
第一方面,本申请实施例提供一种视觉定位方法,该方法可以包括:获取终端设备采集的图像。获取该图像的二维线特征信息,该二维线特征信息包括建筑物与非建筑物之间的分界线信息,或非建筑物与非建筑物之间的分界线信息中至少一项。根据该终端设备的位置信息和磁力计角度偏转信息、卫星地图、以及该二维线特征信息,确定该终端设备的定位位姿。
本实现方式,利用二维线特征信息进行视觉定位,该二维线特征信息可以包括建筑物与非建筑物之间的分界线信息或非建筑物与非建筑物之间的分界线信息中至少一项,可以解决视野内天际线较短或不够丰富的场景下的定位失败或定位精度不高的问题,提升视觉定位的成功率和精度,并且可以提升视觉定位的鲁棒性。
一种可能的设计中,获取该图像的二维线特征信息,可以包括:对该图像进行语义分割,提取该图像的二维线特征信息。
本实现方式,通过语义分割的方式提取该图像的二维线特征信息,以便基于该二维线特征信息进行视觉定位,可以提升视觉定位的成功率和精度。
一种可能的设计中,根据该图像对应的终端设备的位置信息和磁力计角度偏转信息、卫星地图、以及该二维线特征信息,确定该终端设备的定位位姿,可以包括:根据该终端 设备的位置信息和磁力计角度偏转信息,确定候选位姿集合。根据该候选位姿集合、该二维线特征信息和该卫星地图,确定N个优化位姿。根据该N个优化位姿,确定该终端设备的定位位姿。其中,N为大于1的整数。
一种可能的设计中,该候选位姿集合包括M组候选位姿,每组候选位姿包括候选位置信息和候选偏航角度集合,该候选位置信息属于第一阈值范围内,该第一阈值范围为根据该终端设备的位置信息确定的,该候选偏航角度集合属于第二阈值范围内,该第二阈值范围为根据该终端设备的磁力计角度偏转信息确定的角度集合,M为大于1的整数。
一种可能的设计中,根据该候选位姿集合、该二维线特征信息和该卫星地图,确定N个优化位姿,包括:在该候选位姿集合中选取部分候选位姿,用该部分候选位姿和该卫星地图确定该部分候选位姿的全景线特征信息,并将该全景线特征信息与该二维线特征信息进行匹配,确定多个初始位姿,对多个初始位姿进行优化,确定该N个优化位姿。
本实现方式,通过在部分候选位姿中进行匹配和优化处理,可以降低定位时长,提升定位精度。
一种可能的设计中,根据该候选位姿集合、该二维线特征信息和该卫星地图,采用搜索方法和迭代方法,确定N个优化位姿,包括:步骤1:在该M组候选位姿中选取K 1组候选位姿,分别根据该K 1组候选位姿中的每组候选位姿的候选位置信息和该卫星地图,获取每组候选位姿的全景线特征信息。步骤2:分别对每组候选位姿的全景线特征信息与该二维线特征信息进行匹配,确定每组候选位姿的候选偏航角度信息,每组候选位姿的候选偏航角度信息为每组候选位姿的候选偏航角度集合中与该二维线特征信息匹配度最高的角度。步骤3:根据该K 1组候选位姿的候选偏航角度信息,得到K 1个初始位姿,每个初始位姿包括一组候选位姿的候选位置信息和候选偏航角度信息。步骤4:对该K 1个初始位姿采用迭代方法优化,得到K 1个优化位姿,并得到每个优化位姿对应的最近点损失。步骤5:根据每个优化位姿的最近点损失,在该K 1个优化位姿中确定一个优化位姿,作为该N个优化位姿中的一个优化位姿,该一个优化位姿为该K 1个优化位姿中最近点损失最小的优化位姿。步骤6:将K 1替换为K 1+n,重复执行步骤1至5,直至确定N个优化位姿,n取1至N-1,且K 1>K 2=K 3……=K N
一种可能的设计中,K 1+n组候选位姿的中心为对K n组候选位姿执行上述步骤1至5所确定出的一个优化位姿。
一种可能的设计中,每个初始位姿还包括预设的高度信息、俯仰角信息和翻滚角信息,每个优化位姿包括位置信息、高度信息、偏航角度信息、俯仰角信息和翻滚角信息。
一种可能的设计中,该匹配包括多模态鲁棒匹配或二维轮廓线匹配,其中,该多模态鲁棒匹配包括多重语义信息匹配或极大值抑制匹配。
本实现方式,通过多模态鲁棒匹配或二维轮廓线匹配,可以辅助提升定位效果。
一种可能的设计中,根据该N个优化位姿,确定该终端设备的定位位姿,包括:在该N个优化位姿中,选取损失最小的一个优化位姿作为该终端设备的定位位姿。其中,该损失为每个优化位姿的最近点损失和每个优化位姿对应的差值加权和,该差值为每个优化位姿的位置信息与该终端设备的位置信息之间的差值。
一种可能的设计中,该方法还可以包括:根据该终端设备的定位位姿对应的内点率、内点误差或热力图中至少一项,判断该终端设备的定位位姿是否可靠。当该终端设备的定 位位姿可靠时,输出该终端设备的定位位姿。当该终端设备的定位位姿不可靠时,判定定位失败。其中,该热力图用于表示该部分候选位姿的分布。
本实现方式,通过该终端设备的定位位姿对应的内点率、内点误差或热力图中至少一项,判断该终端设备的定位位姿是否可靠,可以提升定位结果的可信程度。
一种可能的设计中,根据该终端设备的定位位姿对应的内点率、内点误差或热力图中至少一项,判断该终端设备的定位位姿是否可靠,包括:判断该终端设备的定位位姿是否满足以下条件至少之一:该终端设备的定位位姿对应的内点率大于第一阈值;或者,该终端设备的定位位姿对应的内点误差小于第二阈值;或者,该终端设备的定位位姿对应的热力图中部分候选位姿的分布的密集度大于第三阈值。
一种可能的设计中,该方法还包括:根据该终端设备的定位位姿确定虚拟物体描述信息。向该终端设备发送该虚拟物体描述信息,该虚拟物体描述信息用于在该终端设备上显示对应的虚拟物体。
第二方面,本申请实施例提供一种视觉定位方法,该方法可以包括:终端设备采集图像,并在该终端设备的用户界面上显示该图像,该图像包括拍摄到的非建筑物之间的分界线,或,建筑物和非建筑物之间的分界线中至少一项。向服务器发送该图像。接收该服务器发送的虚拟物体描述信息,该虚拟物体描述信息为根据采集该图像的终端设备的定位位姿确定的,该定位位姿为至少根据该图像的二维线特征信息和该终端设备的位置信息确定的,该二维线特征信息包括建筑物与非建筑物之间的分界线的信息,或非建筑物与非建筑物之间的分界线的信息中至少一项。在该用户界面上叠加显示该虚拟物体描述信息对应的虚拟物体。
一种可能的设计中,采集图像之前,该方法还包括:在该用户界面上显示提示信息,该提示信息用于提示用户拍摄建筑物与非建筑物之间的分界线,或非建筑物与非建筑物之间的分界线中至少一项。
一种可能的设计中,采集图像之前,该方法还可以包括通过端侧模型判断该图像是否适合做视觉定位。
例如,将该图像输入至端侧模型中,通过该端侧模型对该图像进行语义分割,该端侧模型输出该图像的语义分割结果,根据该语义分割结果获取该图像的二维线特征信息,根据该二维线特征信息判断该图像是否适合做视觉定位。
例如,终端设备针对当前query图像,基于端侧语义分割模型,对query图像进行语义分割,并提取二维线特征,包括建筑物与非建筑物之间的分界线,以及各个不同非建筑物之间的分界线,判断该二维线特征的丰富程度。如果二维线特征比较丰富,即二维线特征的长度大于某一个阈值,则适合做视觉定。
例如,判断该二维线特征信息对应的建筑物与非建筑物之间的分界线,或非建筑物与非建筑物之间的分界线中至少一项,是否丰富,若丰富,则确定该图像适合做视觉定位,若不丰富,则确定该图像不适合做视觉定位。
其中,丰富可以指,上述分界线的长度大于一个阈值。该分界线包括该二维线特征信息对应的建筑物与非建筑物之间的分界线,或非建筑物与非建筑物之间的分界线中至少一项。
当确定该图像适合做视觉定位时,可以向服务器发送给图像,以便服务器基于该图像, 对终端设备进行视觉定位。
需要说明的是,本实现方式中的图像的二维线特征信息,与上述定位位姿确定所使用的图像的二维线特征信息的精度不同。上述定位位姿确定所使用的图像的二维线特征信息为服务器对该图像进行语义分割后所获取的,其精度高于本实现方式中的图像的二维线特征信息。
本实现方式,通过在终端设备对该图像进行预检测,将适合视觉定位的图像发送给服务器做进一步精确视觉定位,可以避免将不适合视觉定位的图像发送给服务器,造成传输资源和服务器侧计算资源的浪费。
第三方面,本申请实施例提供一种视觉定位装置,该视觉定位装置可以作为服务器或服务器的内部芯片,该视觉定位装置用于执行上述第一方面或第一方面的任一可能的实现方式中的视觉定位方法。具体地,该视觉定位装置以包括用于执行第一方面或第一方面的任一可能的实现方式中的视觉定位方法的模块或单元,例如,收发模块或单元,处理模块或单元。
第四方面,本申请实施例提供一种视觉定位装置,该视觉定位装置可以作为服务器或服务器的内部芯片,该视觉定位装置包括存储器和处理器,该存储器用于存储指令,该处理器用于执行存储器存储的指令,并且对存储器中存储的指令的执行使得处理器执行上述第一方面或第一方面的任一可能的实现方式中的视觉定位方法。
第五方面,本申请实施例提供一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现第一方面或第一方面的任一可能的实现方式中的方法。
第六方面,本申请实施例提供一种视觉定位装置,该视觉定位装置可以作为终端设备或终端设备的内部芯片,该视觉定位装置用于执行上述第二方面或第二方面的任一可能的实现方式中的视觉定位方法。具体地,该视觉定位装置可以包括用于执行第二方面或第二方面的任一可能的实现方式中的视觉定位方法的模块或单元,例如,收发模块或单元,处理模块或单元。
第七方面,本申请实施例提供一种视觉定位装置,该通信装置可以作为终端设备或终端设备的内部芯片,该视觉定位装置包括存储器和处理器,该存储器用于存储指令,该处理器用于执行存储器存储的指令,并且对存储器中存储的指令的执行使得处理器执行第二方面或第二方面的任一可能的实现方式中的视觉定位方法。
第八方面,本申请实施例提供一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现第二方面或第二方面的任一可能的实现方式中的方法。
第九方面,本申请实施例提供一种计算机程序产品,计算机程序产品包括计算机程序,当计算机程序被计算机或处理器执行时,用于执行第一方面或第一方面的任一可能的实现方式中的方法,或者,用于执行第二方面或第二方面的任一可能的实现方式中的方法,或者,用于执行第三方面或第三方面的任一可能的实现方式中的方法。
本申请实施例的视觉定位方法和装置,服务器通过获取终端设备采集的图像的二维线特征信息,该二维线特征信息可以包括建筑物与非建筑物之间的分界线信息、或非建筑物与非建筑物之间的分界线信息中至少一项,根据该终端设备的位置信息和磁力计角度偏转信息、卫星地图、以及该二维线特征信息,确定该终端设备的定位位姿。利用二维线特征信息进行视觉定位,可以解决视野内天际线较短或不够丰富的场景下的定位失败或定位精 度不高的问题,提升视觉定位的成功率和精度,并且可以提升视觉定位的鲁棒性。
附图说明
图1为本申请实施例提供的一种卫星地图的示意图;
图2为本申请实施例提供的一种应用场景的示意图;
图3A为本申请实施例提供的终端设备的屏幕上显示的一种用户界面的示意图;
图3B为本申请实施例提供的终端设备的屏幕上显示的一种用户界面的示意图;
图3C为本申请实施例提供的终端设备的屏幕上显示的一种用户界面的示意图;
图4为本申请实施例提供的一种视觉定位方法的流程图;
图5为本申请实施例提供的一种图像的二维线特征信息的示意图;
图6为本申请实施例提供的一种视觉定位方法的流程图;
图7A为本申请实施例提供的一种热力图的示意图;
图7B为本申请实施例提供的一种确定终端设备的定位位姿是可靠或可信的示意图;
图8A为本申请实施例提供的一种鲁棒的基于卫星地图的视觉定位(Geo-localization)方法的流程图;
图8B为本申请实施例提供的一种维轮廓线匹配的示例说明的示意图;
图8C为本申请实施例提供的一种局部的维轮廓线匹配的示例说明的示意图;
图8D为本申请实施例提供的一种不同的匹配方式的定位结果的示意图;
图8E为本申请实施例提供的一种多重语义信息匹配的示意图;
图8F为本申请实施例提供通过ICP优化后得到的定位位姿与真实值的比对示意图;
图8G为本申请实施例提供的视觉定位方法的定位时长的示意图;
图8H为本申请实施例提供的视觉定位方法的定位准确率的示意图;
图9A为本申请实施例提供的一种视觉定位方法的处理过程的示意图;
图9B为本申请实施例提供的一种视觉定位方法的处理过程的示意图;
图10为本申请实施例提供的一种用户界面示意图;
图11为本申请实施例提供的一种视觉定位装置的结构示意图;
图12为本申请实施例提供的另一种视觉定位装置的结构示意图;
图13为本申请实施例提供的另一种视觉定位装置的结构示意图;
图14为本申请实施例提供的另一种视觉定位装置的结构示意图。
具体实施方式
本申请实施例所涉及的术语“第一”、“第二”等仅用于区分描述的目的,而不能理解为指示或暗示相对重要性,也不能理解为指示或暗示顺序。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元。方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
应当理解,在本申请中,“至少一个(项)”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,用于描述关联对象的关联关系,表示可以存在三种关系,例如,“A和/或B”可以表示:只存在A,只存在B以及同时存在A和B三种情况,其中A,B可以 是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b或c中的至少一项(个),可以表示:a,b,c,“a和b”,“a和c”,“b和c”,或“a和b和c”,其中a,b,c可以是单个,也可以是多个。
首先对本申请实施例中的部分用语进行解释说明,以便于理解本申请实施例的视觉定位方法。
视觉定位(Visual Localization):为了使真实世界与虚拟世界无缝融合,通过算法计算出终端设备的相机(camera)坐标系和真实世界三维坐标系的相对转换关系,进而实现真实世界里面的虚拟物体在终端设备中的显示。视觉定位技术要解决的问题是如何使用相机所拍摄的图像或者视频来进行定位,精确定位出相机在真实世界中的位置和姿态。
查询(query)图像:终端设备采集的用来实现视觉定位的RGB图像或者图片序列。
术语“图片(picture)”、“帧(frame)”或“图像(image)”可以用作同义词。
卫星地图(Satellite Map):通过卫星图片(如图1(a)所示)对场景进行白模重建(如图1(b)所示)而得到的地图。
基于卫星地图(Satellite Map)的视觉定位(Geo-localization):基于卫星地图(Satellite Map),定位出终端设备的相机(camera)坐标系在卫星地图中的位姿。
位姿:可以包括位置和姿态。其中,位置可以包括(x,y,z)坐标,姿态可以包括环绕三个坐标轴的角度偏转,环绕三个坐标轴的角度偏转分别为偏航(yaw)角,俯仰(pitch)角,翻滚(roll)角。包括(x,y,z)坐标,以及偏航(yaw)角,俯仰(pitch)角和翻滚(roll)角的位姿,也可以称为6个自由度(Degree of freedom,DoF)位姿(Pose)。
位姿(例如,候选位姿、定位位姿等)的全景线特征信息:基于卫星地图,提取位姿(例如,候选位姿、定位位姿等)对应的卫星地图全景线特征信息,该全景线特征信息可以包括该位姿(例如,候选位姿、定位位姿等)对应的包括建筑物与非建筑物之间的分界线信息、或非建筑物与非建筑物之间的分界线信息中至少一项。
本申请实施例涉及终端设备。终端设备可以是移动电话、智能手机、平板个人电脑(tablet personal computer)、媒体播放器、智能电视、笔记本电脑(laptop computer)、个人数字助理(personal digital assistant,PDA)、个人计算机(personal computer)、智能手表、增强现实(augmented reality,AR)眼镜等可穿戴式设备(wearable device)、车载设备、或物联网(the Internet of things,IOT)设备等,本申请实施例对此不作限定。
图2为本申请实施例提供的一种应用场景的示意图,如图2所示,该应用场景可以包括终端设备11和服务器12,示例性的,终端设备11与服务器12可以进行通信,服务器12可以向终端设备提供视觉定位服务,以及基于视觉定位服务,向终端设备11推送虚拟物体描述信息,以使得终端设备可以呈现相应的虚拟物体,该虚拟物体可以是虚拟路标、虚拟人物等。本申请实施例提供一种视觉定位方法,以提升视觉定位的成功率和准确率,从而准确地向终端设备推送相应的虚拟物体描述信息,其具体解释说明可以参见下述实施例。
本申请实施例的视觉定位方法可以应用于AR导航、AR人机交互、辅助驾驶、自动驾驶等需要定位终端设备的相机的位置和姿态的领域。例如,超大场景视觉导航系统,视觉导航指的是通过增强现实等交互方式将用户引导至某一个目的地点。用户可实时在终端设 备的屏幕上看到建议的步行方向、离目的地的距离等信息,如图3A所示,虚拟物体为屏幕上显示的J2-1-1B16会议室的步行方向,即通过增强现实向用户展示步行方向等。再例如,超大场景AR游戏交互,如图3B和3C所示,AR游戏交互可以将AR内容固定在特定的地理位置,用户所使用的终端设备可以通过本申请实施例的视觉定位方法,在屏幕上显示相应的虚拟物体(例如,图3B所示的虚拟人物,图3C所示的虚拟动画),用户通过点击/滑动终端设备的屏幕等方式实现和虚拟物体的互动,可以引导虚拟物体和真实世界发生交互。
需要说明的是,终端设备11通常设置有摄像头,终端设备11可以通过摄像头对场景进行拍摄。上述服务器12以一个服务器为例进行举例说明,本申请不以此作为限制,例如,其也可以是包括多个服务器的服务器集群。
图4为本申请实施例提供的一种视觉定位方法的流程图,本实施例的方法涉及终端设备和服务器,如图4所示,本实施例的方法可以包括:
步骤101、终端设备采集图像。
终端设备通过摄像头采集图像,该图像可以是如上所述的查询(query)图像。
以终端设备是智能手机为例,智能手机可以根据应用程序的触发,启动拍摄功能,采集该图像。例如,可以周期性采集图像,例如,2秒,30秒等,也可以是满足预设采集条件时,采集图像,该预设采集条件可以是智能手机的GPS数据在预设范围内。终端设备采集的一个或多个图像均可以通过如下步骤,以实现视觉定位。
终端设备在采集图像时,还可以采集该终端设备的位置信息和磁力计角度偏转信息。该终端设备的位置信息和磁力计角度偏转信息的具体解释说明可以参见步骤104的相关解释说明。
步骤102、终端设备向服务器发送图像。
服务器接收终端设备发送的图像。
在一些实施例中,终端设备在发送该图像时,还可以向服务器发送该终端设备的位置信息和磁力计角度偏转信息。在一些实施例中,终端设备可以在发送图像之后,向与该图像对应的该终端设备的位置信息和磁力计角度偏转信息。
步骤103、服务器根据该图像获取该图像的二维线特征信息。
该二维线特征信息可以包括建筑物与非建筑物之间的分界线信息、或非建筑物与非建筑物之间的分界线信息中至少一项。该建筑物可以包括住宅、写字楼、体育馆、展览馆或医院等,该非建筑可以包括植被、天空、水面(例如,湖面、河面、或海面等)、或地面等。举例而言,该建筑物与非建筑物之间的分界线信息可以为建筑物与树木之间的分界线信息,或建筑物与地面之间的分界线信息(也称为建筑物的下边沿信息),或建筑物与天空之间的分界线信息(也称为建筑物的上边沿信息)等,该不同非建筑物之间的分界线信息可以为道路与河面、道路与植被、道路与人行道、或不同道路之间的分界线信息。非建筑物与道路信息之间的分界线信息也可以称为道路的边界线信息。如图5所示,例如,该图像为如图5中左侧所示的图像,该图像中包括建筑物、道路、植被和天空,则获取的该图像的二维线特征信息可以是如图5中右侧所示的建筑物与植被之间的分界线、植被与道路之间的分界线、建筑物与道路之间的分界线、建筑物与天空之间的分界线、植被与天空之间的分界线等信息。终端设备在不同的场景采集到的图像,图像的内容可以包括建筑物和/或非建筑物 等,针对不同成像内容的图像,服务器获取的图像的二维线特征信息不同。
上述建筑物与非建筑物之间的分界线信息、非建筑物与非建筑物之间的分界线信息可以称为不同类别的二维线特征信息,在一些实施例中,服务器可以根据终端设备的位置信息,或者根据终端设备的位置信息和磁力计角度偏转信息,确定图像的二维线特征信息的类别,进而获取图像的相应类别的二维线特征信息。例如,用户使用终端设备在市区街道内采集了一个图像,并发送给服务器,服务器可以根据该终端设备的位置信息,确定图像的二维线特征信息的类别,该二维线特征信息的类别包括建筑物与非建筑物之间的分界线信息和非建筑物与非建筑物之间的分界线信息,进而获取该图像的建筑物与非建筑物之间的分界线信息和非建筑物与非建筑物之间的分界线信息。再例如,用户使用终端设备在外滩江边采集了一个图像,并发送给服务器,服务器可以根据该终端设备的位置信息和磁力计角度偏转信息,确定图像的二维线特征信息的类别,该二维线特征信息的类别包括建筑物与非建筑物之间的分界线信息和非建筑物与非建筑物之间的分界线信息,进而获取该图像的建筑物与非建筑物之间的分界线信息和非建筑物与非建筑物之间的分界线信息。
在一些实施例中,服务器可以对图像进行语义分割,提取该图像的二维线特征信息。例如,对图像进行不同类别(例如,植被、建筑物、天空、水面、地面等)的语义分割,提取该图像的二维线特征信息。
上述语义分割的具体实施方式可以有很多种方式,例如,通过语义分割模型实现语义分割,输出该图像的二维线特征信息。举例而言,该语义分割模型可以是任意神经网络模型,例如,深度神经网络(Deep Neural Network,DNN)、卷积神经网络(Convolutional Neural Networks,CNN)或其组合等。该语义分割模型也可以是任意机器学习的分类器,例如,支持向量机(support vector machine,SVM)分类器。
该语义分割模型可以对输入的图像进行语义分割,区分出建筑物的轮廓线、天空、植被、地面、或水面等,进而输出该图像的二维线特征信息。该语义分割可以是密集的像素级别的分类任务。该语义分割模型可以是使用训练图像和标签值(用于表示像素点对应的类别,例如,建筑物、天空等)进行训练得到的。示例性的,在训练时采用的训练策略可以是标准的交叉熵损失,用以衡量语义分割模型的预测值与标签值之间的差距,通过最小化该交叉熵损失,提高语义分割模型的预测效果。最终训练得到的语义分割模型,能够区分图像中的建筑物与非建筑物之间的分界线、和/或非建筑物与非建筑物之间的分界线等。
步骤104、服务器根据该终端设备的位置信息和磁力计角度偏转信息、卫星地图、以及该二维线特征信息,确定该终端设备的定位位姿。
其中,该位置信息可以是终端设备的卫星定位信息,例如,可以是终端设备的全球定位系统(Global Positioning System,GPS)信息、终端设备的北斗卫星导航系统(BeiDou Navigation Satellite System,BDS)信息、终端设备的格洛纳斯(GLONASS)信息或终端设备的伽利略卫星导航系统(Galileo satellite navigation system)信息。该磁力计角度偏转信息可以是偏航(yaw)角。该位置信息和磁力计角度偏转信息可以是终端设备采集该图像时的位置信息和磁力计角度偏转信息,其可以通过终端设备的无线通信模块和磁力计获取。
服务器可以利用该终端设备的位置信息和磁力计角度偏转信息,确定多个候选位姿,基于该卫星地图,提取各个候选位姿的全景线特征信息,根据各个候选位姿的全景线特征信息和该二维线特征信息,确定该终端设备的定位位姿,从而可以结合图像的二维线特征 信息,确定终端设备的定位位姿,提升定位成功率和定位精度。
示例性的,服务器可以根据终端设备的位置信息和磁力计角度偏转信息,确定候选位姿集合。该候选位姿集合可以包括多组候选位姿,每组候选位姿包括候选位置信息和候选偏航角度集合,该候选位置信息为根据该终端设备的位置信息确定的,例如,服务器可以根据该终端设备的位置信息确定候选位置范围,候选位置信息属于该候选位置范围,该候选位置范围可以为以该终端设备的位置信息为圆心,一定半径(例如,30米)的圆形区域范围。再例如,服务器可以根据该终端设备的磁力计角度偏转信息确定候选偏航角度集合,示例性的,该候选偏航角度集合可以该终端设备的磁力计角度偏转信息的正负90度范围内的偏航角度集合。服务器可以根据候选位姿集合、该二维线特征信息和卫星地图,确定N个优化位姿。根据该N个优化位姿,确定该终端设备的定位位姿。其中,N为大于1的整数。
一种可实现方式,服务器可以根据该候选位姿集合、该二维线特征信息和卫星地图,采用搜索方法和迭代方法,确定N个优化位姿。其中,该搜索方法用于在该候选位姿集合中选取部分候选位姿,用该部分候选位姿和卫星地图确定该部分候选位姿的全景线特征信息,并将该全景线特征信息与该二维线特征信息进行匹配,确定多个初始位姿,该迭代方法用于对该多个初始位姿进行优化,确定所述N个优化位姿。
该搜索方法可以选取部分候选位姿的全景线特征信息与该二维线特征信息进行匹配,从而可以减少确定终端设备的定位位姿所需时间,即减少定位时间。例如,该搜索方法可以在该候选位姿集合中进行多次搜索以确定该N个优化位姿。示例性的,进行N次搜索以确定该N个优化位姿。
以N次搜索以确定该N个优化位姿为例,服务器可以在第一次搜索时,在该候选位姿集合中选取部分候选位姿的全景线特征信息与二维线特征信息进行匹配,并使用该迭代方法确定一个优化位姿,服务器在第二次搜索时,在该候选位姿集合中选取第一次搜索确定的优化位姿附近的部分候选位姿的全景线特征信息与二维线特征信息进行匹配,并使用该迭代方法确定一个优化位姿,以此类推,重复执行搜索方法和迭代方法,直至确定N个优化位姿。
该迭代方法可以为迭代最近点算法(Iterative Closest Points Algorithm,ICP)。该迭代方法可以对匹配得到的初始位姿进行优化,得到优化位姿,从而可以提升最终确定的定位位姿的精度。
可选的,服务器可以在该N个优化位姿中,选取损失最小的一个优化位姿作为该终端设备的定位位姿。其中,该损失包括每个优化位姿的位置信息与终端设备的位置信息的差值,和每个优化位姿的最近点损失(ICP loss)。
示例性的,以一个优化位姿的损失为例,该优化位姿的位置信息与终端设备的位置信息的差值,和优化位姿的最近点损失可以采用加权求和的方式,得到该优化位姿的损失。例如,该优化位姿的损失等于,a1*该优化位姿的位置信息与终端设备的位置信息的差值+a2*优化位姿的最近点损失。a1和a2的具体取值可以根据需求进行灵活设置。
其中,该优化位姿对应的最近点损失为,基于该卫星地图,提取的该优化位姿的全景线特征信息,与该二维线特征信息进行匹配,所得到的最近点损失。
步骤105a、服务器根据终端设备的定位位姿确定虚拟物体描述信息。
例如,服务器可以根据定位位姿确定虚拟物体描述信息,该虚拟物体描述信息用于在 终端设备上显示相应的虚拟物体,例如,如图3A所示的步行引导图标,该引导图标显示在真实世界的实际场景中,即显示在如图3A所示的街道上。
步骤105b、服务器向终端设备发送该虚拟物体描述信息。
步骤106、终端设备在用户界面上显示该虚拟物体描述信息对应的虚拟物体。
终端设备在用户界面上显示该虚拟物体描述信息对应的虚拟物体,该用户界面中显示有真实世界的实际场景,该虚拟物体可以采用增强现实的方式显示在该用户界面上。
本实施例,服务器通过获取终端设备采集的图像的二维线特征信息,该二维线特征信息可以包括建筑物与非建筑物之间的分界线信息或非建筑物与非建筑物之间的分界线信息中至少一项,根据该终端设备的位置信息和磁力计角度偏转信息、卫星地图、以及该二维线特征信息,确定该终端设备的定位位姿。利用二维线特征信息进行视觉定位,可以解决视野内天际线较短或不够丰富的场景下的定位失败或定位精度不高的问题,提升视觉定位的成功率和精度,并且可以提升视觉定位的鲁棒性。
图6为本申请实施例提供的一种视觉定位方法的流程图,本实施例的方法涉及终端设备和服务器,本实施例在图4所述实施例的基础上,在确定终端设备的定位位姿之后,进一步确定该定位位姿是否可靠,从而提升定位结果的可信程度,如图6所示,本实施例的方法可以包括:
步骤201、终端设备采集图像。
步骤202、终端设备向服务器发送图像。
步骤203、服务器根据该图像获取该图像的二维线特征信息。
其中,步骤201至步骤203的解释说明可以参见图4所示实施例的步骤101至步骤103,此处不再赘述。
步骤2041、服务器根据该终端设备的位置信息和磁力计角度偏转信息,确定候选位姿集合。
步骤2042、服务器根据该候选位姿集合、该二维线特征信息和该卫星地图,采用搜索方法和迭代方法,确定N个优化位姿。
其中,该搜索方法用于在该候选位姿集合中选取部分候选位姿与该二维线特征信息进行匹配,确定多个初始位姿,该迭代方法用于对该多个初始位姿进行优化,确定该N个优化位姿。
步骤2043、服务器根据该N个优化位姿,确定该终端设备的定位位姿。
其中,步骤2041至步骤2042的解释说明可以参见图4所示实施例的步骤104,此处不再赘述。
步骤205、服务器根据该终端设备的定位位姿对应的内点率、内点误差或热力图中至少一项,判断该终端设备的定位位姿是否可靠,若可靠,则执行步骤206a,若不可靠,则执行步骤208。
其中，该热力图用于表示N个优化位姿的位置信息在候选位置集合中的分布。该内点率和该内点误差用于描述基于卫星地图提取的终端设备的定位位姿的全景线特征信息，与该二维线特征信息的匹配程度。其中，该内点指，该二维线特征信息中与该终端设备的定位位姿的全景线特征信息的差值小于L1的点，该L1可以取小于10、或5、或4的任意正整数。该内点率指，该差值小于L1的点的总个数在该二维线特征信息所有点总个数中的占比。该内点误差指，该差值小于L1的点的差值均值。
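内点率与内点误差的计算方式可以用如下示意性的Python片段表示，其中差值的来源和阈值L1的取值均为说明用途的假设：

```python
import numpy as np

# diffs：二维线特征信息各点与定位位姿的全景线特征信息对应点之间的差值（假设已对齐）
diffs = np.abs(np.array([0.5, 1.2, 7.3, 2.0, 0.9, 3.8]))
L1 = 4  # 内点阈值，例如可取小于10、5或4的正整数

inlier_mask = diffs < L1
inlier_ratio = inlier_mask.sum() / diffs.size  # 内点率：差值小于L1的点的占比
inlier_error = diffs[inlier_mask].mean() if inlier_mask.any() else float("inf")  # 内点误差：内点差值均值
```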
举例而言,本申请实施例提供一种热力图的示意图,参见图7A所示,该热力图的中心为该终端设备的位置信息所在点,图中每一个实心方点代表采用上述搜索方法和迭代方法处理过的点。
在一些实施例中,服务器可以判断该终端设备的定位位姿是否满足以下条件至少之一:该终端设备的定位位姿对应的内点率大于第一阈值;或者,该终端设备的定位位姿对应的内点误差小于第二阈值;或者,该终端设备的定位位姿对应的热力图中该候选位姿的分布的密集度大于第三阈值。该第一阈值、第二阈值和第三阈值的取值可以是任意正数,其可以根据需求进行灵活设置。
该终端设备的定位位姿对应的内点率大于第一阈值;或者,该终端设备的定位位姿对应的内点误差小于第二阈值,均可以表示该终端设备的定位位姿,与该二维线特征信息较为相似,且匹配。终端设备的定位位姿对应的热力图中该候选位姿的分布的密集度大于第三阈值,可以表示搜索方法在候选位姿集合中选取的候选位姿较为集中,使得多次搜索最终确定的定位位姿较为准确。以图7A所示热力图做进一步举例说明,由图7A可见,该热力图中候选位姿的分布较为集中,则可以确定该定位位姿可靠。
换言之，终端设备的定位位姿的内点率高，和/或内点误差小，则该终端设备的定位位姿是可靠或可信的；反之，终端设备的定位位姿的内点率低，和/或内点误差大，则该终端设备的定位位姿是不可靠或不可信的。终端设备的定位位姿对应的热力图中该候选位姿的分布集中，则该终端设备的定位位姿是可靠或可信的；反之，终端设备的定位位姿对应的热力图中该候选位姿的分布分散，则该终端设备的定位位姿是不可靠或不可信的。
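基于上述条件的可靠性判定可以示意性地写成如下Python片段，其中三个阈值的取值以及热力图密集度的计算方式均为说明用途的假设：

```python
def is_pose_reliable(inlier_ratio, inlier_error, heatmap_density,
                     t1=0.6, t2=2.0, t3=0.5):
    """满足以下条件至少之一即认为定位位姿可靠（阈值仅为示意）。"""
    return (inlier_ratio > t1) or (inlier_error < t2) or (heatmap_density > t3)

if is_pose_reliable(0.72, 1.4, 0.65):
    print("定位位姿可靠，输出该定位位姿")
else:
    print("定位失败，提示重新采集图像")
```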
以服务器通过内点率、内点误差和热力图确定该终端设备的定位位姿是可靠或可信为例，参见图7B所示。图7B中左侧所示图像为终端设备采集的图像，中间所示图像为终端设备的定位位姿的全景线特征信息与图像的二维线特征信息，即各个分界线。以建筑物与天空之间的分界线为例，根据如图7B中间所示的基于卫星地图的建筑物与天空之间的分界线，以及基于图像的建筑物与天空之间的分界线，可以计算内点率和内点误差，再结合图7B中右侧所示的热力图，最终确定该终端设备的定位位姿是可靠或可信。
步骤206a、服务器根据终端设备的定位位姿确定虚拟物体描述信息。
执行步骤206a之后可以执行步骤206b和步骤207。
步骤206b、服务器向终端设备发送该虚拟物体描述信息。
步骤207、终端设备在用户界面上显示该虚拟物体描述信息对应的虚拟物体。
步骤208、服务器向终端设备发送提示消息,该提示消息用于指示定位失败。
该提示消息还用于指示重新采集图像。
步骤209、终端设备在用户界面上显示定位失败。
终端设备还可以在用户界面上显示提示用户重新采集图像的信息。
本实施例，服务器通过获取终端设备采集的图像的二维线特征信息，该二维线特征信息可以包括建筑物与非建筑物之间的分界线信息或非建筑物与非建筑物之间的分界线信息中至少一项，根据该终端设备的位置信息和磁力计角度偏转信息、卫星地图、以及该二维线特征信息，确定该终端设备的定位位姿，通过该终端设备的定位位姿对应的内点率、内点误差或热力图中至少一项，判断该终端设备的定位位姿是否可靠，在可靠时，输出该终端设备的定位位姿，在不可靠时，确定定位失败。利用二维线特征信息进行视觉定位，可以解决视野内天际线较短或不够丰富的场景下的定位失败或定位精度不高的问题，提升视觉定位的成功率和精度，并且可以提升视觉定位的鲁棒性。进一步，通过该终端设备的定位位姿对应的内点率、内点误差或热力图中至少一项，判断该终端设备的定位位姿是否可靠，可以提升定位结果的可信程度。
下面采用图8A所示实施例对上述步骤104的一种具体的可实现方式进行解释说明。
图8A为本申请实施例提供的一种鲁棒的基于卫星地图的视觉定位(Geo-localization)方法的流程图,本实施例的执行主体可以是服务器或服务器的内部芯片,如图8A所示,本实施例的方法可以包括:
步骤301、根据图像对应的终端设备的位置信息和磁力计角度偏转信息,确定M组候选位姿集合。
每组候选位姿包括候选位置信息和候选偏航角度集合,该候选位置信息属于第一阈值范围内,该第一阈值范围为根据该终端设备的位置信息确定的,该候选偏航角度集合属于第二阈值范围内,该第二阈值范围为根据该终端设备的磁力计角度偏转信息确定的角度集合。M取大于1的正整数。
示例性的，服务器可以根据图像对应的终端设备的位置信息和磁力计角度偏转信息，分别构建候选位置集合（T）和候选偏航（yaw）角集合（Y），候选位置集合（T）包括多个候选位置信息，候选偏航（yaw）角集合（Y）包括多个偏航（yaw）角，T中的一个候选位置信息和候选偏航（yaw）角集合（Y）可以组成一组候选位姿，从而可以组成多组候选位姿。
构建候选位置集合(T)的一种可实现方式为,在一个区域范围内,以第一预设间隔为间隔,选取位置点作为候选位置集合(T)中的候选位置信息,该区域范围可以是以图像对应的终端设备的位置信息(x,y)为圆心,半径为第四阈值的范围。即上述第一阈值范围的中心值为终端设备的位置信息。例如,该第四阈值可以是30米、35米等。该第一预设间隔可以是1米。
构建候选偏航(yaw)角集合(Y)的一种可实现方式为,在一个角度范围内,以第二预设间隔为间隔,选取角度作为候选偏航(yaw)角集合(Y)中的偏航(yaw)角,该角度范围可以是以图像对应的终端设备的偏航(yaw)角的正负第五阈值的范围。即上述第二阈值范围的中心值为终端设备的磁力计角度偏转信息。例如,该第五阈值可以是90度、85度等。该第二预设间隔可以是0.1度。
上述构建候选位置集合(T)和候选偏航(yaw)角集合(Y)的可实现方式为一种举例说明,本申请实施例不以此作为限制。
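构建候选位置集合T与候选偏航角集合Y的过程可以用如下示意性的Python片段表示，其中圆心坐标、半径、间隔等数值取自上文举例，具体实现方式为说明用途的假设：

```python
import numpy as np

x, y = 100.0, 200.0              # 图像对应的终端设备的位置信息
yaw0 = 30.0                      # 图像对应的磁力计角度偏转信息（偏航角，单位：度）
radius, pos_step = 30.0, 1.0     # 第四阈值与第一预设间隔
yaw_range, yaw_step = 90.0, 0.1  # 第五阈值与第二预设间隔

# 候选位置集合T：以(x, y)为圆心、半径为radius的圆形区域内，按pos_step间隔取点
T = []
for dx in np.arange(-radius, radius + pos_step, pos_step):
    for dy in np.arange(-radius, radius + pos_step, pos_step):
        if dx * dx + dy * dy <= radius * radius:
            T.append((x + dx, y + dy))

# 候选偏航角集合Y：偏航角正负yaw_range范围内，按yaw_step间隔取角度
Y = np.arange(yaw0 - yaw_range, yaw0 + yaw_range + yaw_step, yaw_step)

# T中的每一个候选位置信息与集合Y组成一组候选位姿，共M组
M = len(T)
```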
步骤302、在M组候选位姿中选取K1组候选位姿，分别根据该K1组候选位姿中的每组候选位姿的候选位置信息和卫星地图，获取每组候选位姿的全景线特征信息。
与对所有候选位姿信息进行匹配不同，为了降低匹配所消耗时长，本申请实施例可以在M组候选位姿中选取K1组候选位姿进行匹配。该K1组候选位姿的选取方式，可以为在M组候选位姿中，基于候选位置信息，间隔选取K1组候选位姿。例如，K1组候选位姿中相邻两个候选位置的候选位置信息间隔3米。
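按位置间隔在M组候选位姿中选取K1组候选位姿，可以用如下示意性片段表示，其中的候选位置数据与3米间隔均为举例：

```python
# 假设T为上一步构建的候选位置集合（此处仅用少量点示意）
T = [(0.0, 0.0), (1.0, 0.0), (3.5, 0.0), (4.0, 3.0), (7.0, 0.0)]

def select_spaced_positions(positions, spacing=3.0):
    """基于候选位置信息按间隔稀疏选取候选位置（示意实现）。"""
    selected = []
    for p in positions:
        if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 >= spacing ** 2 for q in selected):
            selected.append(p)
    return selected

K1_positions = select_spaced_positions(T, spacing=3.0)  # 每个位置与候选偏航角集合组成一组候选位姿
```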
步骤303、分别对每组候选位姿的全景线特征信息与二维线特征信息进行匹配,确定每组候选位姿的候选偏航角度信息。
该每组候选位姿的候选偏航角度信息为每组候选位姿的候选偏航角度集合中与该二维线特征信息匹配度最高的角度。
通过对每组候选位姿的全景线特征信息与二维线特征信息进行匹配,确定该组候选位姿的候选偏航角度信息,即确定一个偏航(yaw)角。
在匹配过程中,可以使用滑窗遍历匹配候选位姿的全景线特征信息与二维线特征信息。该匹配可以包括多模态鲁棒匹配或二维轮廓线匹配,其中,该多模态鲁棒匹配包括多重语义信息匹配或极大值抑制匹配。
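滑窗遍历匹配的基本思路可以用如下示意性的Python片段表示：把候选位姿的全景线特征（按偏航角采样的分界线高度）作为参考，在其上滑动与图像视场等宽的窗口，与图像的二维线特征逐一比较，取匹配误差最小的偏航角。其中特征表示、视场角和误差度量均为说明用途的简化假设：

```python
import numpy as np

# 全景线特征：0~359度每度一个分界线高度值（示意数据）
pano_line = np.abs(np.sin(np.deg2rad(np.arange(360)))) * 100.0
# 图像的二维线特征：假设视场角为60度，每度一个分界线高度值（示意数据）
query_line = pano_line[40:100] + np.random.randn(60) * 2.0

best_yaw, best_cost = None, np.inf
for yaw in range(360):                               # 滑窗遍历每个候选偏航角
    window = pano_line[(np.arange(60) + yaw) % 360]  # 该偏航角对应的视场窗口
    cost = np.mean(np.abs(window - query_line))      # 简化的匹配误差
    if cost < best_cost:
        best_yaw, best_cost = yaw, cost

# best_yaw 即该组候选位姿的候选偏航角度信息（与二维线特征匹配度最高的角度）
```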
其中,二维轮廓线匹配的示例说明可以参见图8B所示,其中,图8B中的(a)为一个图像的匹配过程示意,即将全景线特征信息(图中较浅较长的线)与图像的二维线特征信息(图中较深较短的线)进行匹配,图8B中的(b)为另一个图像的匹配过程示意,其匹配原理相同,此处不再赘述。
对局部的二维轮廓线匹配的示例说明可以参见图8C所示,其中,图8C中左侧的匹配示意为现有技术中的匹配方式,即采用最近点距离为垂直距离的方式进行匹配,该匹配方式存在较大误差,本申请实施例的匹配方式可以如图8C中右侧的匹配示意,即采用最近点距离为水平距离的方式进行匹配,这样的匹配方式,可以使得视觉定位更加准确。
结合图8C对两种不同的匹配方式的定位结果进行示意性说明,请参照图8D,图8D为本申请实施例的图8C的匹配方式对应的定位结果。其中,图8D中第一行为采用图8C中左侧的匹配方式对图8D中的左侧的原图进行处理后的定位结果,图8D中第二行为采用图8C中右侧的匹配方式对图8D中的左侧的原图进行处理后的定位结果。由图8D可见,本申请实施例的匹配方式的定位结果对应的边界线,与真实值(ground truth)对应的边界线,更为接近。
本申请实施例结合建筑物与非建筑物之间的分界线信息、非建筑物与非建筑物之间的分界线信息进行匹配,可以有效提升定位区分度。
对多重语义信息匹配进行解释说明。以二维线特征信息为树木与天空之间的分界线信息为例，其基本原理为：1）图像中树木与天空的分界线一定比地图编码中建筑物与天空的分界线高；2）图像中如果建筑物的上边界超过了图像上边界，则说明其一定比地图编码中建筑物和天空的分界线低。优化方法为：如果在匹配的过程中，某一个候选位姿违反了上述规则，则认为该候选位姿不合理。
对于多重语义信息匹配的示意可以参见图8E,如图8E所示,其中图像中树木与天空的分界线一定比地图编码中建筑物与天空的分界线高,图像中建筑物的上边界超过了图像上边界,则说明其一定比地图编码中建筑物和天空的分界线低。
对极大值抑制匹配进行解释说明,由于白模、语义分割等存在误差,在匹配的过程中,可能会遇到局部存在较大误差的情况(尤其在建筑物边缘部分),如果不加以抑制,有可能会对匹配结果造成不良的影响。优化方法为:如果二维线特征信息误差超过某一个阈值,则将其抑制为该阈值。
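极大值抑制匹配中“将超过阈值的误差抑制为该阈值”的处理，可以用如下示意性片段表示，其中的误差数据与阈值均为举例：

```python
import numpy as np

errors = np.array([0.5, 1.2, 9.7, 2.0, 15.3])  # 二维线特征信息各点的匹配误差
threshold = 5.0
suppressed = np.minimum(errors, threshold)     # 超过阈值的误差被抑制为该阈值
```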
步骤304、根据该K1组候选位姿的候选偏航角度信息，得到K1个初始位姿，每个初始位姿包括一组候选位姿的候选位置信息和候选偏航角度信息。
以一组候选位姿为例，将通过上述步骤303确定的该组候选位姿的候选偏航角度信息，和该组候选位姿的候选位置信息，组成一个初始位姿。有K1组候选位姿，则可以通过匹配处理得到K1个初始位姿。
步骤305、对该K1个初始位姿采用迭代方法优化，得到K1个优化位姿，并得到每个优化位姿对应的最近点损失。
该迭代方法可以是如上所述的ICP，即对每个初始位姿采用ICP优化，得到一个优化位姿。例如，采用ICP优化初始位姿的偏航（yaw）角。
在一些实施例中，每个初始位姿还可以包括预设的高度信息、俯仰角信息（俯仰（pitch））和翻滚角信息（翻滚（roll））。例如，该预设的高度信息可以为1.5m等。该俯仰角信息（俯仰（pitch））和翻滚角信息（翻滚（roll））可以由同时定位与建图（Simultaneous Localization and Mapping，SLAM）算法给出。由于SLAM给出的俯仰角信息（俯仰（pitch））和翻滚角信息（翻滚（roll））会存在一些误差，采用ICP可以进一步优化俯仰角信息（俯仰（pitch））和翻滚角信息（翻滚（roll））。
每个优化位姿包括位置信息、高度信息、磁力计角度信息（优化后的偏航（yaw）角）、俯仰角信息（优化后的俯仰（pitch））和翻滚角信息（优化后的翻滚（roll））。
优化方法为:将图像(也称为query图像)的二维线特征和编码库中的线特征上的点提取出来,映射到单位球上,将其视为两组点云;然后利用ICP对点云进行匹配。ICP的输出为pitch、yaw和roll三个角度;将这三个角度作为最终输出的角度(也即上述优化后的pitch角、优化后的yaw角和优化后的roll角),而不再采用SLAM给出的角度。
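将两组线特征点映射到单位球后利用ICP求解三个角度的过程，可以用如下示意性的Python草图表示。该草图仅估计旋转，采用最近邻匹配加SVD求解的简化形式，点的来源、数量和真实旋转均为说明用途的假设，并非本申请的实际实现：

```python
import numpy as np

def icp_rotation(src, dst, iters=20):
    """在单位球上用简化ICP估计将src对齐到dst的旋转矩阵（示意实现）。"""
    R = np.eye(3)
    for _ in range(iters):
        moved = src @ R.T
        # 最近点匹配：为每个src点找dst中最近的点
        d = np.linalg.norm(moved[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[np.argmin(d, axis=1)]
        # Kabsch/SVD求最优旋转
        H = src.T @ matched
        U, _, Vt = np.linalg.svd(H)
        S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ S @ U.T
    return R

# 两组单位球上的点云：图像二维线特征点与编码库线特征点（示意数据）
rng = np.random.default_rng(0)
dst = rng.normal(size=(200, 3))
dst /= np.linalg.norm(dst, axis=1, keepdims=True)
true_R = np.array([[0.9962, -0.0872, 0.0],
                   [0.0872,  0.9962, 0.0],
                   [0.0,     0.0,    1.0]])  # 绕z轴约5度的旋转
src = dst @ true_R.T

R = icp_rotation(src, dst)
# 从旋转矩阵中恢复 yaw / pitch / roll（按ZYX欧拉角分解，示意）
yaw = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
pitch = np.degrees(np.arcsin(-R[2, 0]))
roll = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
```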
参见图8F所示,其中,图8F中的(a)为原图,即上述实施例中的图像,图8F中的(b)为不采用ICP优化得到的定位位姿对应的边界线,与真实值(ground truth)对应的边界线,由图8F中的(b)的天空与建筑物之间的分界线可见,二者相差较大,图8F中的(c)为本申请实施例的采用ICP优化得到的定位位姿对应的边界线,与真实值(ground truth)对应的边界线,由图8F中的(c)的天空与建筑物之间的分界线可见,二者相差较小。
步骤306、根据每个优化位姿的最近点损失，在该K1个优化位姿中确定一个优化位姿，作为该N个优化位姿中的一个优化位姿，该一个优化位姿为该K1个优化位姿中最近点损失最小的优化位姿。
该最近点损失可以是如上述图4所示实施例的损失中的优化位姿对应的匹配度。
步骤307、判断是否确定N个优化位姿，若否，则将K1替换为K1+n，重复执行步骤302至307，若是，则执行上述步骤308。
重复执行步骤302至307直至确定N个优化位姿，n取1至N-1，且K1>K2=K3=……=KN。
在一些实施例中，K1+n组候选位姿的中心为对Kn组候选位姿执行上述步骤302至307所确定出的一个优化位姿。即一次搜索和优化后的优化位姿可以用于确定下一次搜索和优化的多组候选位姿。例如，选取一次搜索和优化后的优化位姿周围的候选位姿进行下一次搜索和优化。
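上述先搜索、再优化、并以优化位姿为中心进行下一次搜索的过程，可以概括为如下示意性的流程草图，其中search_candidates与match_and_optimize为说明用途的假设接口：

```python
def coarse_to_fine_localization(candidate_set, N, K_first, K_rest,
                                search_candidates, match_and_optimize):
    """示意性的由粗到细搜索流程：每次搜索以上一次的优化位姿为中心。"""
    optimized_poses = []
    center = None                              # 第一次搜索不指定中心
    for i in range(N):
        K = K_first if i == 0 else K_rest      # K1 > K2 = K3 = ... = KN
        subset = search_candidates(candidate_set, K, center)
        best = match_and_optimize(subset)      # 匹配得到初始位姿并用ICP优化，取最近点损失最小者
        optimized_poses.append(best)
        center = best                          # 下一次搜索围绕该优化位姿选取候选位姿
    return optimized_poses
```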
步骤308、根据该N个优化位姿,确定该终端设备的定位位姿。
例如,在该N个优化位姿中选取损失最小的一个优化位姿,作为该终端设备的定位位姿。该损失包括优化位姿的位置信息与终端设备的位置信息的差值,和优化位姿对应的匹配度。
本实施例,在视觉定位过程中,可以利用二维线特征信息,可以解决视野内天际线较短或不够丰富的场景下的定位失败或定位精度不高的问题,提升视觉定位的成功率和精度,并且可以提升视觉定位的鲁棒性。通过搜索方法和迭代最近点方法可以降低定位时长,提升定位精度。
对本申请实施例的视觉定位方法的效果进行说明，具体可以参见图8G和图8H，图8G示出了本申请实施例的视觉定位时长与现有技术的定位时长，如图8G所示，本申请实施例的视觉定位方法可以降低定位时长。图8H示出了本申请实施例的视觉定位精度与现有技术的定位精度，如图8H所示，本申请实施例的视觉定位方法，对于不同的定位误差，例如，1米1度（1m1°），2米2度（2m2°）等，定位准确率均高于现有技术的定位准确率。
图9A为本申请实施例提供的一种视觉定位方法的处理过程的示意图，如图9A所示，本实施例的方法可以包括：终端设备采集图像和该终端设备的位置信息和磁力计角度偏转信息（S501）。服务器获取图像和该终端设备的位置信息和磁力计角度偏转信息。服务器对该图像进行语义分割（S502），基于语义分割的结果提取该图像的二维线特征信息（S503）。服务器基于该终端设备的位置信息和磁力计角度偏转信息，确定M组候选位姿集合（S504）。服务器在该M组候选位姿集合中通过搜索方法选取部分候选位姿执行候选处理步骤（S505）。服务器根据部分候选位姿，从卫星地图中提取每组候选位姿的全景线特征信息（S506）。分别对每组候选位姿的全景线特征信息与二维线特征信息进行匹配，确定每组候选位姿的候选偏航角度信息，得到多个初始位姿（S507）。对该多个初始位姿采用迭代方法优化，得到多个优化位姿（S508）。在该多个优化位姿中确定一个优化位姿，作为该N个优化位姿中的一个优化位姿。重复执行（S505）至（S508），确定N个优化位姿。在该N个优化位姿中选取损失最小的一个优化位姿，作为该终端设备的定位位姿（S509）。服务器进行置信度判定（S510），即根据该终端设备的定位位姿对应的内点率、内点误差或热力图中至少一项，判断该终端设备的定位位姿是否可靠，当可靠时输出该定位位姿。
其中,上述各个步骤的具体解释说明可以参见上述实施例中相关步骤的解释说明,其具体实施方式和技术效果,此处不再赘述。
可选的,该视觉定位方法还可以包括,终端设备进行预检测处理(S511)。其中,图9A中的虚线表示可选的。
该预检测处理的一种可实现方式为:在发送图像之前,通过端侧模型判断该图像是否适合做视觉定位。
比如,终端设备针对当前query图像,基于端侧语义分割模型,对query图像进行语义分割,并提取二维线特征,包括建筑物与非建筑物之间的分界线,以及各个不同非建筑物之间的分界线,判断该二维线特征的丰富程度。如果二维线特征比较丰富,即二维线特征的长度大于某一个阈值,则适合做视觉定位。
终端设备的处理过程可以参见图9B,图9B为本申请实施例提供的一种视觉定位方法的处理过程的示意图,本实施例的执行主体可以为终端设备或终端设备的处理器,本实施例可以包括:
步骤601、终端设备采集图像、该终端设备的位置信息和磁力计角度偏转信息。
步骤602、终端设备通过端侧模型判断该图像是否适合做视觉定位,若是,则执行步骤S603,若否,则执行步骤601。
步骤603、终端设备向服务器发送该图像、该终端设备的位置信息和磁力计角度偏转信息。
例如,将该图像输入至端侧模型中,通过该端侧模型对该图像进行语义分割,该端侧模型输出该图像的语义分割结果,根据该语义分割结果获取该图像的二维线特征信息,根据该二维线特征信息判断该图像是否适合做视觉定位。
例如,判断该二维线特征信息对应的建筑物与非建筑物之间的分界线,或非建筑物与非建筑物之间的分界线中至少一项,是否丰富,若丰富,则确定该图像适合做视觉定位,若不丰富,则确定该图像不适合做视觉定位。
其中,丰富可以指,上述分界线的长度大于一个阈值。该分界线包括该二维线特征信息对应的建筑物与非建筑物之间的分界线,或非建筑物与非建筑物之间的分界线中至少一项。
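端侧预检测的判断逻辑可以示意性地表示为如下片段，其中run_segmentation为假设的端侧语义分割接口（返回逐像素类别的二维数组），分界线长度的统计方式与阈值亦为说明用途的假设：

```python
import numpy as np

def is_suitable_for_localization(image, run_segmentation, min_length=200):
    """根据二维线特征（各类分界线）的总长度判断图像是否适合做视觉定位（示意）。"""
    seg = run_segmentation(image)  # 端侧模型输出的逐像素类别（二维整数数组）
    # 相邻像素类别不同处视为分界线像素（简化的二维线特征提取）
    boundary_len = (np.diff(seg, axis=0) != 0).sum() + (np.diff(seg, axis=1) != 0).sum()
    return boundary_len > min_length  # 分界线足够丰富时，才将图像发送给服务器做视觉定位
```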
当确定该图像适合做视觉定位时，可以向服务器发送该图像，以便服务器基于该图像，对终端设备进行视觉定位。
与上述实施例中的语义分割模型类似，本实施例的端侧模型用于实现语义分割，输出该图像的二维线特征信息。举例而言，该端侧模型可以是任意神经网络模型，例如，深度神经网络（Deep Neural Network，DNN）、卷积神经网络（Convolutional Neural Networks，CNN）或其组合等。该端侧模型也可以是任意机器学习的分类器，例如，支持向量机（support vector machine，SVM）分类器。
需要说明的是,预检测过程中的图像的二维线特征信息,与上述定位位姿确定所使用的图像的二维线特征信息的精度不同。上述定位位姿确定所使用的图像的二维线特征信息为服务器对该图像进行语义分割后所获取的,其精度高于预检测过程中的图像的二维线特征信息。
服务器的语义分割模型,针对query图像进行不同类别(植被、建筑物、天空等)的精细语义分割。这里的语义分割模型要比终端设备中预检测处理所使用的端侧模型大,分割精度比端侧模型精度高。
需要说明的是,在终端设备向服务器发送该图像后,服务器可以通过上述实施例的步骤确定终端设备的定位位姿,并向终端设备返回虚拟物体描述信息,以便在终端设备的用户界面上显示相应的虚拟物体。
本实现方式,通过在终端设备对该图像进行预检测,将适合视觉定位的图像发送给服务器做进一步精确视觉定位,可以避免将不适合视觉定位的图像发送给服务器,造成传输资源和服务器侧计算资源的浪费。
下面结合图10,通过具体示例,对上述实施例的视觉定位方法进行说明。
图10为本申请实施例提供的一种用户界面示意图。如图10所示,包括用户界面901-用户界面902。
如用户界面901所示,终端设备可以采集图像,该图像呈现在用户界面901中。
该用户界面901可以是一个应用程序的用户界面,举例而言,该应用程序可以是用于提供AR导航服务的应用程序,用户可以点击该应用程序的图标,响应于该点击操作,终端设备可以显示该用户界面901,在用户界面901中显示该图像。
可选的,在用户界面901中还可以显示提示信息(如图10所示的文本框9011),该提示信息用于提示用户拍摄建筑物与非建筑物之间的分界线,或非建筑物与非建筑物之间的分界线中至少一项,例如,该提示信息可以是“请尽量拍摄丰富场景:植被与建筑物之间的分界线、道路与建筑物之间的分界线等”。
用户界面901中的图像包括建筑物与植被之间的分界线、植被与道路之间的分界线、建筑物与道路之间的分界线、建筑物与天空之间的分界线、以及植被与天空之间的分界线,所以可以满足视觉定位需求。终端设备可以通过上述步骤102将图像发送给服务器。服务器可以通过上述步骤103至104,确定该终端设备的定位位姿,通过步骤105向终端设备发送该定位位姿对应的虚拟物体描述信息。终端设备根据该虚拟物体描述信息可以显示用户界面902,用户界面902中呈现了虚拟物体描述信息对应的虚拟物体,例如,咖啡馆的引导图标。
本实施例,在视觉定位过程中,可以利用二维线特征信息,可以解决视野内天际线较短或不够丰富的场景下的定位失败或定位精度不高的问题,提升视觉定位的成功率和精度,并且可以提升视觉定位的鲁棒性。通过搜索方法和迭代最近点方法可以降低定位时长,提升定位精度。并且基于定位位姿,向终端设备推送虚拟物体描述信息,以使得终端设备在用户界面上呈现该虚拟物体描述信息对应的虚拟物体,从而使得本申请实施例的视觉定位方法可以应用于AR导航、AR人机交互、辅助驾驶、自动驾驶等需要定位终端设备的相机的位置和姿态的领域,提升用户使用体验。
本申请实施例还提供一种视觉定位装置,用于执行以上各方法实施例中服务器或服务器的处理器执行的方法步骤。如图11所示,该视觉定位装置可以包括:收发模块111和处理模块112。
处理模块112,用于通过收发模块111获取终端设备采集的图像。该处理模块112,还用于根据该图像获取该图像的二维线特征信息,该二维线特征信息包括建筑物与非建筑物之间的分界线信息,或非建筑物与非建筑物之间的分界线信息中至少一项。该处理模块112,还用于根据该终端设备的位置信息和磁力计角度偏转信息、卫星地图、以及该二维线特征信息,确定该终端设备的定位位姿。
在一些实施例中,该处理模块112用于:对该图像进行语义分割,提取该图像的二维线特征信息。
在一些实施例中,该处理模块112用于:根据该终端设备的位置信息和磁力计角度偏转信息,确定候选位姿集合。根据该候选位姿集合、该二维线特征信息和该卫星地图,确定N个优化位姿。根据该N个优化位姿,确定该终端设备的定位位姿。其中,N为大于1的整数。
在一些实施例中，该处理模块112用于：在该候选位姿集合中选取部分候选位姿，用该部分候选位姿和卫星地图确定该部分候选位姿对应的全景线特征信息，并将该全景线特征信息与该二维线特征信息进行匹配，确定多个初始位姿，并采用迭代方法对该多个初始位姿进行优化，确定该N个优化位姿。
在一些实施例中，该候选位姿集合包括M组候选位姿，每组候选位姿包括候选位置信息和候选偏航角度集合，该候选位置信息属于第一阈值范围内，该第一阈值范围为根据该终端设备的位置信息确定的，该候选偏航角度集合属于第二阈值范围内，该第二阈值范围为根据该终端设备的磁力计角度偏转信息确定的角度集合，该处理模块用于：
步骤1：在M组候选位姿中选取K1组候选位姿，分别根据K1组候选位姿中的每组候选位姿的候选位置信息和卫星地图，获取每组候选位姿的全景线特征信息；
步骤2：分别对每组候选位姿的全景线特征信息与二维线特征信息进行匹配，确定每组候选位姿的候选偏航角度信息，每组候选位姿的候选偏航角度信息为每组候选位姿的候选偏航角度集合中与二维线特征信息匹配度最高的角度；
步骤3：根据K1组候选位姿的候选偏航角度信息，得到K1个初始位姿，每个初始位姿包括一组候选位姿的候选位置信息和候选偏航角度信息；
步骤4：对K1个初始位姿采用迭代方法优化，得到K1个优化位姿，并得到每个优化位姿对应的最近点损失；
步骤5：根据每个优化位姿的最近点损失，在K1个优化位姿中确定一个优化位姿，作为N个优化位姿中的一个优化位姿，该一个优化位姿为该K1个优化位姿中最近点损失最小的优化位姿；
步骤6：将K1替换为K1+n，重复执行步骤1至5，直至确定N个优化位姿，n取1至N-1，且K1>K2=K3=……=KN。
在一些实施例中，K1+n组候选位姿的中心为对Kn组候选位姿执行上述步骤1至5所确定出的一个优化位姿。
在一些实施例中,每个初始位姿还包括预设的高度信息、俯仰角信息和翻滚角信息,每个优化位姿包括位置信息、高度信息、偏航角度信息、俯仰角信息和翻滚角信息。
在一些实施例中,该匹配包括多模态鲁棒匹配或二维轮廓线匹配,其中,该多模态鲁棒匹配包括多重语义信息匹配或极大值抑制匹配。
在一些实施例中,该处理模块112用于:在该N个优化位姿中,选取损失最小的一个优化位姿作为该终端设备的定位位姿。其中,该损失为每个优化位姿对应的差值,和每个优化位姿的最近点损失的加权和,该差值为每个优化位姿的位置信息与该终端设备的位置信息之间的差值。
在一些实施例中,该处理模块112还用于:根据该终端设备的定位位姿对应的内点率、内点误差或热力图中至少一项,判断该终端设备的定位位姿是否可靠。当该终端设备的定位位姿可靠时,输出该终端设备的定位位姿。当该终端设备的定位位姿不可靠时,判定定位失败。其中,该热力图用于表示该部分候选位姿的分布。
在一些实施例中,该处理模块112用于:判断该终端设备的定位位姿是否满足以下条件至少之一:该终端设备的定位位姿对应的内点率大于第一阈值;或者,该终端设备的定位位姿对应的内点误差小于第二阈值;或者,该终端设备的定位位姿对应的热力图中该部分候选位姿的分布的密集度大于第三阈值。
在一些实施例中,该处理模块112还用于:根据该终端设备的定位位姿确定虚拟物体描述信息。通过该收发模块111向该终端设备发送虚拟物体描述信息,该虚拟物体描述信息用于在该终端设备上显示对应的虚拟物体。
本申请实施例提供的视觉定位装置可以用于执行上述视觉定位方法,其内容和效果可参考方法部分,本申请实施例对此不再赘述。
本申请实施例还提供一种视觉定位装置，如图12所示，该视觉定位装置包括处理器1201和传输接口1202，该传输接口1202用于获取终端设备采集的图像。
传输接口1202可以包括发送接口和接收接口,示例性的,传输接口1202可以为根据任何专有或标准化接口协议的任何类别的接口,例如高清晰度多媒体接口(high definition multimedia interface,HDMI)、移动产业处理器接口(Mobile Industry Processor Interface,MIPI)、MIPI标准化的显示串行接口(Display Serial Interface,DSI)、视频电子标准协会(Video Electronics Standards Association,VESA)标准化的嵌入式显示端口(Embedded Display Port,eDP)、Display Port(DP)或者V-By-One接口,V-By-One接口是一种面向图像传输开发的数字接口标准,以及各种有线或无线接口、光接口等。
该处理器1201被配置为调用存储在存储器中的程序指令，以执行如上述方法实施例的视觉定位方法，其内容和效果可参考方法部分，本申请实施例对此不再赘述。可选的，该装置还包括存储器1203。该处理器1201可以为单核处理器或多核处理器组，该传输接口1202为接收或发送数据的接口，该视觉定位装置处理的数据可以包括音频数据、视频数据或图像数据。示例性的，该视觉定位装置可以为处理器芯片。
本申请另一些实施例还提供一种计算机存储介质，该计算机存储介质可包括计算机指令，当该计算机指令在电子设备上运行时，使得该电子设备执行上述方法实施例中服务器执行的各个步骤。
本申请另一些实施例还提供一种计算机程序产品，当该计算机程序产品在计算机上运行时，使得该计算机执行上述方法实施例中服务器执行的各个步骤。
本申请另一些实施例还提供一种装置，该装置具有实现上述方法实施例中服务器行为的功能。所述功能可以通过硬件实现，也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块，例如，获取单元或模块，确定单元或模块。
本申请实施例还提供一种视觉定位装置,用于执行以上各方法实施例中终端设备或终端设备的处理器执行的方法步骤。如图13所示,该视觉定位装置可以包括:处理模块131和收发模块132。
处理模块131,用于采集图像,并在用户界面上显示所述图像,该图像包括拍摄到的非建筑物之间的分界线,或,建筑物和非建筑物之间的分界线中至少一项。该处理模块131,还用于通过收发模块132向服务器发送该图像。该收发模块132,还用于接收该服务器发送的虚拟物体描述信息,该虚拟物体描述信息为根据采集该图像的终端设备的定位位姿确定的,该定位位姿为至少根据该图像的二维线特征信息和该终端设备的位置信息确定的,该二维线特征信息包括建筑物与非建筑物之间的分界线的信息,或非建筑物与非建筑物之间的分界线的信息中至少一项。该处理模块131,还用于在该用户界面上叠加显示该虚拟物体描述信息对应的虚拟物体。
在一些实施例中,该处理模块131还用于在采集图像之前,在该用户界面上显示提示信息,该提示信息用于提示用户拍摄建筑物与非建筑物之间的分界线,或非建筑物与非建筑物之间的分界线中至少一项。
在一些实施例中,该处理模块131还用于,在发送图像之前,通过端侧模型判断该图像是否适合做视觉定位。
本申请实施例提供的视觉定位装置可以用于执行上述视觉定位方法,其内容和效果可参考方法部分,本申请实施例对此不再赘述。
图14为本申请实施例的一种视觉处理装置的结构示意图。如图14所示,视觉处理装置1400可以是上述实施例中涉及到的终端设备。视觉处理装置1400包括处理器1401和收发器1402。
可选地,视觉处理装置1400还包括存储器1403。其中,处理器1401、收发器1402和存储器1403之间可以通过内部连接通路互相通信,传递控制信号和/或数据信号。
其中,存储器1403用于存储计算机程序。处理器1401用于执行存储器1403中存储的计算机程序,从而实现上述装置实施例中的各功能。
可选地,存储器1403也可以集成在处理器1401中,或者独立于处理器1401。
可选地,视觉处理装置1400还可以包括天线1404,用于将收发器1402输出的信号发射出去。或者,收发器1402通过天线接收信号。
可选地,视觉处理装置1400还可以包括电源1405,用于给终端设备中的各种器件或电路提供电源。
除此之外,为了使得终端设备的功能更加完善,视觉处理装置1400还可以包括输入单元1406、显示单元1407(也可以认为是输出单元)、音频电路1408、摄像头1409和传感器1410等中的一个或多个。音频电路还可以包括扬声器14081、麦克风14082等,不再赘述。
本申请另一些实施例还提供一种计算机存储介质，该计算机存储介质可包括计算机指令，当该计算机指令在电子设备上运行时，使得该电子设备执行上述方法实施例中终端设备执行的各个步骤。
本申请另一些实施例还提供一种计算机程序产品，当该计算机程序产品在计算机上运行时，使得该计算机执行上述方法实施例中终端设备执行的各个步骤。
本申请另一些实施例还提供一种装置，该装置具有实现上述方法实施例中终端设备行为的功能。所述功能可以通过硬件实现，也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块，例如，采集单元或模块，发送单元或模块，显示单元或模块。
以上各实施例中提及的处理器可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。处理器可以是通用处理器、数字信号处理器(digital signal processor,DSP)、特定应用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。本申请实施例公开的方法的步骤可以直接体现为硬件编码处理器执行完成,或者用编码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
上述各实施例中提及的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory, ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。应注意,本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (32)

  1. 一种视觉定位方法,其特征在于,包括:
    获取终端设备采集的图像;
    获取所述图像的二维线特征信息,所述二维线特征信息包括建筑物与非建筑物之间的分界线信息,或非建筑物与非建筑物之间的分界线信息中至少一项;
    根据所述终端设备的位置信息和磁力计角度偏转信息、卫星地图、以及所述二维线特征信息,确定所述终端设备的定位位姿。
  2. 根据权利要求1所述的方法,其特征在于,所述获取所述图像的二维线特征信息,包括:
    对所述图像进行语义分割,提取所述图像的二维线特征信息。
  3. 根据权利要求1或2所述的方法,其特征在于,所述根据所述终端设备的位置信息和磁力计角度偏转信息、卫星地图、以及所述二维线特征信息,确定所述终端设备的定位位姿,包括:
    根据所述终端设备的位置信息和磁力计角度偏转信息,确定候选位姿集合;
    根据所述候选位姿集合、所述二维线特征信息和所述卫星地图,确定N个优化位姿;
    根据所述N个优化位姿,确定所述终端设备的定位位姿;
    其中,N为大于1的整数。
  4. 根据权利要求3所述的方法,其特征在于,所述候选位姿集合包括M组候选位姿,每组候选位姿包括候选位置信息和候选偏航角度集合,所述候选位置信息属于第一阈值范围内,所述第一阈值范围为根据所述终端设备的位置信息确定的,所述候选偏航角度集合属于第二阈值范围内,所述第二阈值范围为根据所述终端设备的磁力计角度偏转信息确定的角度集合,其中,M为大于1的整数。
  5. 根据权利要求3或4所述的方法,其特征在于,所述根据所述候选位姿集合、所述二维线特征信息和所述卫星地图,确定N个优化位姿,包括:
    在所述候选位姿集合中选取部分候选位姿,用所述部分候选位姿和所述卫星地图确定所述部分候选位姿对应的全景线特征信息,并将所述全景线特征信息与所述二维线特征信息进行匹配,确定多个初始位姿;
    对所述多个初始位姿进行优化,确定所述N个优化位姿。
  6. 根据权利要求4或5所述的方法,其特征在于,所述根据所述候选位姿集合、所述二维线特征信息和所述卫星地图,确定N个优化位姿,包括:
    步骤1：在所述M组候选位姿中选取K1组候选位姿，分别根据所述K1组候选位姿中的每组候选位姿的候选位置信息和所述卫星地图，获取每组候选位姿的全景线特征信息；
    步骤2：分别对所述每组候选位姿的全景线特征信息与所述二维线特征信息进行匹配，确定每组候选位姿的候选偏航角度信息，所述每组候选位姿的候选偏航角度信息为每组候选位姿的候选偏航角度集合中与所述二维线特征信息匹配度最高的角度；
    步骤3：根据所述K1组候选位姿的候选偏航角度信息，得到K1个初始位姿，每个初始位姿包括一组候选位姿的候选位置信息和候选偏航角度信息；
    步骤4：对所述K1个初始位姿采用迭代方法优化，得到K1个优化位姿，并得到每个优化位姿对应的最近点损失；
    步骤5：根据每个优化位姿的最近点损失，在所述K1个优化位姿中确定一个优化位姿，作为所述N个优化位姿中的一个优化位姿，所述一个优化位姿为所述K1个优化位姿中最近点损失最小的优化位姿；
    步骤6：将K1替换为K1+n，重复执行步骤1至5，直至确定N个优化位姿，n取1至N-1，且K1>K2=K3=……=KN。
  7. 根据权利要求6所述的方法，其特征在于，K1+n组候选位姿的中心为对Kn组候选位姿执行上述步骤1至5所确定出的一个优化位姿。
  8. 根据权利要求5至7任一项所述的方法,其特征在于,所述每个初始位姿还包括预设的高度信息、俯仰角信息和翻滚角信息,所述每个优化位姿包括位置信息、高度信息、偏航角度信息、俯仰角信息和翻滚角信息。
  9. 根据权利要求5至7任一项所述的方法,其特征在于,所述匹配包括多模态鲁棒匹配或二维轮廓线匹配,其中,所述多模态鲁棒匹配包括多重语义信息匹配或极大值抑制匹配。
  10. 根据权利要求3至9任一项所述的方法,其特征在于,所述根据所述N个优化位姿,确定所述终端设备的定位位姿,包括:
    在所述N个优化位姿中,选取损失最小的一个优化位姿作为所述终端设备的定位位姿;
    其中,所述损失为所述每个优化位姿的最近点损失和所述每个优化位姿对应的差值的加权和,所述差值为所述每个优化位姿的位置信息与所述终端设备的位置信息之间的差值。
  11. 根据权利要求5至10任一项所述的方法,其特征在于,所述方法还包括:
    根据所述终端设备的定位位姿对应的内点率、内点误差或热力图中至少一项,判断所述终端设备的定位位姿是否可靠;
    当所述终端设备的定位位姿可靠时,输出所述终端设备的定位位姿;
    当所述终端设备的定位位姿不可靠时,判定定位失败;
    其中,所述热力图用于表示所述部分候选位姿的分布。
  12. 根据权利要求11所述的方法,其特征在于,所述根据所述终端设备的定位位姿对应的内点率、内点误差或热力图中至少一项,判断所述终端设备的定位位姿是否可靠,包括:
    判断所述终端设备的定位位姿是否满足以下条件至少之一:
    所述终端设备的定位位姿对应的内点率大于第一阈值;或者,
    所述终端设备的定位位姿对应的内点误差小于第二阈值;或者,
    所述终端设备的定位位姿对应的热力图中所述部分候选位姿的分布的密集度大于第三阈值。
  13. 根据权利要求1至12任一项所述的方法,其特征在于,所述方法还包括:
    根据所述终端设备的定位位姿确定虚拟物体描述信息;
    向所述终端设备发送所述虚拟物体描述信息,所述虚拟物体描述信息用于在所述终端设备上显示对应的虚拟物体。
  14. 一种视觉定位方法,其特征在于,包括:
    终端设备采集图像,并在所述终端设备的用户界面上显示所述图像,所述图像包括拍摄到的非建筑物之间的分界线,或,建筑物和非建筑物之间的分界线中至少一项;
    向服务器发送所述图像;
    接收所述服务器发送的虚拟物体描述信息,所述虚拟物体描述信息为根据所述终端设备的定位位姿确定的,所述定位位姿为至少根据所述图像的二维线特征信息和所述终端设备的位置信息确定的,所述二维线特征信息包括所述建筑物与非建筑物之间的分界线的信息,或所述非建筑物之间的分界线的信息中至少一项;
    在所述用户界面上叠加显示所述虚拟物体描述信息对应的虚拟物体。
  15. 根据权利要求14所述的方法,其特征在于,采集图像之前,所述方法还包括:
    在所述用户界面上显示提示信息,所述提示信息用于提示用户拍摄建筑物与非建筑物之间的分界线,或非建筑物与非建筑物之间的分界线中至少一项。
  16. 根据权利要求14或15所述的方法,其特征在于,发送所述图像之前,所述方法还包括通过端侧模型判断所述图像是否适合做视觉定位。
  17. 一种视觉定位装置,其特征在于,包括:
    处理模块,用于通过收发模块获取终端设备采集的图像;
    所述处理模块,还用于获取所述图像的二维线特征信息,所述二维线特征信息包括建筑物与非建筑物之间的分界线信息,或非建筑物与非建筑物之间的分界线信息中至少一项;
    所述处理模块,还用于根据所述终端设备的位置信息和磁力计角度偏转信息、卫星地图、以及所述二维线特征信息,确定所述终端设备的定位位姿。
  18. 根据权利要求17所述的装置,其特征在于,所述处理模块用于:对所述图像进行语义分割,提取所述图像的二维线特征信息。
  19. 根据权利要求17或18所述的装置,其特征在于,所述处理模块用于:
    根据所述终端设备的位置信息和磁力计角度偏转信息,确定候选位姿集合;
    根据所述候选位姿集合、所述二维线特征信息和所述卫星地图,确定N个优化位姿;
    根据所述N个优化位姿,确定所述终端设备的定位位姿;
    其中,N为大于1的整数。
  20. 根据权利要求19所述的装置,其特征在于,所述候选位姿集合包括M组候选位姿,每组候选位姿包括候选位置信息和候选偏航角度集合,所述候选位置信息属于第一阈值范围内,所述第一阈值范围为根据所述终端设备的位置信息确定的,所述候选偏航角度集合属于第二阈值范围内,所述第二阈值范围为根据所述终端设备的磁力计角度偏转信息确定的角度集合,其中,M为大于1的整数。
  21. 根据权利要求19或20所述的装置,其特征在于,所述处理模块用于:
    在所述候选位姿集合中选取部分候选位姿,用所述部分候选位姿和所述卫星地图确定所述部分候选位姿对应的全景线特征信息,并将所述全景线特征信息与所述二维线特征信息进行匹配,确定多个初始位姿;
    对所述多个初始位姿进行优化,确定所述N个优化位姿。
  22. 根据权利要求20或21所述的装置,其特征在于,所述处理模块用于:
    步骤1：在所述M组候选位姿中选取K1组候选位姿，分别根据所述K1组候选位姿中的每组候选位姿的候选位置信息和所述卫星地图，获取每组候选位姿的全景线特征信息；
    步骤2：分别对所述每组候选位姿的全景线特征信息与所述二维线特征信息进行匹配，确定每组候选位姿的候选偏航角度信息，所述每组候选位姿的候选偏航角度信息为每组候选位姿的候选偏航角度集合中与所述二维线特征信息匹配度最高的角度；
    步骤3：根据所述K1组候选位姿的候选偏航角度信息，得到K1个初始位姿，每个初始位姿包括一组候选位姿的候选位置信息和候选偏航角度信息；
    步骤4：对所述K1个初始位姿采用迭代方法优化，得到K1个优化位姿，并得到每个优化位姿对应的最近点损失；
    步骤5：根据每个优化位姿的最近点损失，在所述K1个优化位姿中确定一个优化位姿，作为所述N个优化位姿中的一个优化位姿，所述一个优化位姿为所述K1个优化位姿中最近点损失最小的优化位姿；
    步骤6：将K1替换为K1+n，重复执行步骤1至5，直至确定N个优化位姿，n取1至N-1，且K1>K2=K3=……=KN。
  23. 根据权利要求22所述的装置，其特征在于，K1+n组候选位姿的中心为对Kn组候选位姿执行上述步骤1至5所确定出的一个优化位姿。
  24. 根据权利要求21至23任一项所述的装置,其特征在于,所述每个初始位姿还包括预设的高度信息、俯仰角信息和翻滚角信息,所述每个优化位姿包括位置信息、高度信息、偏航角度信息、俯仰角信息和翻滚角信息。
  25. 根据权利要求21至23任一项所述的装置,其特征在于,所述匹配包括多模态鲁棒匹配或二维轮廓线匹配,其中,所述多模态鲁棒匹配包括多重语义信息匹配或极大值抑制匹配。
  26. 根据权利要求19至25任一项所述的装置,其特征在于,所述处理模块用于:
    在所述N个优化位姿中,选取损失最小的一个优化位姿作为所述终端设备的定位位姿;
    其中,所述损失包括所述优化位姿的位置信息与所述终端设备的位置信息的差值,和所述优化位姿对应的匹配度。
  27. 根据权利要求21至26任一项所述的装置,其特征在于,所述处理模块还用于:
    根据所述终端设备的定位位姿对应的内点率、内点误差或热力图中至少一项,判断所述终端设备的定位位姿是否可靠;
    当所述终端设备的定位位姿可靠时,输出所述终端设备的定位位姿;
    当所述终端设备的定位位姿不可靠时,判定定位失败;
    其中,所述热力图用于表示所述部分候选位姿的分布。
  28. 根据权利要求27所述的装置,其特征在于,所述处理模块用于:
    判断所述终端设备的定位位姿是否满足以下条件至少之一:
    所述终端设备的定位位姿对应的内点率大于第一阈值;或者,
    所述终端设备的定位位姿对应的内点误差小于第二阈值;或者,
    所述终端设备的定位位姿对应的热力图中所述部分候选位姿的分布的密集度大于第三阈值。
  29. 根据权利要求17至28任一项所述的装置,其特征在于,所述处理模块还用于:
    根据所述终端设备的定位位姿确定虚拟物体描述信息;
    通过所述收发模块向所述终端设备发送所述虚拟物体描述信息,所述虚拟物体描述信息用于在所述终端设备上显示对应的虚拟物体。
  30. 一种视觉定位装置，所述视觉定位装置应用于终端设备，其特征在于，包括：
    处理模块,用于采集图像,并在所述终端设备的用户界面上显示所述图像,所述图像包括拍摄到的非建筑物之间的分界线,或,建筑物和非建筑物之间的分界线中至少一项;
    所述处理模块,还用于通过收发模块向服务器发送所述图像;
    所述收发模块,还用于接收所述服务器发送的虚拟物体描述信息,所述虚拟物体描述信息为根据采集所述图像的终端设备的定位位姿确定的,所述定位位姿为至少根据所述图像的二维线特征信息和所述终端设备的位置信息确定的,所述二维线特征信息包括所述建筑物与非建筑物之间的分界线的信息,或所述非建筑物之间的分界线的信息中至少一项;
    所述处理模块,还用于在所述用户界面上叠加显示所述虚拟物体描述信息对应的虚拟物体。
  31. 根据权利要求30所述的装置,其特征在于,所述处理模块还用于在采集图像之前,在所述用户界面上显示提示信息,所述提示信息用于提示用户拍摄建筑物与非建筑物之间的分界线,或非建筑物与非建筑物之间的分界线中至少一项。
  32. 根据权利要求30或31所述的装置,其特征在于,所述处理模块还用于,在发送图像之前,通过端侧模型判断所述图像是否适合做视觉定位。
PCT/CN2021/084070 2020-05-31 2021-03-30 视觉定位方法和装置 WO2021244114A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21817168.4A EP4148379A4 (en) 2020-05-31 2021-03-30 METHOD AND DEVICE FOR VISUAL POSITIONING
US18/070,862 US20230089845A1 (en) 2020-05-31 2022-11-29 Visual Localization Method and Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010481150.4 2020-05-31
CN202010481150.4A CN113739797A (zh) 2020-05-31 2020-05-31 视觉定位方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/070,862 Continuation US20230089845A1 (en) 2020-05-31 2022-11-29 Visual Localization Method and Apparatus

Publications (1)

Publication Number Publication Date
WO2021244114A1 true WO2021244114A1 (zh) 2021-12-09

Family

ID=78727879

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084070 WO2021244114A1 (zh) 2020-05-31 2021-03-30 视觉定位方法和装置

Country Status (4)

Country Link
US (1) US20230089845A1 (zh)
EP (1) EP4148379A4 (zh)
CN (1) CN113739797A (zh)
WO (1) WO2021244114A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116862978A (zh) * 2022-03-22 2023-10-10 北京字跳网络技术有限公司 一种定位方法、装置及电子设备

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100976138B1 (ko) * 2009-09-16 2010-08-16 (주)올라웍스 건축물 이미지의 계층적 매칭 방법, 시스템 및 컴퓨터 판독 가능한 기록 매체
EP2455713A1 (en) * 2010-11-18 2012-05-23 Navteq North America, LLC Building Directory Aided Navigation
CN104809689A (zh) * 2015-05-15 2015-07-29 北京理工大学深圳研究院 一种基于轮廓的建筑物点云模型底图配准方法
CN107633041A (zh) * 2017-09-13 2018-01-26 杭州骑迹科技有限公司 一种信息处理方法、移动终端及信息处理系统
CN109752003A (zh) * 2018-12-26 2019-05-14 浙江大学 一种机器人视觉惯性点线特征定位方法及装置
CN110926474A (zh) * 2019-11-28 2020-03-27 南京航空航天大学 卫星/视觉/激光组合的城市峡谷环境uav定位导航方法
CN111033299A (zh) * 2018-07-02 2020-04-17 北京嘀嘀无限科技发展有限公司 基于点云利用位姿预估的车辆导航系统

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100674805B1 (ko) * 2005-06-14 2007-01-29 엘지전자 주식회사 카메라 영상과 지도 데이터 간에 건물 매칭 방법
US9317133B2 (en) * 2010-10-08 2016-04-19 Nokia Technologies Oy Method and apparatus for generating augmented reality content
US20130027554A1 (en) * 2011-07-31 2013-01-31 Meadow William D Method and Apparatus for Automated Camera Location and Orientation with Image Processing and Alignment to Ground Based Reference Point(s)
JP2015532077A (ja) * 2012-09-27 2015-11-05 メタイオ ゲゼルシャフト ミット ベシュレンクテル ハフツングmetaio GmbH 少なくとも1つの画像を撮影する撮影装置に関連する装置の位置及び方向の決定方法
US20150371440A1 (en) * 2014-06-19 2015-12-24 Qualcomm Incorporated Zero-baseline 3d map initialization
CN110858414A (zh) * 2018-08-13 2020-03-03 北京嘀嘀无限科技发展有限公司 图像处理方法、装置、可读存储介质与增强现实系统
CN109520500B (zh) * 2018-10-19 2020-10-20 南京航空航天大学 一种基于终端拍摄图像匹配的精确定位及街景库采集方法
CN109901207A (zh) * 2019-03-15 2019-06-18 武汉大学 一种北斗卫星系统与图像特征结合的高精度室外定位方法


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4148379A4

Also Published As

Publication number Publication date
EP4148379A4 (en) 2024-03-27
US20230089845A1 (en) 2023-03-23
EP4148379A1 (en) 2023-03-15
CN113739797A (zh) 2021-12-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21817168; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021817168; Country of ref document: EP; Effective date: 20221207)
NENP Non-entry into the national phase (Ref country code: DE)