WO2023051383A1 - Device positioning method, device, and system - Google Patents

Device positioning method, device, and system

Info

Publication number
WO2023051383A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
image
user equipment
objects
panoramic
Prior art date
Application number
PCT/CN2022/120592
Other languages
English (en)
French (fr)
Inventor
于莹莹
康一飞
郭昊帅
杨吉年
唐忠伟
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023051383A1

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 - Instruments for performing navigational calculations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/20 - Scenes; Scene-specific elements in augmented reality scenes

Definitions

  • the embodiments of the present application relate to the technical field of image information processing, and in particular, to a device positioning method, device, and system.
  • a large number of applications based on virtual reality (VR) and augmented reality (AR) are continuously pouring into the market; for example, in AR application scenarios, virtual objects are superimposed on pictures of real scenes, which can be applied to fields such as games, medical care, education, and navigation.
  • determining the spatial pose of the user device becomes the key point.
  • the spatial pose of the user equipment is used to represent the position and posture of the user equipment.
  • for example, the location of the user equipment can be determined based on the global navigation satellite system (GNSS), and the attitude of the user equipment can be determined based on the motion data collected by motion sensors (such as a gyroscope sensor, an acceleration sensor, and a gravity sensor).
  • however, the above method can only achieve a rough estimate of the position and attitude of the user equipment, and cannot achieve high-precision superposition of virtual objects in real scenes.
  • the present application provides a device positioning method, device and system, which can realize accurate positioning of the spatial pose of user equipment based on low-cost, wide-coverage, and high-precision visual positioning technology.
  • in a first aspect, a device positioning method is provided. The method is applied to a device positioning system and includes: first, acquiring a panorama feature library including a plurality of first features used to represent a plurality of objects (where each first feature is a panorama feature); and extracting a second feature of an image captured by the user equipment, where the horizontal resolutions of the plurality of first features are consistent and the horizontal resolution of the second feature is consistent; then, searching in the panoramic feature library to determine the first feature that matches the second feature; and finally, according to the first feature that matches the second feature, determining the spatial pose of the user equipment when capturing the image, where the spatial pose is used to represent the position and attitude of the user equipment.
  • the above panorama feature library includes multiple panorama features for representing various things (i.e., objects) in a city, such as buildings, parks, bridges, roads, lawns, squares, street lights, road signs, rivers, mountains, and the like.
  • in this way, the device positioning system uses the panorama feature library, which represents multiple first features of multiple objects (where each first feature is a panorama feature), as a retrieval library. When determining the spatial pose of the user equipment at the time an image is captured, the system can extract the second feature of the image captured by the user equipment (where the second feature is a feature with consistent horizontal resolution) and search the panorama feature library to determine the first feature matching the second feature. Then, based on the first feature matched with the second feature, the spatial pose of the user equipment when capturing the image is determined, as shown in the sketch below.
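  • for illustration only, the following Python sketch strings these steps together with a deliberately simplistic placeholder encoder and a brute-force window search; it is not the claimed implementation, and all function names, feature widths, and scoring choices below are assumptions.

```python
import numpy as np

def encode_panorama_feature(image, width=360):
    """Placeholder encoder: a column-wise intensity profile resampled to a
    fixed number of columns, so every feature shares the same horizontal
    resolution. Assumes a grayscale or RGB image array."""
    image = np.asarray(image, dtype=np.float64)
    if image.ndim == 3:
        image = image.mean(axis=2)                       # collapse colour
    cols = image.mean(axis=0)                            # one value per column
    idx = np.linspace(0, len(cols) - 1, width).astype(int)
    feat = cols[idx]
    return (feat - feat.mean()) / (feat.std() + 1e-9)

def locate_device(query_image, library):
    """library: list of (grid_position, first_feature) pairs, where each
    first_feature is a 360-column panorama feature from the same encoder."""
    # second feature of the captured image (here: an assumed 90-degree view)
    second_feature = encode_panorama_feature(query_image, width=90)
    w = len(second_feature)
    best = None
    for grid_pos, first_feature in library:
        for shift in range(len(first_feature)):          # slide around 360 deg
            window = np.take(first_feature, np.arange(shift, shift + w),
                             mode="wrap")
            window = (window - window.mean()) / (window.std() + 1e-9)
            score = float(window @ second_feature) / w   # matching similarity
            if best is None or score > best[0]:
                best = (score, grid_pos, shift)
    score, position, shift = best
    heading_deg = shift * 360.0 / len(library[0][1])     # window offset -> yaw
    return position, heading_deg, score
```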
  • this solution achieves low-cost, wide-coverage (for example, it does not rely on artificially designed feature points, does not need pre-built sliced-image retrieval features or additional perspective-n-point (PnP) processing, and can be applied to scenes where skyline features are not obvious), high-precision positioning of the user equipment.
  • the acquisition of the panoramic feature library may specifically include: constructing the panoramic feature library based on a real-scene three-dimensional model used to describe the spatial information of multiple objects.
  • in this way, the device positioning system can construct, based on a real-scene 3D model, a panoramic feature library representing multiple first features of multiple objects (where each first feature is a panoramic feature) as a retrieval library. This method is low in cost, easy to implement, and can provide a more accurate reference for device positioning.
  • the above-mentioned construction of the panoramic feature library based on the real-scene 3D model includes: obtaining the panoramic feature library by semantically classifying the real-scene 3D model, extracting and rendering preset types of objects, and encoding panoramic features.
  • the device positioning system can obtain the object classification described by the real-scene 3D model by semantically classifying the real-scene 3D model; and obtain the information used to describe the preset type of objects in the real-scene 3D model by extracting objects of a preset type;
  • a panoramic feature library including one or more preset types of objects is obtained through rendering and panoramic feature encoding.
  • a panorama feature library for representing multiple first features of multiple objects (wherein the first feature is a panorama feature) can be constructed as a retrieval library, which is low in cost, easy to implement, and can provide more accurate references for device positioning.
  • the panoramic feature library is constructed based on the real-scene 3D model by: semantically classifying the real-scene 3D model to obtain the types of the multiple objects described by the real-scene 3D model; extracting one or more preset types of objects from the described multiple objects; gridding the extracted one or more preset types of objects; rendering the one or more preset types of objects grid by grid to obtain rendered images; cylindrically expanding the rendered images to obtain panoramas; and encoding, grid by grid, the panorama features of the one or more preset types of objects in the panoramas to obtain the panorama feature library.
  • in this way, the device positioning system can obtain the types of the multiple objects described by the real-scene 3D model by semantically classifying the model; obtain the information describing the preset types of objects in the real-scene 3D model by extracting objects of the preset types; improve the accuracy of subsequent rendering and panoramic feature encoding by gridding the extracted one or more preset types of objects; and obtain a panoramic feature library of the one or more preset types of objects through rendering, cylindrical expansion, and panoramic feature encoding.
  • a panorama feature library for representing multiple first features of multiple objects (wherein the first feature is a panorama feature) can be constructed as a retrieval library, which is low in cost, easy to implement, and can provide more accurate references for device positioning.
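  • a minimal construction sketch is given below, assuming the grid covers a rectangular area at a fixed interval; the renderer is only a placeholder that stands in for rendering the preset-type objects of the real-scene 3D model and unwrapping the result onto a cylinder, and `encode_panorama_feature` refers to the placeholder encoder in the earlier sketch. Function names and the 5 m interval are assumptions.

```python
import numpy as np

def make_grid(x_range, y_range, interval):
    """Grid the footprint of the extracted objects at a fixed interval (metres)."""
    xs = np.arange(x_range[0], x_range[1], interval)
    ys = np.arange(y_range[0], y_range[1], interval)
    return [(float(x), float(y)) for x in xs for y in ys]

def render_panorama_at(model, point, width=360, height=64):
    """Placeholder: a real implementation would render the preset-type objects
    of `model` around `point` and cylindrically expand the result; here we
    just return a deterministic synthetic image of the right shape."""
    rng = np.random.default_rng(abs(hash(point)) % (2**32))
    return rng.random((height, width))

def build_panorama_feature_library(model, x_range, y_range, interval=5.0):
    """One first feature per grid point; all features share the same
    horizontal resolution because they come from the same encoder."""
    library = []
    for point in make_grid(x_range, y_range, interval):
        panorama = render_panorama_at(model, point)        # rendered + unwrapped
        library.append((point, encode_panorama_feature(panorama)))
    return library
```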
  • the above-mentioned meshing the extracted one or more preset types of objects includes: meshing the extracted one or more preset types of objects according to a fixed interval or a dynamic interval.
  • in this way, the device positioning system can grid the extracted one or more preset types of objects at fixed intervals or dynamic intervals, so that the sampling density can be adjusted according to actual needs and a panorama feature library with the accuracy that meets those needs can be obtained.
  • fixed intervals and dynamic intervals can be set empirically by algorithms or software developers.
  • the above method further includes: setting a dynamic interval according to the importance of regions in the real-scene three-dimensional model, and/or the importance of the type to which one or more preset types of objects belong. Based on this, the sampling density can be adjusted according to actual needs to obtain the accuracy of the panorama feature library that meets the actual needs.
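  • as a trivial illustration (the mapping below is an assumption, not taken from this application), a dynamic interval can shrink as the importance of a region or object type grows, giving denser sampling where it matters:

```python
def dynamic_interval(importance, base_interval=10.0, min_interval=2.0):
    """importance in [0, 1]; more important regions/types get a smaller
    sampling interval (denser grid). The linear mapping is illustrative."""
    importance = min(max(importance, 0.0), 1.0)
    return max(min_interval, base_interval * (1.0 - importance))

# e.g. dynamic_interval(0.9) -> 2.0 m in a landmark-dense area,
#      dynamic_interval(0.1) -> 9.0 m in an unimportant area
```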
  • the image captured by the user equipment is a first image, and the above method further includes: preprocessing the first image to obtain a second image.
  • the preprocessing includes one or more of the following: initializing the spatial attitude, adjusting the brightness of the image to a preset brightness, adjusting the contrast of the image to a preset contrast, semantically classifying the objects described in the image, and cylindrical projection; a sketch of two of these steps is given below. Based on this, the accuracy of subsequent matching of the first feature and the second feature can be further improved.
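  • the following sketch illustrates two of these preprocessing steps on a grayscale image: normalising brightness/contrast to preset values and projecting the image onto a cylinder. The target mean/standard deviation and the focal length are assumptions.

```python
import numpy as np

def adjust_brightness_contrast(img, target_mean=128.0, target_std=48.0):
    """Normalise a grayscale image to a preset brightness (mean) and a preset
    contrast (standard deviation); the target values are illustrative."""
    f = img.astype(np.float64)
    f = (f - f.mean()) / (f.std() + 1e-9)
    return np.clip(f * target_std + target_mean, 0, 255).astype(np.uint8)

def cylindrical_projection(img, focal_px):
    """Warp a perspective image onto a cylinder so that horizontal pixel
    distance becomes proportional to the viewing angle."""
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    out = np.zeros_like(img)
    theta = (np.arange(w) - cx) / focal_px              # angle of each column
    x_src = np.round(np.tan(theta) * focal_px + cx).astype(int)
    for y in range(h):
        y_src = np.round((y - cy) / np.cos(theta) + cy).astype(int)
        ok = (x_src >= 0) & (x_src < w) & (y_src >= 0) & (y_src < h)
        out[y, ok] = img[y_src[ok], x_src[ok]]
    return out
```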
  • the extracting the second feature of the image captured by the user equipment includes: performing panoramic feature encoding on one or more objects described in the second image to obtain the second feature of the second image.
  • panoramic feature encoding is used to extract features of the objects in the image with a consistent horizontal resolution.
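  • as one illustrative (non-limiting) way to obtain such features, the per-column vertical extent of the preset-type objects in a semantic mask can be recorded and resampled to a fixed number of columns per degree of view, so that every encoded feature shares the same horizontal resolution regardless of the source image width. The function name and parameters are assumptions.

```python
import numpy as np

def encode_object_heights(object_mask, fov_deg, columns_per_degree=1.0):
    """object_mask: boolean H x W array marking preset-type objects (e.g. the
    building class of a semantic segmentation). Returns a 1-D profile with
    fov_deg * columns_per_degree entries, i.e. a consistent horizontal
    resolution."""
    h, w = object_mask.shape
    # topmost object pixel per column; h means "no object in this column"
    tops = np.where(object_mask.any(axis=0), object_mask.argmax(axis=0), h)
    profile = 1.0 - tops / float(h)               # 0 = empty, 1 = full column
    n_out = int(round(fov_deg * columns_per_degree))
    idx = np.linspace(0, w - 1, n_out).astype(int)
    return profile[idx]
```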
  • the above searching in the panoramic feature library to determine the first feature matching the second feature includes: sliding the second feature over the panoramic feature library, and determining the matching similarity between the second feature and multiple first features within the sliding-window range; and, according to the multiple matching similarities between the second feature and the multiple first features within the sliding-window range, determining the first feature that matches the second feature.
  • in this way, the first feature matching the second feature can be determined from the matching similarities, calculated with the panoramic sliding-window technique, between the second feature and the multiple first features.
  • the first feature matched with the second feature is the first feature corresponding to the highest matching similarity among multiple matching similarities.
  • it may be determined that the first feature corresponding to the highest matching similarity is the first feature matching the second feature according to the obtained matching similarities between the second feature and multiple first features.
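  • a vectorised variant of the window search in the earlier sketch is shown below; it returns the similarity of the query feature at every window position of a wrap-around 360-degree first feature and then keeps the highest-scoring match. The normalised-correlation score is an assumption.

```python
import numpy as np

def sliding_window_similarity(second_feature, first_feature):
    """Normalised correlation of the query feature against every window
    position of a wrap-around 360-degree first feature."""
    w, n = len(second_feature), len(first_feature)
    ext = np.concatenate([first_feature, first_feature[:w - 1]])    # wrap
    windows = np.lib.stride_tricks.sliding_window_view(ext, w)      # (n, w)
    windows = (windows - windows.mean(axis=1, keepdims=True)) / (
        windows.std(axis=1, keepdims=True) + 1e-9)
    q = (second_feature - second_feature.mean()) / (second_feature.std() + 1e-9)
    return windows @ q / w                                           # (n,)

def best_match(second_feature, library):
    """Return (grid_position, window_offset, similarity) of the first feature
    with the highest matching similarity over the given library entries."""
    best = None
    for grid_pos, first_feature in library:
        sims = sliding_window_similarity(second_feature, first_feature)
        k = int(np.argmax(sims))
        if best is None or sims[k] > best[2]:
            best = (grid_pos, k, float(sims[k]))
    return best
```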
  • the above searching in the panoramic feature library to determine the first feature matching the second feature includes: searching within the entire scope of the panoramic feature library to determine the first feature matching the second feature; or, searching within a preset range of the panoramic feature library to determine the first feature matching the second feature.
  • the panoramic sliding window technology can be implemented based on the full range of the panoramic feature library or the preset range of the panoramic feature library, and can be adaptively adjusted according to actual needs. This method is convenient, fast, and can save computing power to the greatest extent.
  • the above method further includes: determining the preset range according to the location of the user equipment when the user equipment captures the image, in combination with a setting rule of the preset range.
  • in this way, a preset range is set based on the location of the user equipment, and the sliding window is then used for feature matching within that range, so that device positioning can be realized conveniently and quickly, and computing power can be saved to the greatest extent.
  • the preset range is a circular area centered at the location of the user equipment when the user equipment captures the image and having a radius of r, where r is a positive number.
  • in this way, a preset range is set based on the location of the user equipment, and the sliding window is then used for feature matching within that range, so that device positioning can be realized conveniently and quickly, and computing power can be saved to the greatest extent.
  • the above preset range includes a first range and a second range, and the priority of the first range is higher than that of the second range; searching within the preset range of the panorama feature library to determine the first feature matching the second feature then includes: searching within the first range; and, if no first feature matching the second feature is found within the first range, searching within the second range to determine the first feature matching the second feature.
  • the above-mentioned first range is a circular area centered at the location of the user equipment when the user equipment captures the image, with r1 as the radius, where r1 is a positive number; the second range is a ring area centered at the location of the user equipment when the user equipment captures the image, with r1 as the inner radius and r2 as the outer radius, where r1 and r2 are positive numbers and r1 is smaller than r2.
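  • with the grid positions stored alongside each first feature, the two-priority search can be sketched as below; planar coordinates, the fallback threshold, and `best_match` (from the earlier matching sketch) are assumptions.

```python
import math

def within_circle(candidate_xy, device_xy, r):
    """First range: a circular area of radius r around the device location."""
    return math.dist(candidate_xy, device_xy) <= r

def within_ring(candidate_xy, device_xy, r1, r2):
    """Second range: a ring with inner radius r1 and outer radius r2 (r1 < r2)."""
    d = math.dist(candidate_xy, device_xy)
    return r1 < d <= r2

def search_by_priority(second_feature, library, device_xy, r1, r2,
                       min_similarity=0.5):
    """Search the first range first; fall back to the second range only if no
    sufficiently similar first feature was found (threshold is illustrative)."""
    inner = [e for e in library if within_circle(e[0], device_xy, r1)]
    hit = best_match(second_feature, inner) if inner else None
    if hit is None or hit[2] < min_similarity:
        outer = [e for e in library if within_ring(e[0], device_xy, r1, r2)]
        fallback = best_match(second_feature, outer) if outer else None
        if fallback is not None and (hit is None or fallback[2] > hit[2]):
            hit = fallback
    return hit
```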
  • the horizontal resolution of the second feature is the same as the horizontal resolution of the first feature.
  • the real-scene three-dimensional model includes, but is not limited to, one or more of the following: an aerial three-dimensional real-scene model, a satellite real-scene three-dimensional model, and a city information model.
  • in a second aspect, a device positioning method is provided.
  • the method is applied to a first device (such as a cloud-side device), and the method includes: the first device constructs a panorama feature library based on a real-scene 3D model.
  • the real-scene 3D model is used to describe the spatial information of multiple objects;
  • the panorama feature library includes multiple first features used to represent multiple objects, and the horizontal resolutions of the multiple first features are consistent.
  • the above panorama feature library includes multiple panorama features for representing various things (i.e., objects) in a city, such as buildings, parks, bridges, roads, lawns, squares, street lights, road signs, rivers, mountains, and the like.
  • in this way, the first device can use the panoramic feature library representing multiple first features of multiple objects (where each first feature is a panoramic feature) as a retrieval library for determining the spatial pose of the user equipment when it captures an image.
  • the above-mentioned first device constructing the panoramic feature library based on the real-scene 3D model includes: the first device obtaining the panoramic feature library by semantically classifying the real-scene 3D model, extracting and rendering preset types of objects, and encoding panoramic features.
  • the first device may obtain the object classification described by the real-scene 3D model by semantically classifying the real-scene 3D model; and obtain the information used to describe the preset type of object in the real-scene 3D model by extracting objects of a preset type;
  • a panoramic feature library including one or more preset types of objects is obtained through rendering and panoramic feature encoding.
  • a panorama feature library for representing multiple first features of multiple objects (wherein the first feature is a panorama feature) can be constructed as a retrieval library, which is low in cost, easy to implement, and can provide more accurate references for device positioning.
  • the above-mentioned first device constructing the panoramic feature library based on the real-scene 3D model includes: the first device semantically classifying the real-scene 3D model to obtain the types of the multiple objects described by the real-scene 3D model; the first device extracting one or more preset types of objects from the multiple objects described by the real-scene 3D model; the first device gridding the extracted one or more preset types of objects; the first device rendering the one or more preset types of objects grid by grid to obtain rendered images; the first device cylindrically expanding the rendered images to obtain panoramas; and the first device encoding, grid by grid, the panorama features of the one or more preset types of objects in the panoramas to obtain the panorama feature library.
  • in this way, the first device can obtain the types of the multiple objects described by the real-scene 3D model by semantically classifying the model; obtain the information describing the preset types of objects in the real-scene 3D model by extracting objects of the preset types; improve the accuracy of subsequent rendering and panoramic feature encoding by gridding the extracted one or more preset types of objects; and obtain a panoramic feature library of the one or more preset types of objects through rendering, cylindrical expansion, and panoramic feature encoding.
  • a panorama feature library for representing multiple first features of multiple objects (wherein the first feature is a panorama feature) can be constructed as a retrieval library, which is low in cost, easy to implement, and can provide more accurate references for device positioning.
  • the above-mentioned first device gridding the extracted one or more preset types of objects includes: the first device gridding the extracted one or more preset types of objects at fixed intervals or dynamic intervals.
  • in this way, the first device can grid the extracted one or more preset types of objects at fixed intervals or dynamic intervals, so that the sampling density can be adjusted according to actual needs and a panorama feature library with the accuracy that meets those needs can be obtained.
  • fixed intervals and dynamic intervals can be set empirically by algorithms or software developers.
  • the above method further includes: the first device sets the dynamic interval according to the importance of regions in the real-scene three-dimensional model, and/or the importance of the type to which one or more preset types of objects belong. Based on this, the sampling density can be adjusted according to actual needs to obtain the accuracy of the panorama feature library that meets the actual needs.
  • the real-scene three-dimensional model includes, but is not limited to, one or more of the following: an aerial three-dimensional real-scene model, a satellite real-scene three-dimensional model, and a city information model.
  • in a third aspect, a method for locating a device is provided. The method is applied to a second device (such as a user device) and includes: the second device extracting a second feature of an image captured by the user device, where the horizontal resolutions of the plurality of first features in the panoramic feature library are consistent and the horizontal resolution of the second feature is consistent; the second device searching in the panoramic feature library to determine the first feature that matches the second feature; and finally, according to the first feature that matches the second feature, determining the spatial pose of the user device when capturing the image, where the spatial pose is used to indicate the position and attitude of the second device (such as the user device).
  • in this way, when determining the spatial pose of the user equipment at the time an image is captured, the second device can extract the second feature of the image captured by the user equipment (where the second feature is a feature with consistent horizontal resolution) and search the panoramic feature library to determine the first feature matching the second feature. Then, based on the first feature matched with the second feature, the spatial pose of the user equipment when capturing the image is determined.
  • this solution achieves low-cost, wide-coverage (for example, it does not rely on artificially designed feature points, does not need pre-built sliced-image retrieval features or additional PnP processing, and can be applied to scenes where skyline features are not obvious), high-precision positioning of the user equipment.
  • the image captured by the user equipment is a first image
  • the above method further includes: the second device preprocessing the first image to obtain a second image; where the preprocessing includes one or more of the following: initializing the spatial attitude, adjusting the brightness of the image to a preset brightness, adjusting the contrast of the image to a preset contrast, semantically classifying the objects described in the image, and cylindrical projection. Based on this, the accuracy of subsequent matching of the first feature and the second feature can be further improved.
  • the second device extracting the second feature of the image captured by the user device includes: the second device performing panoramic feature encoding on one or more objects described in the second image to obtain the second feature of the second image.
  • panoramic feature encoding is used to extract features of the objects in the image with a consistent horizontal resolution.
  • the above-mentioned second device searching in the panoramic feature library to determine the first feature that matches the second feature includes: the second device sliding the second feature over the panoramic feature library, and determining the matching similarity between the second feature and multiple first features within the sliding-window range; and the second device determining, according to the multiple matching similarities between the second feature and the multiple first features within the sliding-window range, the first feature that matches the second feature.
  • in this way, the first feature matching the second feature can be determined from the matching similarities, calculated with the panoramic sliding-window technique, between the second feature and the multiple first features.
  • the first feature matched with the second feature is the first feature corresponding to the highest matching similarity among multiple matching similarities.
  • it may be determined that the first feature corresponding to the highest matching similarity is the first feature matching the second feature according to the obtained matching similarities between the second feature and multiple first features.
  • the above-mentioned second device searching in the panoramic feature library to determine the first feature that matches the second feature includes: the second device searching within the entire scope of the panoramic feature library to determine the first feature matching the second feature; or, the second device searching within a preset range of the panoramic feature library to determine the first feature matching the second feature.
  • the panoramic sliding window technology can be implemented based on the full range of the panoramic feature library or the preset range of the panoramic feature library, and can be adaptively adjusted according to actual needs. This method is convenient, fast, and can save computing power to the greatest extent.
  • the above method further includes: the second device determines the preset range according to the location of the user equipment when the user equipment captures the image and in combination with the setting rule of the preset range.
  • in this way, a preset range is set based on the location of the user equipment, and the sliding window is then used for feature matching within that range, so that device positioning can be realized conveniently and quickly, and computing power can be saved to the greatest extent.
  • the preset range is a circular area centered at the location of the user equipment when the user equipment captures the image and having a radius of r, where r is a positive number.
  • in this way, a preset range is set based on the location of the user equipment, and the sliding window is then used for feature matching within that range, so that device positioning can be realized conveniently and quickly, and computing power can be saved to the greatest extent.
  • the above preset range includes a first range and a second range, and the priority of the first range is higher than that of the second range; the second device searching within the preset range of the panorama feature library to determine the first feature matching the second feature then includes: the second device searching within the first range; and, if no first feature matching the second feature is found within the first range, searching within the second range to determine the first feature matching the second feature.
  • the above-mentioned first range is a circular area centered at the location of the user equipment when the user equipment captures the image, with r1 as the radius, where r1 is a positive number; the second range is a ring area centered at the location of the user equipment when the user equipment captures the image, with r1 as the inner radius and r2 as the outer radius, where r1 and r2 are positive numbers and r1 is smaller than r2.
  • the horizontal resolution of the second feature is the same as the horizontal resolution of the first feature.
  • in a fourth aspect, a first device (such as a cloud-side device) is provided.
  • the first device includes: a processing unit configured to construct a panoramic feature library based on a real-scene 3D model.
  • the real-scene 3D model is used to describe the spatial information of multiple objects;
  • the panorama feature library includes multiple first features used to represent multiple objects, and the horizontal resolutions of the multiple first features are consistent.
  • in this way, the first device can use the panoramic feature library representing multiple first features of multiple objects (where each first feature is a panoramic feature) as a retrieval library for determining the spatial pose of the user equipment when it captures an image.
  • the above-mentioned processing unit is specifically configured to obtain a panoramic feature library by semantically classifying the real-scene 3D model, extracting and rendering preset types of objects, and encoding panoramic features.
  • the first device may obtain the object classification described by the real-scene 3D model by semantically classifying the real-scene 3D model; and obtain the information used to describe the preset type of object in the real-scene 3D model by extracting objects of a preset type;
  • a panoramic feature library including one or more preset types of objects is obtained through rendering and panoramic feature encoding.
  • the above-mentioned processing unit is specifically configured to: semantically classify the real-scene 3D model to obtain the types of the multiple objects described by the real-scene 3D model; extract one or more preset types of objects from the multiple objects described by the real-scene 3D model; grid the extracted one or more preset types of objects; render the one or more preset types of objects grid by grid to obtain rendered images; cylindrically expand the rendered images to obtain panoramas; and encode, grid by grid, the panorama features of the one or more preset types of objects in the panoramas to obtain the panorama feature library.
  • in this way, the first device can obtain the types of the multiple objects described by the real-scene 3D model by semantically classifying the model; obtain the information describing the preset types of objects in the real-scene 3D model by extracting objects of the preset types; improve the accuracy of subsequent rendering and panoramic feature encoding by gridding the extracted one or more preset types of objects; and obtain a panoramic feature library of the one or more preset types of objects through rendering, cylindrical expansion, and panoramic feature encoding.
  • a panorama feature library for representing multiple first features of multiple objects (wherein the first feature is a panorama feature) can be constructed as a retrieval library, which is low in cost, easy to implement, and can provide more accurate references for device positioning.
  • the above-mentioned processing unit is specifically configured to: mesh the extracted one or more preset types of objects according to a fixed interval or a dynamic interval.
  • in this way, the first device can grid the extracted one or more preset types of objects at fixed intervals or dynamic intervals, so that the sampling density can be adjusted according to actual needs and a panorama feature library with the accuracy that meets those needs can be obtained.
  • fixed intervals and dynamic intervals can be set empirically by algorithms or software developers.
  • the above-mentioned processing unit is further configured to: set the dynamic interval according to the importance of regions in the real-scene three-dimensional model, and/or the importance of the type to which one or more preset types of objects belong. Based on this, the sampling density can be adjusted according to actual needs to obtain the accuracy of the panorama feature library that meets the actual needs.
  • the real-scene three-dimensional model includes, but is not limited to, one or more of the following: an aerial three-dimensional real-scene model, a satellite real-scene three-dimensional model, and a city information model.
  • in a fifth aspect, a second device (such as a user device) is provided.
  • the second device includes: a processing unit configured to extract a second feature of an image captured by the second device; search the panoramic feature library to determine the first feature matched with the second feature; and, according to the first feature matched with the second feature, determine the spatial pose of the user equipment when capturing the image.
  • the horizontal resolutions of the above-mentioned multiple first features are consistent
  • the horizontal resolutions of the second features are consistent
  • the spatial pose is used to represent the position and posture of the second device.
  • in this way, when determining the spatial pose of the user equipment at the time an image is captured, the second device can extract the second feature of the image captured by the user equipment (where the second feature is a feature with consistent horizontal resolution) and search the panoramic feature library to determine the first feature matching the second feature. Then, based on the first feature matched with the second feature, the spatial pose of the user equipment when capturing the image is determined.
  • this solution achieves low-cost, wide-coverage (for example, it does not rely on artificially designed feature points, does not need pre-built sliced-image retrieval features or additional PnP processing, and can be applied to scenes where skyline features are not obvious), high-precision positioning of the user equipment.
  • the above-mentioned second device further includes: an image capture unit configured to capture the first image; and the above-mentioned processing unit is further configured to preprocess the first image to obtain a second image; where the preprocessing includes one or more of the following: initializing the spatial attitude, adjusting the brightness of the image to a preset brightness, adjusting the contrast of the image to a preset contrast, semantically classifying objects described in the image, and cylindrical projection. Based on this, the accuracy of subsequent matching of the first feature and the second feature can be further improved.
  • the above processing unit is specifically configured to: perform panoramic feature encoding on one or more objects described in the second image to obtain the second feature of the second image.
  • panoramic feature encoding is used to extract features of the objects in the image with a consistent horizontal resolution.
  • the above processing unit is specifically configured to: slide the second feature over the panoramic feature library, and calculate the matching similarity between the second feature and multiple first features within the sliding-window range; and, according to the multiple matching similarities between the second feature and the multiple first features within the sliding-window range, determine the first feature that matches the second feature.
  • in this way, the first feature matching the second feature can be determined from the matching similarities, calculated with the panoramic sliding-window technique, between the second feature and the multiple first features.
  • the above processing unit is specifically configured to: search within the entire scope of the panoramic feature library to determine the first feature that matches the second feature; or, search within a preset range of the panoramic feature library to determine the first feature that matches the second feature.
  • the panoramic sliding window technology can be implemented based on the full range of the panoramic feature library or the preset range of the panoramic feature library, and can be adaptively adjusted according to actual needs. This method is convenient, fast, and can save computing power to the greatest extent.
  • the above-mentioned second device further includes: a location detecting unit, configured to acquire location information of the user equipment when the image capturing unit captures an image.
  • the above processing unit is further configured to: determine the preset range according to the location of the user equipment when the image capture unit captures the image and in combination with the setting rule of the preset range.
  • in this way, a preset range is set based on the location of the user equipment, and the sliding window is then used for feature matching within that range, so that device positioning can be realized conveniently and quickly, and computing power can be saved to the greatest extent.
  • in a sixth aspect, a first device is provided, including: a memory for storing a computer program; a transceiver for receiving or sending a radio signal; and a processor for executing the computer program, so that the first device implements the method in any possible implementation manner of the second aspect.
  • in a seventh aspect, a second device is provided, including: a memory for storing a computer program; a transceiver for receiving or sending a radio signal; and a processor for executing the computer program, so that the second device implements the method in any possible implementation manner of the third aspect.
  • in an eighth aspect, a device positioning system is provided. The system includes the first device in any possible implementation manner of the fourth aspect or the sixth aspect, and the second device in any possible implementation manner of the fifth aspect or the seventh aspect.
  • the device locating system is used to implement the method in any possible implementation manner of the first aspect.
  • in a ninth aspect, a computer-readable storage medium is provided.
  • Computer program code is stored on the computer-readable storage medium.
  • when the computer program code is executed by a processor, the processor implements the method in any possible implementation manner of the second aspect or the third aspect.
  • in a tenth aspect, a chip system is provided, including a processor and a memory, where computer program code is stored in the memory; when the computer program code is executed by the processor, the processor implements the method in any possible implementation manner of the second aspect or the third aspect.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • in an eleventh aspect, a computer program product comprising computer instructions is provided.
  • when the computer instructions are run on a computer, the computer is made to implement the method in any possible implementation manner of the second aspect or the third aspect.
  • FIG. 1 is a schematic diagram of a hardware structure of a user equipment provided in an embodiment of the present application
  • FIG. 2 is a schematic diagram of a software structure of a user equipment provided in an embodiment of the present application
  • FIG. 3 is a schematic diagram of a hardware structure of a cloud-side device provided by an embodiment of the present application.
  • FIG. 4 is a flow chart of a device positioning method provided in an embodiment of the present application.
  • FIG. 5A is an example diagram of a real-scene three-dimensional model provided by the embodiment of the present application.
  • FIG. 5B is a schematic modal diagram of a panorama provided by the embodiment of the present application.
  • FIG. 6 is an example diagram of an image captured by a user equipment provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a spatial location of a user equipment provided in an embodiment of the present application.
  • FIG. 8 is an interactive flowchart of a device positioning method provided by an embodiment of the present application.
  • FIG. 9 is an interactive example diagram of a device positioning method provided by an embodiment of the present application.
  • FIG. 10 is an example diagram of an image preprocessing effect provided by the embodiment of the present application.
  • FIG. 11 is an example diagram of a preset range provided by the embodiment of the present application.
  • FIG. 12 is an example diagram of another preset range provided by the embodiment of the present application.
  • FIG. 13 is an example diagram of a feature matching process provided by the embodiment of the present application.
  • FIG. 14 is an example diagram of an AR application effect provided by an embodiment of the present application.
  • FIG. 15 is an example diagram of another AR application effect provided by the embodiment of the present application.
  • FIG. 16 is a structural block diagram of a user equipment provided in an embodiment of the present application.
  • the terms “first” and “second” are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Thus, a feature defined as “first” or “second” may explicitly or implicitly include one or more of these features. In the description of the embodiments, unless otherwise specified, “plurality” means two or more.
  • An embodiment of the present application provides a device positioning method, which is used to determine the spatial pose of the device (including the position and posture of the device).
  • the device positioning method provided in the embodiment of this application can be applied to AR application scenarios in fields such as games, medical care, education, navigation, and entertainment, for example, AR maps, AR navigation, AR billboards, AR holographic information display, and AR virtual-real fusion photography.
  • the superimposed position and orientation of a virtual object in a real scene need to be determined based on the spatial pose of the device. Therefore, the accuracy of device space pose is crucial to user experience. For example, in AR navigation, if the calculation of the spatial pose of the device is inaccurate, the superimposed position of the virtual navigation arrow on the map will deviate, so the navigation experience brought to the user is poor.
  • the spatial pose of the device can be determined through visual positioning technology, based on the image captured by the user device.
  • the spatial pose of the device can be determined based on methods such as artificially designed feature points, feature descriptor retrieval, and skyline feature matching.
  • the basic principle of determining the spatial pose of the device based on artificially designed feature points is: visual feature points are arranged in the real world in advance according to actual needs, and the feature points in the image captured by the user through the device are compared with the artificially arranged visual feature points to determine whether the device is in a preset spatial pose.
  • the basic principle of determining the spatial pose of the device based on the feature descriptor retrieval method is: the feature descriptors in the image captured by the user through the device are extracted and matched in a preset sliced-image retrieval feature library, and then perspective-n-point (PnP) processing is used to determine the spatial pose of the device.
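  • for reference, the PnP step in such a pipeline solves the perspective-n-point problem; a minimal OpenCV sketch (synthetic, illustrative correspondences and intrinsics, not part of the claimed method) looks like this:

```python
import numpy as np
import cv2

# Illustrative inputs: 3D map points matched by descriptor retrieval and their
# 2D detections in the query image, plus assumed camera intrinsics.
object_points = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.0, 1.5, 0.0],
                          [0.0, 1.5, 0.0], [1.0, 0.7, 1.0], [2.5, 0.3, 0.5]])
image_points = np.array([[320.0, 240.0], [430.0, 238.0], [433.0, 330.0],
                         [318.0, 333.0], [375.0, 190.0], [470.0, 260.0]])
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)                              # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)                      # attitude (rotation matrix)
camera_position = (-R.T @ tvec).ravel()         # device position in map frame
```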
  • the basic principle of determining the space pose of the device based on the skyline feature matching method is to extract the skyline features in the image captured by the user through the device, and perform matching in the skyline feature library to determine the space pose of the device.
  • the skyline is an important form of expression of the urban shape outline (such as the outline of a building), and the skyline reflects the three-dimensional spatial hierarchical characteristics.
  • the method based on artificially designed feature points can only be applied to specific and limited scenarios because the visual feature points are artificially pre-arranged, and it is difficult to apply on a large scale.
  • for the method based on feature descriptor retrieval, since the feature library stores images in the form of slices, there is a lot of image redundancy, which increases storage space and query time during retrieval; moreover, the positioning accuracy of this method depends on the performance of both the retrieval and the PnP processing, which requires high image precision and incurs high mapping costs.
  • the method based on skyline feature matching cannot accurately locate in scenes where the skyline features are not obvious (such as scenes with unclear and incomplete outlines).
  • in view of this, the embodiment of the present application provides a device positioning method, which can achieve accurate positioning of the spatial pose of the user equipment based on low-cost, wide-coverage, high-precision visual positioning technology.
  • according to the determination result of the spatial pose of the user equipment, virtual objects (such as virtual navigation arrows) can be superimposed on the real scene with accurate positions and orientations, matching the user's viewing angle in the real scene, bringing users a better AR experience.
  • the user equipment supports an image capture function and a display function.
  • image capture functions include, for example, a photographing function, a video recording function, and the like.
  • the image may be a picture obtained by the user equipment by taking a photo.
  • the image may also be an image frame in a video recorded by the user equipment, which is not limited in this application.
  • the user equipment supports displaying images or videos captured by the user equipment through a display function.
  • the user equipment supports displaying AR/VR videos through a display function, for example, a video in which virtual characters, virtual icons, etc. are superimposed in a real scene.
  • user equipment may include, but not limited to, smart phones, netbooks, tablet computers, smart glasses, smart watches, smart bracelets, phone watches, smart cameras, personal computers (personal computers, PCs), supercomputers, palmtop computers , AR/VR equipment, personal digital assistant (personal digital assistant, PDA), portable multimedia player (portable multimedia player, PMP), session initiation protocol (session initiation protocol, SIP) phone, internet of things (IOT) devices, wireless devices in a smart city, wireless devices in a smart home, or somatosensory game consoles, etc.
  • the user equipment may also have other structures and/or functions, which are not limited in this application.
  • FIG. 1 shows a schematic diagram of a hardware structure of a user equipment provided in an embodiment of the present application.
  • the user equipment may include a processor 110, a memory (including an external memory interface 120 and an internal memory 121), a universal serial bus (universal serial bus, USB) interface 130, a charging management module 140, and a power management module 141 , battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone jack 170D, sensor module 180, button 190, motor 191, indicator 192 , camera assembly 193, display screen 194, etc.
  • the sensor module 180 may include a gyroscope sensor, an acceleration sensor, a magnetic sensor, a touch sensor, a fingerprint sensor, a pressure sensor, an air pressure sensor, a distance sensor, a proximity light sensor, a temperature sensor, an ambient light sensor, a bone conduction sensor, and the like.
  • the structure shown in the embodiment of the present invention does not constitute a specific limitation on the user equipment.
  • the user equipment may include more or fewer components than shown in the figure, or combine some components, or separate some components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • Processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a flight controller, Video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural network processor (neural-network processing unit, NPU), etc.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • the memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated access is avoided, and the waiting time of the processor 110 is reduced, thereby improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • the interface may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous transmitter (universal asynchronous receiver/transmitter, UART) interface, mobile industry processor interface (mobile industry processor interface, MIPI), general-purpose input and output (general-purpose input/output, GPIO) interface, and/or universal serial bus (universal serial bus, USB ) interface, etc.
  • the charging management module 140 is configured to receive a charging input from a charger.
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives the input from the battery 142 and/or the charging management module 140 to provide power for the processor 110 , the internal memory 121 , the display screen 194 , the camera component 193 , and the wireless communication module 160 .
  • the wireless communication function of the user equipment can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the user equipment can be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G/6G applied on user equipment.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signals modulated by the modem processor, and convert them into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 150 may be set in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be set in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator sends the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is passed to the application processor after being processed by the baseband processor.
  • the application processor outputs a sound signal through an audio device (not limited to a speaker 170A, a receiver 170B, etc.), or displays an image or video through a display screen 194 .
  • the modem processor may be a stand-alone device. In some other embodiments, the modem processor may be independent from the processor 110, and be set in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the user equipment, including wireless local area networks (wireless local area networks, WLAN) (such as a Wi-Fi network), Bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared technology (infrared, IR), and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the user equipment is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the user equipment can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), new radio (NR), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc.
  • the GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a Beidou navigation satellite system (beidou navigation satellite system, BDS), a quasi-zenith satellite system (quasi-zenith satellite system, QZSS), and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).
  • the user equipment may collect location information of the user equipment through GNSS, such as collecting longitude and latitude information of the user equipment.
  • the user equipment realizes the display function through the GPU, the display screen 194, and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the user equipment can render, through the GPU, the image captured by the user equipment, and overlay-render virtual objects (such as virtual navigation arrows, virtual road signs, virtual billboards, virtual information, virtual things, etc.) onto the image of the real scene.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the user equipment may include 1 or K display screens 194, where K is a positive integer greater than 1.
  • the user equipment can display, through the display screen 194, the image captured by the user equipment, showing an AR image in which virtual objects (such as virtual navigation arrows, virtual road signs, virtual billboards, virtual information, virtual things, etc.) are superimposed on the real scene.
  • the user equipment can realize the shooting function through the ISP, the camera component 193, the video codec, the GPU, the display screen 194, and the application processor.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, a solid state drive, etc., to expand the storage capacity of the user device.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. Such as saving music, video and other files in the external memory card.
  • the internal memory 121 may be used to store computer-executable program codes including instructions.
  • the internal memory 121 may include an area for storing programs and an area for storing data.
  • the stored program area can store an operating system, at least one application program required by a function (such as a sound playing function, an image playing function, etc.) and the like.
  • the storage data area can store data (such as audio data, phone book, etc.) created during the use of the user equipment.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (universal flash storage, UFS) and the like.
  • the processor 110 executes various functional applications and data processing of the user equipment by executing instructions stored in the internal memory 121 and/or instructions stored in the memory provided in the processor.
  • the user equipment may implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, and the application processor. Such as music playback, recording, etc.
  • the keys 190 include a power key, a volume key and the like.
  • the key 190 may be a mechanical key. It can also be a touch button.
  • the user equipment can receive key input and generate key signal input related to user settings and function control of the user equipment.
  • the hardware modules included in the user equipment shown in FIG. 1 are only described as examples, and do not limit the specific structure of the user equipment.
  • the user equipment may further include a subscriber identity module (subscriber identity module, SIM) interface.
  • the user equipment may also include components such as a keyboard and a mouse.
  • the operating system of the user equipment may be any of various operating systems, which is not limited in the embodiments of this application.
  • the software of the user equipment can be divided into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the software structure of the user equipment can be divided, from top to bottom, into the application program layer (abbreviated as the application layer), the application program framework layer (referred to as the framework layer), the system library and Android runtime, and the kernel layer (also known as the driver layer).
  • the application layer may include a series of application packages, such as camera, gallery, calendar, call, map, navigation, Bluetooth, music, video, short message, AR application and other applications.
  • the application program is referred to as application for short below.
  • the AR application can support the user equipment to provide the user with a virtual-real fusion experience in the AR scene.
  • AR applications can be AR maps, AR navigation, AR billboards, AR holographic information display, AR virtual and real fusion photography, Hetu and other applications.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer may include a window manager service (window manager service, WMS), an activity management server (activity manager service, AMS) and an input event management server (input manager service, IMS).
  • the application framework layer may also include a content provider, a view system, a phone manager, a resource manager, a notification manager, etc. (not shown in FIG. 2 ).
  • the system library and Android runtime include the function libraries that the framework layer needs to call, the Android core library, and the Android virtual machine.
  • a system library can include multiple function modules. For example: browser kernel, three-dimensional (3dimensional, 3D) graphics, font library, etc.
  • a system library can include multiple function modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer can include display drivers, input/output device drivers (for example, keyboards, touch screens, earphones, speakers, microphones, etc.), device nodes, camera drivers, audio drivers, and sensor drivers.
  • input/output device drivers can detect user input events. For example, a microphone can detect speech from a user.
  • FIG. 2 introduces a software structure of the user equipment only by taking a system with a layered architecture as an example.
  • the present application does not limit the specific architecture of the software system of the user equipment, and for the specific introduction of software systems of other architectures, reference may be made to conventional technologies.
  • the determination of the spatial pose of the user equipment can be completed by the user equipment (such as a smartphone), or by a cloud-side device (such as a server), or by a user equipment (such as a smartphone) and a cloud side devices (such as servers) to complete.
  • FIG. 3 shows a schematic diagram of a hardware structure of a cloud-side device provided by an embodiment of the present application.
  • the cloud-side device may include a processor 301, a communication line 302, a memory 303, and at least one communication interface (in FIG. 3, the communication interface 304 is used as an example for illustration).
  • the processor 301 may include one or more processors, where the processor may be a CPU, a microprocessor, a specific ASIC, or other integrated circuits, without limitation.
  • Communication line 302 may include a path for communicating information between the aforementioned components.
  • the communication interface 304 is used for communicating with other devices or communication networks.
  • the memory 303 can be ROM or RAM, or EEPROM, CD-ROM, or other optical disk storage, optical disk storage (including compact optical disk, laser disk, optical disk, digital versatile optical disk, Blu-ray optical disk, etc.), magnetic disk storage medium or other magnetic storage devices, Or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory may exist independently and be connected to the processor through the communication line 302 .
  • Memory can also be integrated with the processor.
  • the memory 303 is used to store computer programs.
  • the processor 301 is configured to execute the computer program stored in the memory 303, so as to implement the method provided in any of the following method embodiments of the present application.
  • the processor 301 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 3 .
  • FIG. 3 is only used as an example of a cloud-side device, and does not limit the specific structure of the cloud-side device.
  • the cloud-side device may also include other functional modules.
  • a device positioning method provided by the embodiment of the present application can determine the spatial pose of the user device by extracting consistent horizontal resolution features of images captured by the user device and performing feature matching in the panoramic feature library.
  • the panorama feature library is used to represent multiple objects through a large number of 360° panorama features, where the panorama features also have position attributes.
  • Objects such as buildings (such as shopping malls, office buildings, etc.), parks, bridges, roads, lawns, squares, street lights, road signs, squares, rivers, mountains, etc. are not limited in this embodiment of the present application.
  • the spatial pose of the user equipment is used to represent the position and posture of the user equipment.
  • the position of the user equipment may be expressed by the coordinate value of the user equipment relative to the ground
  • the attitude of the user equipment may be expressed by the angle of the user equipment relative to the ground.
  • a device positioning method provided in the embodiment of the present application may include the following three stages (stage 1-stage 3):
  • Stage 1: the construction stage of the panoramic feature library.
  • the panoramic feature library includes multiple 360° panoramic features (hereinafter referred to as "panoramic features") for representing multiple objects.
  • Each object can be represented by multiple panoramic features, where the panoramic features also have a location attribute.
  • the panorama feature library can be constructed at the granularity of the city.
  • the panorama feature library includes multiple panorama features used to represent various things (ie objects) in the city, such as buildings, parks, bridges, roads, lawns , squares, street lights, road signs, squares, rivers, mountains, etc.
  • a panorama feature library may be constructed based on a real-scene 3D model.
  • the panoramic feature library can be obtained by semantically classifying the real-scene 3D model, extracting iconic objects, meshing, rendering grid by grid, and encoding panoramic features grid by grid.
  • performing semantic classification on the real-scene 3D model refers to classifying objects described by the real-scene 3D model.
  • taking the real-scene 3D model shown in FIG. 5A as an example, it can be understood that the real-scene 3D model is used to describe different types of objects, such as buildings, ground feasible areas, and green areas.
  • the semantic classification of the real-scene 3D model is to classify the objects according to the types of objects described by the real-scene 3D model.
  • Extracting iconic objects refers to extracting information used to describe preset types of objects in the real-scene 3D model. Preset types such as buildings, ground feasible areas, green areas, etc.
  • the spatial pose of the user device when the image is captured can be determined by matching the features in the image captured by the user device with the features in the panorama feature library.
  • representative (iconic) features, such as building features, contribute greatly to the reliability of feature matching results.
  • therefore, one or more types of objects may be preset to determine the objects included in the panoramic feature library, where the objects in the panorama feature library are iconic. Iconic means that an object has great reference value for feature matching and contributes greatly to the reliability of the feature matching result.
  • Meshing is used to mesh extracted objects of one or more preset types.
  • grid-by-grid rendering refers to, for example, spherical rendering of the extracted objects of one or more preset types, performed grid by grid.
  • the rendered image can be obtained by rendering grid by grid.
  • the rendered image obtained by rendering grid by grid may be represented in the form of a plane image to obtain a panoramic image.
  • the panorama is used to represent the information of multiple modalities of one or more landmark objects.
  • FIG. 5B shows a schematic modal diagram of a panorama obtained by rendering the city information model (CIM) shown in FIG. 5A grid by grid and expanding the cylinder.
  • the modalities of a panorama can include, but are not limited to, textures, instancing, and depth, among others.
  • the texture information of the panorama includes surface texture information (that is, the uneven grooves presented by the surface of the object) and surface pattern information of one or more objects.
  • the instance information of the panorama is used to represent different objects, for example, the instance information of the panorama may represent different objects with different color tones.
  • the depth information of the panorama is used to represent the distance of the object.
  • the depth information of the panorama may represent the distance of the object by color brightness, color contrast, or color hue.
  • the panorama includes information of three modalities: texture, instance and depth.
  • the grid-by-grid panoramic feature encoding is used to extract horizontal resolution-consistent panoramic features (hereinafter referred to as "first features") of one or more preset types of objects described by the panorama, grid by grid, so as to obtain the panoramic feature library.
  • the horizontal resolution of the panoramic feature (that is, the first feature) is the first resolution.
  • the same horizontal resolution means that the horizontal field of view (FOV) corresponding to the adjacent positions in the horizontal direction of the panoramic feature (that is, the first feature) is the same.
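  • As a concrete illustration of this "same horizontal FOV for adjacent positions" property, the following minimal Python sketch (not part of the patent; the function name and the 0.1-degree-per-column value are illustrative assumptions) builds a uniform azimuth grid for a 360° panoramic feature and verifies that every pair of adjacent columns spans the same horizontal FOV.

```python
import numpy as np

def horizontal_fov_per_column(panorama_width: int, total_fov_deg: float = 360.0) -> np.ndarray:
    """Horizontal FOV (degrees) spanned between each pair of adjacent panorama columns."""
    # Azimuth angle assigned to each column centre, sampled uniformly over 360 degrees.
    azimuths = np.linspace(0.0, total_fov_deg, num=panorama_width, endpoint=False)
    # Angular gap between adjacent columns, including the wrap-around back to 360 degrees.
    return np.diff(np.append(azimuths, total_fov_deg))

fovs = horizontal_fov_per_column(3600)      # 3600 columns -> 0.1 degree per column
assert np.allclose(fovs, fovs[0])           # every adjacent pair spans the same horizontal FOV
```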
  • Stage 2 Horizontal resolution consistent feature extraction stage.
  • the horizontal resolution consistent feature extraction stage is used to extract horizontal resolution consistent features (hereinafter referred to as "second features") in the image (such as the first image) captured by the user equipment.
  • the horizontal resolution consistent feature (ie the second feature) in the image is the second resolution.
  • horizontal FOVs corresponding to adjacent positions in the horizontal direction of features with the same horizontal resolution (that is, the second feature) in the image are the same.
  • the image captured by the user equipment includes a photo taken by the user equipment (as shown in FIG. 6), an image frame in a video recorded by the user equipment, and the like, which is not limited in this application.
  • the user equipment may take a picture of the real scene (such as a building) in front of the user equipment to obtain the captured image.
  • the above-mentioned second resolution is the same as the first resolution.
  • the image captured by the user equipment may be preprocessed to obtain the preprocessed image (i.e. the second image).
  • the preprocessing is used to perform one or more of the following processes on the image captured by the user equipment: initializing the spatial attitude, adjusting the image brightness, adjusting the image contrast, determining the category of the object described by the image, and cylindrical projection.
  • Stage 3 Feature matching stage.
  • the feature matching stage is used to determine the spatial pose of the user equipment by matching the horizontal resolution consistent features (ie, the second features) of the images captured by the user equipment with the features in the panoramic feature library (ie, the first features) .
  • the entire panorama feature database may be searched to determine the first feature that matches the horizontal resolution consistent feature (ie, the second feature) of the image captured by the user equipment.
  • the search may be performed within a preset range of the panorama feature library, so as to determine the first feature that matches the horizontal resolution consistent feature (ie, the second feature) of the image captured by the user equipment.
  • the preset range may include multiple priority sub-ranges.
  • the preset range includes a first range and a second range.
  • the priority of the first range is higher than that of the second range, and when performing feature matching, the sub-range with higher priority is preferentially searched.
  • for example, the matching similarities between the second feature and multiple first features within the sliding window ranges may be calculated by sliding the second feature in the panoramic feature library, and the first feature that matches the second feature is then determined according to the multiple matching similarities between the second feature and the multiple first features in the multiple sliding window ranges.
  • the feature that matches the second feature is the first feature corresponding to the highest matching similarity.
  • the spatial pose of the user equipment determined through feature matching may be represented by a 6-degree-of-freedom pose of the user equipment.
  • the 6-DOF pose can be represented by (x, y, z, α, β, γ).
  • (x, y, z) is used to indicate the spatial position of the user equipment, where x, y and z are respectively the X-axis, Y-axis and Z-axis coordinate values of the user equipment in the preset spatial coordinate system.
  • (α, β, γ) is used to represent the spatial attitude of the user equipment, where α is the pitch angle (pitch), β is the yaw angle (yaw), and γ is the roll angle (roll).
  • α, β and γ are the rotation values of the user equipment relative to the X-axis, Y-axis and Z-axis of the preset spatial coordinate system, respectively.
  • the preset spatial coordinate system may be a ground coordinate system.
  • the right-handed Cartesian coordinate system composed of the X-axis, Y-axis and Z-axis with O as the coordinate origin is the ground coordinate system.
  • the coordinate origin O can be any point in space; the X-axis points to any direction in the horizontal plane; the Z-axis is perpendicular to the plane where the X-axis is located and points to the center of the earth.
  • the Y axis is perpendicular to the X axis, and is perpendicular to the Z axis.
  • the spatial position of the user equipment may be represented by (x, y, z), that is, the coordinate value of the user equipment in the preset spatial coordinate system is (x, y, z).
  • FIG. 7 only uses the ground coordinate system as an example for the preset spatial coordinate system, and the preset spatial coordinate system may also be other spatial coordinate systems, which are not specifically limited in this embodiment of the present application.
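  • For readers who want to relate the three attitude angles to the preset coordinate system, the sketch below builds a rotation matrix from pitch, yaw and roll rotations about the X, Y and Z axes respectively, matching the description above; the composition order is an assumption of this sketch, as the patent does not specify it.

```python
import numpy as np

def rotation_from_attitude(pitch_deg: float, yaw_deg: float, roll_deg: float) -> np.ndarray:
    """Rotation matrix from rotations about the X (pitch), Y (yaw) and Z (roll) axes."""
    p, y, r = np.deg2rad([pitch_deg, yaw_deg, roll_deg])
    rx = np.array([[1, 0, 0],
                   [0, np.cos(p), -np.sin(p)],
                   [0, np.sin(p),  np.cos(p)]])
    ry = np.array([[ np.cos(y), 0, np.sin(y)],
                   [0, 1, 0],
                   [-np.sin(y), 0, np.cos(y)]])
    rz = np.array([[np.cos(r), -np.sin(r), 0],
                   [np.sin(r),  np.cos(r), 0],
                   [0, 0, 1]])
    # Composition order is an assumption; other conventions compose the axes differently.
    return rz @ ry @ rx
```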
  • the determination of the spatial pose of the user equipment can be completed by the user equipment (such as a smartphone), by a cloud-side device (such as a server), or jointly by the user equipment (such as a smartphone) and the cloud-side device (such as a server). That is, the above-mentioned tasks of stage 1 to stage 3 may be performed by the user equipment, performed by the cloud-side device, or divided between the user equipment and the cloud-side device.
  • for example, when the tasks of stage 1 to stage 3 are performed separately, the construction task of the panorama feature library (that is, the task of stage 1) may be performed by the cloud-side device (such as a server), while the horizontal resolution consistent feature extraction task (that is, the task of stage 2) and the feature matching task (that is, the task of stage 3) may be performed by the user equipment (such as a smartphone).
  • as shown in FIG. 8, a device positioning method provided in this embodiment of the application may include the above stage 1 to stage 3.
  • a device positioning method provided in the embodiment of the present application may specifically include S901-S911:
  • the cloud-side device acquires a real-scene three-dimensional model.
  • the real-scene 3D model may include, but is not limited to, one or more of an aerial real-scene 3D model, a satellite real-scene 3D model, or a city information model (CIM) (as shown in FIG. 5A).
  • the real-scene 3D model can be obtained by 3D modeling based on urban planning maps, urban layout measurement (such as laser scanner measurement), satellite measurement, and aerial measurement (such as aerial photography and UAV aerial survey).
  • the cloud-side device semantically classifies the real-scene three-dimensional model to obtain types of multiple objects described by the real-scene three-dimensional model.
  • the real scene 3D model is used to describe multiple objects, such as buildings (such as shopping malls, office buildings, etc.), ground feasible areas (such as squares, roads, street lights, etc.), green areas (such as trees, lawns, etc.) and other objects.
  • through semantic classification, the types of the multiple objects described by the real-scene 3D model can be obtained, so that the panoramic features in the subsequently built panoramic feature library have strong reference value, and the redundancy of information in the panorama feature library (for example, information such as lawns that has low reference value when performing feature matching) is reduced.
  • the cloud-side device extracts one or more preset types of objects from the multiple objects described by the real-scene three-dimensional model.
  • the preset types are, for example, buildings, ground feasible areas, green areas, and the like. By extracting iconic objects of one or more preset types (such as buildings, mountains, squares, and roads), subsequent feature matching can be performed relatively quickly and accurately.
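  • The extraction of preset-type (iconic) objects can be pictured as a simple filter over the semantically classified model elements. The sketch below is a hedged illustration only: the element layout (dicts with a "type" field) and the set of preset types are assumptions, not the patent's data structures.

```python
# Hypothetical preset types; the patent lists buildings, ground feasible areas, green areas, etc.
PRESET_TYPES = {"building", "ground_feasible_area", "green_area"}

def extract_preset_type_objects(model_elements, preset_types=PRESET_TYPES):
    """Keep only the iconic, preset-type objects for building the panorama feature library.

    model_elements is assumed to be a list of dicts produced by semantic classification,
    each carrying a "type" label and its geometry.
    """
    return [elem for elem in model_elements if elem["type"] in preset_types]
```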
  • the cloud-side device meshes the extracted one or more preset types of objects.
  • the extracted one or more preset types of objects may be gridded according to fixed intervals or dynamic intervals.
  • the fixed interval may be empirically set by an algorithm or software developer. As an example, under the condition that other conditions are equal, the smaller the fixed interval, that is, the denser the sampling, the higher the accuracy of the subsequently obtained panorama feature library. Fixed intervals such as 0.5 meters, 1 meter, etc., depending on the specific situation.
  • the dynamic interval may be set by an algorithm or software developer according to the importance of the region and/or the importance of the type of the object in the real-scene 3D model.
  • the importance of regions and the importance of types are set by algorithm or software developers according to experience, and this application does not limit the specific setting basis. For example, individual buildings and ground feasible areas are more important than green areas, and in the real-scene 3D model, urban areas are more important than suburban areas. When meshing objects in relatively important areas or of relatively important types, the intervals are relatively small; when meshing objects in relatively secondary areas or of relatively secondary types, the intervals are relatively large.
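  • A possible way to realize the fixed-interval or dynamic-interval meshing described above is sketched below; the interval values and the region-importance table are purely illustrative assumptions.

```python
import numpy as np

def mesh_region(x_min, y_min, x_max, y_max, interval):
    """Generate grid sampling points over a planar region at the given interval (metres)."""
    xs = np.arange(x_min, x_max, interval)
    ys = np.arange(y_min, y_max, interval)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)   # shape (num_grids, 2)

# A possible dynamic-interval rule: denser sampling in more important regions
# (the interval values below are purely illustrative, not taken from the patent).
REGION_INTERVALS = {"urban": 0.5, "suburban": 2.0}

def mesh_by_importance(bounds, region_kind):
    x_min, y_min, x_max, y_max = bounds
    return mesh_region(x_min, y_min, x_max, y_max, REGION_INTERVALS[region_kind])
```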
  • the cloud-side device renders the extracted one or more preset types of objects grid by grid to obtain a rendered image.
  • the cloud-side device may obtain the rendered image by performing spherical rendering on the extracted one or more preset types of objects grid by grid.
  • the grid-by-grid spherical rendering may specifically include: setting a virtual sphere with a fixed radius (such as 1 meter) on each grid, and projecting the one or more preset types of objects (such as individual buildings and/or ground feasible areas) onto the spherical surface.
  • the cloud-side device may represent the rendered image obtained by grid-by-grid rendering in the form of a plane image to obtain a panoramic image (as shown in FIG. 5B ).
  • the modality of the panorama may include but not limited to texture, instance, and depth.
  • the cloud-side device may expand the cylinder of the rendered image obtained through grid-by-grid rendering to obtain a panorama.
  • the horizontal and vertical resolutions of the planar image after the cylindrical expansion are the same, for example, both are 0.1 degree, etc., depending on the actual situation. That is, one vertically or horizontally adjacent pixel of the planar image after cylinder expansion corresponds to a pixel at a vertically or horizontally adjacent fixed angle (eg, 0.1 degree) position of the virtual sphere.
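  • The grid-by-grid rendering and cylinder expansion can be thought of as sampling the scene along rays whose azimuth and elevation change by a fixed angle per pixel. The following sketch (an assumption-laden illustration; the 0.1-degree step comes from the example above, while the axis convention is a choice of this sketch) produces such a ray grid for one sampling point.

```python
import numpy as np

def panorama_ray_directions(width: int, height: int, deg_per_pixel: float = 0.1) -> np.ndarray:
    """Unit ray directions for every pixel of a cylindrically unwrapped panorama.

    Adjacent pixels differ by a fixed angle both horizontally and vertically, which is
    the equal angular resolution described for the unwrapped planar image.
    """
    azimuth = np.deg2rad(np.arange(width) * deg_per_pixel)                    # 0 .. 360 degrees
    elevation = np.deg2rad((np.arange(height) - height / 2.0) * deg_per_pixel)
    az, el = np.meshgrid(azimuth, elevation)                                  # (height, width)
    # Assumed convention: x east, y north, z up.
    return np.stack([np.cos(el) * np.sin(az),
                     np.cos(el) * np.cos(az),
                     np.sin(el)], axis=-1)                                    # (height, width, 3)
```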
  • the cloud-side device encodes the extracted panoramic feature of one or more preset types of objects grid by grid to obtain a panoramic feature library.
  • the cloud-side device may perform panorama feature encoding on one or more preset types of objects in the panorama grid by grid to obtain panorama features corresponding to the grid.
  • the panoramic feature is a feature with consistent horizontal resolution, that is, the horizontal FOVs corresponding to adjacent positions in the horizontal direction of the panoramic feature (ie, the first feature) are the same.
  • the width and height of the panorama feature are w_i and h_i, respectively.
  • in some embodiments, an artificial intelligence (AI) model (such as a modality encoding network) may be used to extract the first feature.
  • the embodiment of the present application does not limit the specific topology of the modality coding network.
  • modality encoding networks may include, but are not limited to, deep convolutional neural networks, deep residual networks, recurrent neural networks, and the like.
  • for specific introductions to modality encoding networks such as deep convolutional neural networks, deep residual networks, and recurrent neural networks, reference may be made to conventional technologies, and the embodiments of this application will not repeat them.
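  • The patent does not fix the topology of the modality encoding network, so the following PyTorch sketch is only one possible choice: a small convolutional encoder that downsamples the panorama vertically while preserving its horizontal sampling, so that the encoded first feature keeps the consistent horizontal resolution described above. All layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """A small convolutional encoder sketch for panorama feature encoding.

    Input: panorama modalities stacked as channels (e.g. texture, instance, depth).
    Output: a feature map whose columns keep a one-to-one correspondence with
    horizontal positions, so the encoded feature stays horizontally consistent.
    """
    def __init__(self, in_channels: int = 3, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # stride (2, 1): reduce height only, keep the horizontal resolution
            nn.Conv2d(32, 64, kernel_size=3, stride=(2, 1), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_dim, kernel_size=3, stride=(2, 1), padding=1),
        )

    def forward(self, panorama: torch.Tensor) -> torch.Tensor:
        # panorama: (batch, channels, height, width) -> (batch, feat_dim, ~height/4, width)
        return self.net(panorama)
```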
  • the user equipment preprocesses the captured image (that is, the first image) to obtain the second image.
  • preprocessing may include, but is not limited to, one or more of the following: initializing the pitch angle and roll angle (such as correcting them to 0), adjusting the brightness to a preset brightness, adjusting the contrast to a preset contrast, semantically classifying the objects described in the image, and cylindrical projection.
  • initializing the pitch angle (pitch) and the roll angle (roll) is used to initialize the space attitude.
  • Semantic classification is used to determine the class of objects described by an image.
  • cylindrical projection is used to achieve visual consistency by projecting a planar image onto the curved surface of a cylinder. After cylindrical projection, the image supports 360-degree viewing in the horizontal direction and has a better visual effect.
  • FIG. 10 shows an example of an image preprocessing effect.
  • the second image shown in FIG. 10 is obtained.
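  • A simplified sketch of the cylindrical projection step is given below; it assumes a pinhole camera with the principal point at the image centre and only resamples the horizontal direction (a full implementation would also warp the rows). The focal-length parameter fx and the 0.1-degree column step are illustrative assumptions.

```python
import numpy as np

def cylindrical_projection(image: np.ndarray, fx: float, deg_per_column: float = 0.1) -> np.ndarray:
    """Resample the columns of a pinhole image so that adjacent output columns
    span equal horizontal viewing angles (nearest-neighbour sampling)."""
    h, w = image.shape[:2]
    cx = w / 2.0
    half_fov = np.arctan(cx / fx)                      # half of the horizontal FOV (radians)
    step = np.deg2rad(deg_per_column)
    thetas = np.arange(-half_fov, half_fov, step)      # uniformly spaced viewing angles
    src_x = np.clip(np.round(cx + fx * np.tan(thetas)).astype(int), 0, w - 1)
    return image[:, src_x]                             # nearest-neighbour column resampling
```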
  • the user equipment extracts a horizontal resolution consistent feature (hereinafter referred to as "second feature") of the preprocessed image (ie, the second image).
  • the user equipment may perform panorama feature encoding on one or more objects described in the preprocessed image (that is, the second image) to obtain a feature with consistent horizontal resolution (that is, the second feature, whose horizontal resolution is the second resolution) in the second image.
  • the second resolution is the same as the first resolution.
  • horizontal FOVs corresponding to horizontally adjacent positions of the second feature in the second image are the same.
  • the width and height of the second feature are w_j and h_j, respectively (where w_j is less than or equal to w_i).
  • in some embodiments, an AI model (such as a modality encoding network) can be used to extract the horizontal resolution consistent features (that is, the second features) in the second image. The modality encoding network can include but is not limited to a deep convolutional neural network, a deep residual network, a recurrent neural network, etc.
  • the user equipment obtains the panorama feature database from the cloud side equipment.
  • the user equipment determines a panorama feature matching the second feature by sliding the second feature in the panorama feature library.
  • the user equipment may slide the second feature in the panorama feature library, and calculate the matching similarity between the second feature and multiple first features within multiple sliding window ranges, so as to determine the first feature that best matches the second feature. a feature.
  • the user equipment may perform a sliding window within the entire range of the panoramic feature database, so as to calculate matching similarities between the second feature and multiple first features within multiple sliding window ranges within the entire library.
  • the user equipment may slide a window within a preset range of the panorama feature library, so as to calculate matching similarities between the second feature and multiple first features within multiple sliding window ranges within the preset range.
  • the preset range may be determined by the user equipment according to the collected location information in combination with a setting rule of the preset range.
  • the location information may be collected by the user equipment through but not limited to one or more of the following positioning systems: GPS, GLONASS, BDS, QZSS, or SBAS, etc., which are not limited in this application.
  • the location of the user equipment when collecting the location information is the same as the location of the user equipment when capturing the first image.
  • the user equipment may collect location information while capturing the first image.
  • the preset range may be a circular area in the panorama feature library centered on the location where the user equipment captures the first image (point O_0 as shown in FIG. 11), with a radius of r (r is a positive number), as shown in FIG. 11.
  • since the position information of the user equipment collected by positioning systems is usually represented by longitude and latitude (such as (lon, lat)), while the panorama feature library usually represents positions with coordinate values, the position of the user equipment when capturing the first image can be converted from longitude and latitude into coordinate values, for example converting (lon, lat) into (X_0, Y_0), where (X_0, Y_0) is the position of the user equipment in the panorama feature library when capturing the first image. As shown in FIG. 11, the coordinate value of point O_0 is (X_0, Y_0).
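  • The conversion from the collected (lon, lat) to library coordinates and the construction of the circular preset range might look like the sketch below; the equirectangular approximation used here is an assumption of this illustration, since the patent does not specify the conversion.

```python
import math

EARTH_RADIUS_M = 6371000.0

def lonlat_to_local_xy(lon, lat, lon0, lat0):
    """Convert (lon, lat) to planar metres relative to a reference point (lon0, lat0),
    using a simple equirectangular approximation (adequate over a few kilometres)."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y

def grids_in_preset_range(grid_positions, center_xy, r):
    """Return the library grid points inside the circular preset range of radius r (metres)."""
    cx, cy = center_xy
    return [g for g in grid_positions
            if (g[0] - cx) ** 2 + (g[1] - cy) ** 2 <= r ** 2]
```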
  • the preset range may include multiple priority sub-ranges.
  • the preset range includes a first range and a second range.
  • the priority of the first range is higher than that of the second range, and when performing the sliding window search, the priority is to search within the sub-range with higher priority.
  • for example, the preset range includes a first range and a second range, as shown in FIG. 12. The first range is a circular area in the panorama feature library centered on the position where the user equipment captures the first image (point O_0 as shown in FIG. 12), with a radius of r1 (r1 is a positive number). The second range is a ring area centered on the same position, with r1 as the inner radius and r2 (r2 is a positive number) as the outer radius, where r1 is smaller than r2.
  • assuming that the sliding window step size is s (s is a positive number, and s is less than or equal to w_i), that there are N first features in the preset range (where N is a positive integer), and that the width of each first feature is w_i, then sliding the second feature across one first feature yields ⌊w_i/s⌋ similarity scores, and sliding the window within the whole preset range therefore yields N·⌊w_i/s⌋ similarity scores, where ⌊w_i/s⌋ refers to rounding the result of w_i/s.
  • the user equipment can determine the panoramic feature that matches the second feature according to these N·⌊w_i/s⌋ similarity scores. For example, the user equipment can determine that, among these similarity scores, the first feature corresponding to the largest similarity score (such as S_max) matches the second feature. As another example, if the largest similarity score (such as S_max) among these similarity scores is greater than or equal to a preset threshold, the user equipment determines that the first feature corresponding to the largest similarity score (that is, S_max) matches the second feature; if the largest similarity score (such as S_max) is smaller than the preset threshold, the positioning of the user equipment fails.
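  • The sliding-window scoring described above can be sketched as follows. The patent does not name the similarity metric, so cosine similarity is used here as an assumption; the feature layout (channels × height × width) is likewise illustrative.

```python
import numpy as np

def sliding_window_scores(second_feat: np.ndarray, first_feat: np.ndarray, step: int) -> np.ndarray:
    """Similarity scores of the second feature slid horizontally across one panoramic
    first feature, with 360-degree wrap-around.

    second_feat: (C, H, w_j) feature of the captured image.
    first_feat:  (C, H, w_i) panoramic feature of one grid point, w_j <= w_i.
    Returns roughly w_i // step scores, matching the count described above.
    """
    c, h, w_j = second_feat.shape
    _, _, w_i = first_feat.shape
    wrapped = np.concatenate([first_feat, first_feat[:, :, :w_j]], axis=2)  # circular wrap
    query = second_feat.ravel()
    query = query / (np.linalg.norm(query) + 1e-8)
    scores = []
    for start in range(0, w_i, step):
        window = wrapped[:, :, start:start + w_j].ravel()
        window = window / (np.linalg.norm(window) + 1e-8)
        scores.append(float(query @ window))             # cosine similarity (assumed metric)
    return np.array(scores)
```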
  • in some embodiments, the user equipment can slide the window in the multiple sub-ranges of the panorama feature library in order from high priority to low priority, and calculate the matching similarity between the second feature and each first feature in those sub-ranges, until the first feature matching the second feature is determined.
  • as shown in FIG. 13, the user equipment first determines the retrieval range, slides the window within the first range, and calculates the similarity scores between the second feature and the first features within the first range. Further, as shown in FIG. 13, the user equipment sorts these similarity scores. If the largest similarity score (such as S1_max) is greater than or equal to a preset threshold, the user equipment determines that the first feature corresponding to the largest similarity score (that is, S1_max) matches the second feature.
  • if the largest similarity score (such as S1_max) is smaller than the preset threshold, the user equipment expands the retrieval range: as shown in FIG. 13, the user equipment determines to slide the window within the second range.
  • the user equipment slides the window in the second range of the panoramic feature library, calculates the similarity scores between the second feature and the first features within the second range, and, as shown in FIG. 13, sorts them. If the largest of these similarity scores (such as S2_max) is greater than or equal to the preset threshold, the user equipment determines that the first feature corresponding to the largest similarity score (that is, S2_max) matches the second feature. If the largest similarity score (such as S2_max) is still smaller than the preset threshold and there is no sub-range with lower priority, the positioning of the user equipment fails.
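  • Building on the sliding-window scorer sketched earlier, the priority-ordered retrieval over the first and second ranges could be organized as below; the library layout (grid id mapped to a feature and a pose) and the threshold handling are assumptions of this sketch.

```python
def retrieve_with_priority(second_feat, library, ranges, step, threshold):
    """Search sub-ranges in priority order; stop at the first range whose best
    similarity score reaches the threshold, otherwise report a positioning failure.

    library: mapping grid_id -> (first_feature, (x, y, z)).
    ranges:  list of grid_id lists ordered from highest to lowest priority,
             e.g. [first_range_ids, second_range_ids].
    """
    for grid_ids in ranges:
        best = None
        for gid in grid_ids:
            first_feat, _pose = library[gid]
            scores = sliding_window_scores(second_feat, first_feat, step)
            idx = int(scores.argmax())
            if best is None or scores[idx] > best[0]:
                best = (float(scores[idx]), gid, idx)
        if best is not None and best[0] >= threshold:
            return best                                 # (S_max, matching grid, window offset)
    return None                                         # positioning failed
```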
  • the user equipment determines the spatial pose of the user equipment according to the panorama feature matched with the second feature.
  • the spatial pose of the user equipment may be represented by a 6-degree-of-freedom pose of the user equipment.
  • the 6-DOF pose can be represented by (x, y, z, α, β, γ).
  • (x, y, z) is used to represent the spatial position of the user equipment.
  • (α, β, γ) is used to represent the spatial attitude of the user equipment, where α is the pitch angle (pitch), β is the yaw angle (yaw), and γ is the roll angle (roll).
  • in some embodiments, after determining the panorama feature that matches the second feature, the user equipment can obtain (x, y, z), the pitch angle (pitch) α, the yaw angle (yaw) β and the roll angle (roll) γ.
  • the position corresponding to the panorama feature matched with the second feature in the real scene three-dimensional model is the spatial position (x, y, z) of the user equipment.
  • the pitch angle, yaw angle and roll angle corresponding to the panorama feature matched with the second feature are the pitch angle (pitch) α, yaw angle (yaw) β and roll angle (roll) γ of the user equipment.
  • then, the user equipment outputs the 6-DOF pose (x, y, z, α, β, γ) of the user equipment.
  • in some other embodiments, after determining the panorama feature that matches the second feature, the user equipment may obtain (x, y, z) and the yaw angle (yaw) β.
  • the position corresponding to the panorama feature matched with the second feature in the real scene three-dimensional model is the spatial position (x, y, z) of the user equipment.
  • the yaw angle corresponding to the panorama feature matched with the second feature is the yaw angle (yaw) β of the user equipment.
  • the user equipment fine-tunes the pitch angle and roll angle collected by its sensors when the image was captured, and compares the similarity scores between the fine-tuned horizontal resolution consistent feature of the image and the panorama feature to determine the pitch angle (pitch) α and roll angle (roll) γ. For example, if the similarity score after fine-tuning is greater than the similarity score before fine-tuning, the fine-tuned pitch angle and roll angle are retained. Then, the user equipment outputs the 6-DOF pose (x, y, z, α, β, γ) of the user equipment.
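  • One way to assemble the 6-DOF pose from a retrieval result is sketched below, under the assumptions that each library grid point stores its (x, y, z) position and that the horizontal window offset of the best match maps to the yaw angle through the fixed per-column angle of the panoramic feature; pitch and roll are taken from the sensors and may be fine-tuned as described above.

```python
def pose_from_match(library, match, step, deg_per_column, sensor_pitch=0.0, sensor_roll=0.0):
    """Assemble a 6-DoF pose (x, y, z, pitch, yaw, roll) from a retrieval result.

    The grid point position supplies (x, y, z); the horizontal window offset of the
    best match supplies the yaw angle (columns converted to degrees via the fixed
    per-column angle); pitch and roll start from the sensor readings.
    """
    _, grid_id, window_idx = match                   # output of retrieve_with_priority
    _feat, (x, y, z) = library[grid_id]
    yaw = (window_idx * step) * deg_per_column       # feature columns -> degrees
    return (x, y, z, sensor_pitch, yaw, sensor_roll)
```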
  • based on the determined spatial pose of the user equipment, the user equipment can superimpose virtual objects (such as virtual navigation arrows, virtual road signs, virtual billboards, virtual information, virtual things, etc.) onto the real scene with accurate positions and orientations, bringing users a better AR experience.
  • the device positioning method provided by the embodiment of the present application does not rely on manually designed feature points and does not need pre-set sliced image retrieval features or additional PnP processing; based on the panorama feature library construction technology built on the real-scene 3D model and the sliding window retrieval technology, it realizes low-cost, wide-coverage, and high-precision user equipment positioning.
  • in addition, this method involves a small amount of data on the cloud side and efficient computing on the device side, and can provide users with lightweight AR application services.
  • the device positioning method provided by the embodiment of the present application can be applied to a scene where skyline features are not obvious.
  • as shown in the left figure in FIG. 14, in the photo captured by the user equipment, the skyline above the building is blocked by heavy fog; if the conventional skyline-based feature matching method were used to determine the spatial pose of the device, the device could not be accurately positioned.
  • the virtual character is rendered to the corresponding position in the real scene, as shown in the right figure in FIG. 14 .
  • the virtual character is rendered to the corresponding position in the real scene, as shown in the right figure in FIG. 15 .
  • the serial numbers of the above-mentioned processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present application.
  • the electronic device includes corresponding hardware structures and/or software modules for performing each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software in combination with the units and algorithm steps of each example described in the embodiments disclosed herein. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as exceeding the scope of the present application.
  • the functional modules of the electronic device can be divided.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. It should be noted that the division of modules in the embodiment of the present application is schematic, and is only a logical function division, and there may be other division methods in actual implementation.
  • the user equipment may include an image capture unit 1610 , a processing unit 1620 , a position detection unit 1630 , a display unit 1640 , a storage unit 1650 and a transceiver unit 1660 .
  • the image capture unit 1610 is used to support the user equipment to capture images (such as the first image), for example, the image capture unit 1610 includes one or more cameras.
  • the processing unit 1620 is used to support the user equipment in preprocessing the images captured by the image capture unit 1610, determining fixed intervals, dynamic intervals, or preset ranges, extracting horizontal resolution consistent features (such as the second feature) of images captured through the image capture unit 1610, searching the panorama feature library to determine the panorama feature (such as the first feature) that matches the horizontal resolution consistent feature (such as the second feature) of the image, determining, according to the panorama feature (such as the first feature) matched with the horizontal resolution consistent feature (such as the second feature) of the image, the spatial pose of the user equipment when the image was captured by the image capture unit 1610, and/or other processes related to the embodiment of the present application.
  • the location detection unit 1630 is configured to support the user equipment to obtain the location information of the user equipment when the image capture unit 1610 captures an image, and/or other processes related to the embodiment of the present application.
  • the display unit 1640 is used to support the user equipment to display the image captured by the image capture unit 1610, and display the AR image superimposed with virtual objects (such as virtual navigation arrows, virtual road signs, virtual billboards, virtual information, virtual things, etc.) in the real scene, And/or other interfaces related to the embodiment of this application.
  • the storage unit 1650 is used to support the user equipment in storing computer programs, and in storing processing data and/or processing results involved in the methods provided in the embodiments of the present application.
  • the transceiver unit 1660 is used to transmit and receive radio signals.
  • the transceiving unit 1660 is configured to support the user equipment to obtain the panorama feature library from the first device (such as the cloud-side device), and/or other processes related to the embodiment of the present application.
  • the transceiver unit 1660 may include a radio frequency circuit.
  • the user equipment can receive and send wireless signals through a radio frequency circuit.
  • radio frequency circuitry includes, but is not limited to, an antenna, at least one amplifier, transceiver, coupler, low noise amplifier, duplexer, and the like.
  • radio frequency circuits can also communicate with other devices through wireless communication.
  • the wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications, General Packet Radio Service, Code Division Multiple Access, Wideband Code Division Multiple Access, Long Term Evolution, Email, Short Message Service, etc.
  • each module in the electronic device may be implemented in the form of software and/or hardware, which is not specifically limited. In other words, electronic equipment is presented in the form of functional modules.
  • the "module” here may refer to an application-specific integrated circuit ASIC, a circuit, a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above-mentioned functions.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media.
  • the available medium can be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a digital video disk (digital video disk, DVD)), or a semiconductor medium (such as a solid state disk (SSD)), etc.
  • the steps of the methods or algorithms described in conjunction with the embodiments of this application can be implemented in the form of hardware, or can be implemented in the form of a processor executing software instructions.
  • the software instructions can be composed of corresponding software modules, and the software modules can be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • an exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • the ASIC may be located in the electronic device.
  • the processor and the storage medium can also exist in the electronic device as discrete components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

This application discloses a device positioning method, device and system, relating to the field of electronic technologies, which can accurately determine the spatial pose of a user equipment based on a low-cost, wide-coverage, high-precision visual positioning technology. In this application, a panoramic feature library containing panoramic features used to represent multiple objects is used as a retrieval library. When positioning the user equipment, horizontal-resolution-consistent features of an image captured by the user equipment are extracted, and the panoramic feature library is searched to determine the panoramic feature that matches the horizontal-resolution-consistent features of the image. The spatial pose of the user equipment when capturing the image is then determined based on that panoramic feature. Positioning a device with this solution is low-cost, wide-coverage (for example, the method does not rely on manually designed feature points, does not require pre-set sliced image retrieval features, and can be applied to scenes where skyline features are not obvious), and highly accurate.

Description

A device positioning method, device and system
This application claims priority to Chinese Patent Application No. 202111166626.6, filed with the China National Intellectual Property Administration on September 30, 2021 and entitled "A Device Positioning Method, Device and System", which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of this application relate to the technical field of image information processing, and in particular, to a device positioning method, device and system.
Background
At present, with the widespread application of mobile smart devices, a large number of applications based on virtual reality (VR) and augmented reality (AR) are continuously pouring into the market. For example, in AR application scenarios, virtual objects are superimposed on pictures of real scenes, which can be applied to various fields such as games, medical care, education, and navigation. In AR application scenarios, in order to give the user an immersive feeling, determining the spatial pose of the user equipment becomes the key point. The spatial pose of the user equipment is used to represent the position and attitude of the user equipment.
As a possible implementation, the position of the user equipment can be determined based on the global navigation satellite system (GNSS), and the attitude of the user equipment can be determined based on motion data of the user equipment collected by motion sensors of the user equipment (such as a gyroscope sensor, an acceleration sensor, and a gravity sensor). However, the above method can only achieve a rough estimate of the position and attitude of the user equipment, and cannot achieve high-precision superposition of virtual objects in the real scene.
Summary
This application provides a device positioning method, device and system, which can accurately determine the spatial pose of a user equipment based on a low-cost, wide-coverage, high-precision visual positioning technology.
To achieve the above objective, the embodiments of this application adopt the following technical solutions:
第一方面,提供一种设备定位方法,该方法应用于设备定位系统,该方法包括:首先,获取包括用于表示多个对象的多个第一特征(其中第一特征是全景特征)的全景特征库。以及,提取用户设备捕获的图像的第二特征。其中,该多个第一特征的水平分辨率一致,第二特征的水平分辨率一致。然后,在全景特征库中检索,以确定与第二特征匹配的第一特征;最后,根据与第二特征匹配的第一特征,确定用户设备捕获上述图像时的空间位姿,其中空间位姿用于表示用户设备位置和姿态。
示例性的,上述全景特征库中包括用于表示城市中各种事物(即对象)的多个全景特征,如建筑、公园、桥梁、公路、草坪、广场、路灯、路标、广场、河流、山等。
上述第一方面提供的方案,设备定位系统通过将用于表示多个对象的多个第一特征(其中第一特征是全景特征)的全景特征库作为检索库,在确定用户设备采集图像时的空间位姿时,可以通过提取用户设备捕获的图像的第二特征(其中第二特征是水平分辨率一致特征),在全景特征库中检索以确定与第二特征匹配的第一特征。然后基于与第二特征匹配的第一特征,确定用户设备捕获上述图像时的空间位姿。该方案 实现低成本、广覆盖(例如不依赖于人工设计特征点,不需要预先设置切片式图像检索特征以及额外的精确导航处理器(precision navigation processor,PnP)处理,可以应用于天际线特征不明显的场景等)、高精度的用户设备定位。
在一种可能的实现方式中,上述获取全景特征库,具体可以包括:基于用于描述多个对象的空间信息的实景三维模型,构建得到全景特征库。作为一种示例,设备定位系统可以通过基于实景三维模型构建用于表示多个对象的多个第一特征(其中第一特征是全景特征)的全景特征库作为检索库,该方法成本低、易实现、可以为设备定位提供更加精确的参考。
在一种可能的实现方式中,上述基于实景三维模型,构建得到上述全景特征库,包括:通过对实景三维模型语义化分类,预设类型对象提取、渲染和全景特征编码,得到全景特征库。作为一种示例,设备定位系统可以通过对实景三维模型语义化分类得到实景三维模型所描述的对象分类;通过预设类型对象提取得到实景三维模型中,用于描述预设类型的对象的信息;通过渲染和全景特征编码得到包括一个或多个预设类型的对象的全景特征库。通过该方法可以构建用于表示多个对象的多个第一特征(其中第一特征是全景特征)的全景特征库作为检索库,成本低、易实现、可以为设备定位提供更加精确的参考。
在一种可能的实现方式中,上述基于实景三维模型,构建得到全景特征库,包括:对实景三维模型进行语义化分类,得到实景三维模型所描述的多个对象的类型;从实景三维模型所描述的多个对象中提取一个或多个预设类型的对象;将提取的一个或多个预设类型的对象网格化;逐网格渲染一个或多个预设类型的对象,得到渲染图;将渲染图柱面展开,得到全景图;逐网格对全景图中的一个或多个预设类型的对象全景特征编码,得到全景特征库。作为一种示例,设备定位系统可以通过对实景三维模型语义化分类得到实景三维模型所描述的多个对象的类型;通过预设类型对象提取得到实景三维模型中,用于描述预设类型的对象的信息;通过将提取的一个或多个预设类型的对象网格化以提高后续渲染和全景特征编码的精度;通过渲染、柱面展开和全景特征编码得到包括一个或多个预设类型的对象的全景特征库。通过该方法可以构建用于表示多个对象的多个第一特征(其中第一特征是全景特征)的全景特征库作为检索库,成本低、易实现、可以为设备定位提供更加精确的参考。
在一种可能的实现方式中,上述将提取的一个或多个预设类型的对象网格化,包括:按照固定间隔或者动态间隔将提取的一个或多个预设类型的对象网格化。作为一种示例,设备定位系统可以按照固定间隔或者动态间隔将提取的一个或多个预设类型的对象网格化,以根据实际需要进行采样密度调整,以得到满足实际需求的全景特征库精度。例如固定间隔和动态间隔可以由算法或软件开发人员经验设置。
在一种可能的实现方式中,上述方法还包括:按照实景三维模型中区域的重要性,和/或一个或多个预设类型的对象所属类型的重要程度设置动态间隔。基于此,可以根据实际需要进行采样密度调整,以得到满足实际需求的全景特征库精度。
在一种可能的实现方式中,上述用户设备捕获的图像是第一图像,在提取用户设备捕获的图像的第二特征之前,上述方法还包括:对第一图像进行预处理,得到第二图像;其中,预处理包括以下中的一种或多种:初始化空间姿态、将图像亮度调整为 预设亮度、将图像对比度调整为预设对比度、对图像所描述的对象语义化分类、柱面投影。基于此,可以进一步提高后续第一特征与第二特征匹配的准确性。
在一种可能的实现方式中,上述提取用户设备捕获的图像的第二特征,包括:对第二图像所描述的一个或多个对象进行全景特征编码,得到第二图像的第二特征。其中,全景特征编码用于提取图像中对象的水平分辨率一致特征。
在一种可能的实现方式中,上述在全景特征库中检索,以确定与第二特征匹配的第一特征,包括:将第二特征在全景特征库中滑窗,确定第二特征与滑窗范围内多个第一特征的匹配相似度;根据滑窗范围中第二特征与多个第一特征的多个匹配相似度,确定与第二特征匹配的第一特征。为了快速、精确地进行特征匹配,可以通过全景滑窗技术通过计算得到的第二特征与多个第一特征的匹配相似度,确定与第二特征匹配的第一特征。
在一种可能的实现方式中,与第二特征匹配的第一特征是多个匹配相似度中,最高的匹配相似度对应的第一特征。作为一种示例,可以根据得到的第二特征与多个第一特征的匹配相似度的大小,确定最高匹配相似度对应的第一特征是与第二特征匹配的第一特征。
在一种可能的实现方式中,上述在全景特征库中检索,以确定与第二特征匹配的第一特征,包括:在全景特征库全库范围内检索,以确定与第二特征匹配的第一特征;或者,在全景特征库的预设范围内检索,以确定与第二特征匹配的第一特征。作为一种示例,全景滑窗技术可以基于全景特征库全库范围或者全景特征库的预设范围实现,可根据实际需要适应性调整,该方法方便、快速、且可最大限度的节约计算力。
在一种可能的实现方式中,上述方法还包括:根据用户设备捕获图像时,用户设备所处的位置,结合预设范围的设置规则确定预设范围。通过结合用户设备捕获图像时,用户设备所处的位置进行预设范围设置,进而滑窗以进行特征匹配,可以方便、快速地实现设备定位,且可最大限度的节约计算力。
在一种可能的实现方式中,上述预设范围是以用户设备捕获图像时,用户设备所处的位置为中心,以r为半径的圆形区域,其中r为正数。通过结合用户设备捕获图像时,用户设备所处的位置进行预设范围设置,进而滑窗以进行特征匹配,可以方便、快速地实现设备定位,且可最大限度的节约计算力。
在一种可能的实现方式中,上述预设范围包括第一范围和第二范围,第一范围的优先级高于第二范围;则在全景特征库的预设范围内检索,以确定与第二特征匹配的第一特征,包括:在第一范围内检索;若在第一范围内未检索到与第二特征匹配的第一特征,则在第二范围内检索,以确定与第二特征匹配的第一特征。通过阶梯化的预设范围设置,可以在方便、快速实现设备定位的同时,最大限度的节约计算力。
在一种可能的实现方式中,上述第一范围是以用户设备捕获图像时,用户设备所处的位置为中心,以r1为半径的圆形区域,其中r1为正数;第一范围是以用户设备捕获图像时,用户设备所处的位置为中心,以r1为内径,r2为外径的圆环区域,其中r1和r2为正数,且r1小于r2。通过阶梯化的预设范围设置,可以在方便、快速实现设备定位的同时,最大限度的节约计算力。
在一种可能的实现方式中,上述第二特征的水平分辨率与第一特征的水平分辨率相同。通过一致的水平分辨率设置,可以在设备定位过程中的特征匹配阶段,得到更加好的匹配效果。
在一种可能的实现方式中,上述实景三维模型包括但不限定于以下中的一种或多种:航空实景三维模型、卫星实景三维模型、城市信息模型。
第二方面,提供一种设备定位方法,该方法应用于第一设备(如云侧设备),该方法包括:第一设备基于实景三维模型,构建得到全景特征库。其中,实景三维模型用于描述多个对象的空间信息;全景特征库中包括用于表示多个对象的多个第一特征,多个第一特征的水平分辨率一致。
示例性的,上述全景特征库中包括用于表示城市中各种事物(即对象)的多个全景特征,如建筑、公园、桥梁、公路、草坪、广场、路灯、路标、广场、河流、山等。
上述第二方面提供的方案,第一设备通过将用于表示多个对象的多个第一特征(其中第一特征是全景特征)的全景特征库作为检索库,在确定用户设备采集图像时的空间位姿时,可以提供低成本、广覆盖(例如不依赖于人工设计特征点,不需要预先设置切片式图像检索特征以及额外的精确导航处理器(precision navigation processor,PnP)处理,可以应用于天际线特征不明显的场景等)、高精度的用户设备定位。
在一种可能的实现方式中,上述第一设备基于实景三维模型,构建得到上述全景特征库,包括:第一设备通过对实景三维模型语义化分类,预设类型对象提取、渲染和全景特征编码,得到全景特征库。作为一种示例,第一设备可以通过对实景三维模型语义化分类得到实景三维模型所描述的对象分类;通过预设类型对象提取得到实景三维模型中,用于描述预设类型的对象的信息;通过渲染和全景特征编码得到包括一个或多个预设类型的对象的全景特征库。通过该方法可以构建用于表示多个对象的多个第一特征(其中第一特征是全景特征)的全景特征库作为检索库,成本低、易实现、可以为设备定位提供更加精确的参考。
在一种可能的实现方式中,上述第一设备基于实景三维模型,构建得到全景特征库,包括:第一设备对实景三维模型进行语义化分类,得到实景三维模型所描述的多个对象的类型;第一设备从实景三维模型所描述的多个对象中提取一个或多个预设类型的对象;第一设备将提取的一个或多个预设类型的对象网格化;第一设备逐网格渲染一个或多个预设类型的对象,得到渲染图;第一设备将渲染图柱面展开,得到全景图;第一设备逐网格对全景图中的一个或多个预设类型的对象全景特征编码,得到全景特征库。作为一种示例,第一设备可以通过对实景三维模型语义化分类得到实景三维模型所描述的多个对象的类型;通过预设类型对象提取得到实景三维模型中,用于描述预设类型的对象的信息;通过将提取的一个或多个预设类型的对象网格化以提高后续渲染和全景特征编码的精度;通过渲染、柱面展开和全景特征编码得到包括一个或多个预设类型的对象的全景特征库。通过该方法可以构建用于表示多个对象的多个第一特征(其中第一特征是全景特征)的全景特征库作为检索库,成本低、易实现、可以为设备定位提供更加精确的参考。
在一种可能的实现方式中,上述第一设备将提取的一个或多个预设类型的对象网格化,包括:第一设备按照固定间隔或者动态间隔将提取的一个或多个预设类型的对象网格化。作为一种示例,第一设备可以按照固定间隔或者动态间隔将提取的一个或多个预设类型的对象网格化,以根据实际需要进行采样密度调整,以得到满足实际需求的全景特征库精度。例如固定间隔和动态间隔可以由算法或软件开发人员经验设置。
在一种可能的实现方式中,上述方法还包括:第一设备按照实景三维模型中区域的重要性,和/或一个或多个预设类型的对象所属类型的重要程度设置动态间隔。基于此,可以根据实际需要进行采样密度调整,以得到满足实际需求的全景特征库精度。
在一种可能的实现方式中,上述实景三维模型包括但不限定于以下中的一种或多种:航空实景三维模型、卫星实景三维模型、城市信息模型。
第三方面,提供一种设备定位方法,该方法应用于第二设备(如用户设备),该方法包括:第二设备提取用户设备捕获的图像的第二特征。其中,该多个第一特征的水平分辨率一致,第二特征的水平分辨率一致。然后,在全景特征库中检索,以确定与第二特征匹配的第一特征;最后,根据与第二特征匹配的第一特征,确定用户设备捕获上述图像时的空间位姿,其中空间位姿用于表示第二设备(如用户设备)的位置和姿态。
上述第三方面提供的方案,在确定用户设备采集图像时的空间位姿时,第二设备可以通过提取用户设备捕获的图像的第二特征(其中第二特征是水平分辨率一致特征),在全景特征库中检索以确定与第二特征匹配的第一特征。然后基于与第二特征匹配的第一特征,确定用户设备捕获上述图像时的空间位姿。该方案实现低成本、广覆盖(例如不依赖于人工设计特征点,不需要预先设置切片式图像检索特征以及额外的PnP处理,可以应用于天际线特征不明显的场景等)、高精度的用户设备定位。
在一种可能的实现方式中,上述用户设备捕获的图像是第一图像,在第二设备提取用户设备捕获的图像的第二特征之前,上述方法还包括:第二设备对第一图像进行预处理,得到第二图像;其中,预处理包括以下中的一种或多种:初始化空间姿态、将图像亮度调整为预设亮度、将图像对比度调整为预设对比度、对图像所描述的对象语义化分类、柱面投影。基于此,可以进一步提高后续第一特征与第二特征匹配的准确性。
在一种可能的实现方式中,上述第二设备提取用户设备捕获的图像的第二特征,包括:第二设备对第二图像所描述的一个或多个对象进行全景特征编码,得到第二图像的第二特征。其中,全景特征编码用于提取图像中对象的水平分辨率一致特征。
在一种可能的实现方式中,上述第二设备在全景特征库中检索,以确定与第二特征匹配的第一特征,包括:第二设备将第二特征在全景特征库中滑窗,确定第二特征与滑窗范围内多个第一特征的匹配相似度;第二设备根据滑窗范围中第二特征与个多个第一特征的多个匹配相似度,确定与第二特征匹配的第一特征。为了快速、精确地进行特征匹配,可以通过全景滑窗技术通过计算得到的第二特征与多个第一特征的匹配相似度,确定与第二特征匹配的第一特征。
在一种可能的实现方式中,与第二特征匹配的第一特征是多个匹配相似度中,最高的匹配相似度对应的第一特征。作为一种示例,可以根据得到的第二特征与多个第一特征的匹配相似度的大小,确定最高匹配相似度对应的第一特征是与第二特征匹配的第一特征。
在一种可能的实现方式中,上述第二设备在全景特征库中检索,以确定与第二特征匹配的第一特征,包括:第二设备在全景特征库全库范围内检索,以确定与第二特征匹配的第一特征;或者,第二设备在全景特征库的预设范围内检索,以确定与第二特征匹配的第一特征。作为一种示例,全景滑窗技术可以基于全景特征库全库范围或者全景特征库的预设范围实现,可根据实际需要适应性调整,该方法方便、快速、且可最大限度的节约计算力。
在一种可能的实现方式中,上述方法还包括:第二设备根据用户设备捕获图像时,用户设备所处的位置,结合预设范围的设置规则确定预设范围。通过结合用户设备捕获图像时,用户设备所处的位置进行预设范围设置,进而滑窗以进行特征匹配,可以方便、快速地实现设备定位,且可最大限度的节约计算力。
在一种可能的实现方式中,上述预设范围是以用户设备捕获图像时,用户设备所处的位置为中心,以r为半径的圆形区域,其中r为正数。通过结合用户设备捕获图像时,用户设备所处的位置进行预设范围设置,进而滑窗以进行特征匹配,可以方便、快速地实现设备定位,且可最大限度的节约计算力。
在一种可能的实现方式中,上述预设范围包括第一范围和第二范围,第一范围的优先级高于第二范围;则第二设备在全景特征库的预设范围内检索,以确定与第二特征匹配的第一特征,包括:第二设备在第一范围内检索;若在第一范围内未检索到与第二特征匹配的第一特征,则在第二范围内检索,以确定与第二特征匹配的第一特征。通过阶梯化的预设范围设置,可以在方便、快速实现设备定位的同时,最大限度的节约计算力。
在一种可能的实现方式中，上述第一范围是以用户设备捕获图像时，用户设备所处的位置为中心，以r1为半径的圆形区域，其中r1为正数；第二范围是以用户设备捕获图像时，用户设备所处的位置为中心，以r1为内径，r2为外径的圆环区域，其中r1和r2为正数，且r1小于r2。通过阶梯化的预设范围设置，可以在方便、快速实现设备定位的同时，最大限度的节约计算力。
在一种可能的实现方式中，上述第二特征的水平分辨率与第一特征的水平分辨率相同。通过一致的水平分辨率设置，可以在设备定位过程中的特征匹配阶段，得到更好的匹配效果。
第四方面,提供一种第一设备(如云侧设备),该第一设备包括:处理单元,用于基于实景三维模型,构建得到全景特征库。其中,实景三维模型用于描述多个对象的空间信息;全景特征库中包括用于表示多个对象的多个第一特征,多个第一特征的水平分辨率一致。
上述第四方面提供的方案，第一设备通过将用于表示多个对象的多个第一特征(其中第一特征是全景特征)的全景特征库作为检索库，在确定用户设备采集图像时的空间位姿时，可以提供低成本、广覆盖(例如不依赖于人工设计特征点，不需要预先设置切片式图像检索特征以及额外的精确导航处理器(precision navigation processor,PnP)处理，可以应用于天际线特征不明显的场景等)、高精度的用户设备定位。
在一种可能的实现方式中,上述处理单元具体用于:通过对实景三维模型语义化分类,预设类型对象提取、渲染和全景特征编码,得到全景特征库。作为一种示例,第一设备可以通过对实景三维模型语义化分类得到实景三维模型所描述的对象分类;通过预设类型对象提取得到实景三维模型中,用于描述预设类型的对象的信息;通过渲染和全景特征编码得到包括一个或多个预设类型的对象的全景特征库。通过该方法可以构建用于表示多个对象的多个第一特征(其中第一特征是全景特征)的全景特征库作为检索库,成本低、易实现、可以为设备定位提供更加精确的参考。
在一种可能的实现方式中，上述处理单元具体用于：对实景三维模型进行语义化分类，得到实景三维模型所描述的多个对象的类型；从实景三维模型所描述的多个对象中提取一个或多个预设类型的对象；将提取的一个或多个预设类型的对象网格化；逐网格渲染一个或多个预设类型的对象，得到渲染图；将渲染图柱面展开，得到全景图；以及，逐网格对全景图中的一个或多个预设类型的对象全景特征编码，得到全景特征库。作为一种示例，第一设备可以通过对实景三维模型语义化分类得到实景三维模型所描述的多个对象的类型；通过预设类型对象提取得到实景三维模型中，用于描述预设类型的对象的信息；通过将提取的一个或多个预设类型的对象网格化以提高后续渲染和全景特征编码的精度；通过渲染、柱面展开和全景特征编码得到包括一个或多个预设类型的对象的全景特征库。通过该方法可以构建用于表示多个对象的多个第一特征(其中第一特征是全景特征)的全景特征库作为检索库，成本低、易实现、可以为设备定位提供更加精确的参考。
在一种可能的实现方式中,上述处理单元具体用于:按照固定间隔或者动态间隔将提取的一个或多个预设类型的对象网格化。作为一种示例,第一设备可以按照固定间隔或者动态间隔将提取的一个或多个预设类型的对象网格化,以根据实际需要进行采样密度调整,以得到满足实际需求的全景特征库精度。例如固定间隔和动态间隔可以由算法或软件开发人员经验设置。
在一种可能的实现方式中,上述处理单元还用于:按照实景三维模型中区域的重要性,和/或一个或多个预设类型的对象所属类型的重要程度设置动态间隔。基于此,可以根据实际需要进行采样密度调整,以得到满足实际需求的全景特征库精度。
在一种可能的实现方式中,上述实景三维模型包括但不限定于以下中的一种或多种:航空实景三维模型、卫星实景三维模型、城市信息模型。
第五方面,提供一种第二设备(如用户设备),该第二设备包括:处理单元,用于提取第二设备捕获的图像的第二特征;在全景特征库中检索,以确定与第二特征匹配的第一特征;以及,根据与第二特征匹配的第一特征,确定用户设备捕获上述图像时的空间位姿。其中,上述多个第一特征的水平分辨率一致,第二特征的水平分辨率一致,空间位姿用于表示第二设备的位置和姿态。
上述第五方面提供的方案，在确定用户设备采集图像时的空间位姿时，第二设备可以通过提取用户设备捕获的图像的第二特征(其中第二特征是水平分辨率一致特征)，在全景特征库中检索以确定与第二特征匹配的第一特征。然后基于与第二特征匹配的第一特征，确定用户设备捕获上述图像时的空间位姿。该方案实现低成本、广覆盖(例如不依赖于人工设计特征点，不需要预先设置切片式图像检索特征以及额外的PnP处理，可以应用于天际线特征不明显的场景等)、高精度的用户设备定位。
在一种可能的实现方式中，上述第二设备还包括：图像捕获单元，用于捕获第一图像；上述处理单元还用于：对第一图像进行预处理，得到第二图像；其中，预处理包括以下中的一种或多种：初始化空间姿态、将图像亮度调整为预设亮度、将图像对比度调整为预设对比度、对图像所描述的对象语义化分类、柱面投影。基于此，可以进一步提高后续第一特征与第二特征匹配的准确性。
在一种可能的实现方式中,上述处理单元具体用于:对第二图像所描述的一个或多个对象进行全景特征编码,得到第二图像的第二特征。其中,全景特征编码用于提取图像中对象的水平分辨率一致特征。
在一种可能的实现方式中,上述处理单元具体用于:将第二特征在全景特征库中滑窗,计算第二特征与滑窗范围内多个第一特征的匹配相似度;以及,根据第二特征与多个滑窗范围中多个第一特征的多个匹配相似度,确定与第二特征匹配的第一特征。为了快速、精确地进行特征匹配,可以通过全景滑窗技术通过计算得到的第二特征与多个第一特征的匹配相似度,确定与第二特征匹配的第一特征。
在一种可能的实现方式中，上述处理单元具体用于：在全景特征库全库范围内检索，以确定与第二特征匹配的第一特征；或者，在全景特征库的预设范围内检索，以确定与第二特征匹配的第一特征。作为一种示例，全景滑窗技术可以基于全景特征库全库范围或者全景特征库的预设范围实现，可根据实际需要适应性调整，该方法方便、快速、且可最大限度的节约计算力。
在一种可能的实现方式中,上述第二设备还包括:位置检测单元,用于在图像捕获单元捕获图像时,获取用户设备的位置信息。
在一种可能的实现方式中,上述处理单元还用于:根据图像捕获单元捕获图像时,用户设备所处的位置,结合预设范围的设置规则确定预设范围。通过结合用户设备捕获图像时,用户设备所处的位置进行预设范围设置,进而滑窗以进行特征匹配,可以方便、快速地实现设备定位,且可最大限度的节约计算力。
第六方面,提供一种第一设备,该第一设备包括:存储器,用于存储计算机程序;收发器,用于接收或发送无线电信号;处理器,用于执行所述计算机程序,使得第一设备实现如第二方面任一种可能的实现方式中的方法。
第七方面，提供一种第二设备，该第二设备包括：存储器，用于存储计算机程序；收发器，用于接收或发送无线电信号；处理器，用于执行所述计算机程序，使得第二设备实现如第三方面任一种可能的实现方式中的方法。
第八方面,提供一种设备定位系统,该系统包括如第四方面或第六方面任一种可能的实现方式中的第一设备;以及如第五方面或第七方面任一种可能的实现方式中的第二设备。该设备定位系统用于实现如第一方面任一种可能的实现方式中的方法。
第九方面,提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序代码,该计算机程序代码被处理器执行时,使得处理器实现如第二方面或第三 方面任一种可能的实现方式中的方法。
第十方面,提供一种芯片系统,该芯片系统包括处理器、存储器,存储器中存储有计算机程序代码;所述计算机程序代码被所述处理器执行时,使得处理器实现如第二方面或第三方面任一种可能的实现方式中的方法。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
第十一方面,提供一种计算机程序产品,该计算机程序产品包括计算机指令。当该计算机指令在计算机上运行时,使得计算机实现如第二方面或第三方面任一种可能的实现方式中的方法。
附图说明
图1为本申请实施例提供的一种用户设备的硬件结构示意图;
图2为本申请实施例提供的一种用户设备的软件结构示意图;
图3为本申请实施例提供的一种云侧设备的硬件结构示意图;
图4为本申请实施例提供的一种设备定位方法流程图;
图5A为本申请实施例提供的一种实景三维模型示例图;
图5B为本申请实施例提供的一种全景图的模态示意图;
图6为本申请实施例提供的一种用户设备摄取的图像示例图;
图7为本申请实施例提供的一种用户设备的空间位置示意图;
图8为本申请实施例提供的一种设备定位方法交互流程图;
图9为本申请实施例提供的一种设备定位方法交互示例图;
图10为本申请实施例提供的一种图像预处理效果示例图;
图11为本申请实施例提供的一种预设范围示例图;
图12为本申请实施例提供的另一种预设范围示例图;
图13为本申请实施例提供的一种特征匹配过程示例图;
图14为本申请实施例提供的一种AR应用效果示例图;
图15为本申请实施例提供的另一种AR应用效果示例图;
图16为本申请实施例提供的一种用户设备的结构框图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。其中,在本申请实施例的描述中,除非另有说明,“/”表示或的意思,例如,A/B可以表示A或B;本文中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,在本申请实施例的描述中,“多个”是指两个或多于两个。
以下,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。在本实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
本申请实施例提供一种设备定位方法,该方法用于确定设备的空间位姿(包括设备的位置和姿态)。
作为一种示例,本申请实施例提供的一种设备定位方法可以应用于游戏、医疗、 教育、导航、娱乐等领域的AR应用场景中,如AR地图、AR导航、AR广告牌、AR全息信息展示、AR虚实融合拍照等。
可以理解,在AR应用场景中,虚拟物体在真实场景中叠加的位置和朝向等需要基于设备的空间位姿确定。因此,设备空间位姿的精度对于用户的体验度至关重要。例如,在AR导航时,若对设备的空间位姿计算不准确,会导致虚拟导航箭头在地图中叠加的位置发生偏差,因此带给用户的导航体验较差。
基于上述考虑,加之随着视觉即时定位与地图构建(simultaneous localization and mapping,SLAM)和深度学习在图像领域的不断发展,可以通过视觉定位技术,根据用户设备捕获的图像,确定设备的空间位姿。例如,可以基于人工设计特征点、特征描述符检索、天际线特征匹配等方法确定设备空间位姿。
其中,基于人工设计特征点确定设备空间位姿的基本原理是:根据实际需求人为预先布置来自真实世界的视觉特征点,通过比较用户通过设备捕获的图像中的特征点与该人为布置的视觉特征点确定设备是否处于预设空间位姿。
基于特征描述符检索方法确定设备空间位姿的基本原理是:提取用户通过设备捕获的图像中的特征描述符,在预先设置的切片式图像检索特征库中进行匹配,然后通过精确导航处理器(precision navigation processor,PnP)确定设备空间位姿。
基于天际线特征匹配方法确定设备空间位姿的基本原理是:提取用户通过设备捕获的图像中的天际线特征,在天际线特征库中进行匹配以确定设备空间位姿。其中,天际线是城市形体轮廓(如建筑物轮廓)的重要表现形式,天际线反映的是三维的空间层次特征。
但是,上述方法均存在弊端,如基于人工设计特征点的方法,由于其视觉特征点是人为预先布置的,因此只能应用于特定的、有限的场景,难以大规模应用。又如基于特征描述符检索的方法,由于特征库以切片形式存储图像,因此会存在大量的图像冗余,增加了存储空间和检索时的查询时间;且该方法定位的精度依赖于检索和PnP处理两个环节的性能,对图像精度要求较高,制图成本高。又如基于天际线特征匹配的方法在天际线特征不明显的场景(如轮廓不清晰、不完整等场景)中,无法准确定位。
为了解决上述设备空间位姿确定方法存在的通用性弱、制图成本高、存储要求高、场景适应性不足等问题,本申请实施例提供一种设备定位方法,该方法可以基于低成本、广覆盖、高精度的视觉定位技术实现对用户设备空间位姿的准确定位。
进一步的,在本申请实施例中,在将对用户设备的空间位姿的确定结果应用于AR场景时,可以根据对用户设备的空间位姿的确定结果,以用户在真实场景中的位置的视觉角度,将虚拟物体(如虚拟导航箭头等)以准确的位置和朝向叠加在真实场景中,为用户带来更好的AR体验。
其中,用户设备支持图像捕获功能和显示功能。
图像捕获功能例如拍照功能、摄像功能等。在本申请实施例中,作为一种示例,图像可以是用户设备通过拍照获取的图片。或者,图像还可以是用户设备通过摄像获取的视频中的图像帧,本申请不限定。
在一些实施例中，用户设备支持通过显示功能，显示用户设备捕获的图像或者视频。
在一些实施例中,用户设备支持通过显示功能,显示AR/VR视频,例如在真实场景中叠加了虚拟人物、虚拟图标等的视频。
示例性的,用户设备可以包括但不限定于智能手机、上网本、平板电脑、智能眼镜、智能手表、智能手环、电话手表、智能相机、个人计算机(personal computer,PC)、超级计算机、掌上电脑、AR/VR设备、个人数字助理(personal digital assistant,PDA)、便携式多媒体播放器(portable multimedia player,PMP)、会话启动协议(session initiation protocol,SIP)电话、物联网(internet of things,IOT)设备、智慧城市(smart city)中的无线设备、智慧家庭(smart home)中的无线设备或体感游戏机等。或者,用户设备还可以具有其他结构和/或功能,本申请不限定。
请参考图1，图1示出了本申请实施例提供的一种用户设备的硬件结构示意图。如图1所示，用户设备可以包括处理器110，存储器(包括外部存储器接口120和内部存储器121)，通用串行总线(universal serial bus,USB)接口130，充电管理模块140，电源管理模块141，电池142，天线1，天线2，移动通信模块150，无线通信模块160，音频模块170，扬声器170A，受话器170B，麦克风170C，耳机接口170D，传感器模块180，按键190，马达191，指示器192，摄像组件193，显示屏194等。其中，传感器模块180可以包括陀螺仪传感器，加速度传感器，磁传感器，触摸传感器，指纹传感器，压力传感器，气压传感器，距离传感器，接近光传感器，温度传感器，环境光传感器，骨传导传感器等。
可以理解的是,本发明实施例示意的结构并不构成对用户设备的具体限定。在本申请另一些实施例中,用户设备可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元。例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),飞行控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,和/或通用串行总线(universal serial bus,USB)接口 等。
充电管理模块140用于从充电器接收充电输入。电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,显示屏194,摄像组件193,和无线通信模块160等供电。
用户设备的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。用户设备中的每个天线可用于覆盖单个或多个通信频段。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在用户设备上的包括2G/3G/4G/5G/6G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A、受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在用户设备上的包括无线局域网(wireless local area networks,WLAN)(如Wi-Fi网络),蓝牙BT,全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,用户设备的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得用户设备可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址 (wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),新无线(new radio,NR),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
在本申请实施例中,用户设备可以通过GNSS采集用户设备的位置信息,如采集用户设备的经纬度信息等。
用户设备通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
在本申请实施例中,用户设备可以通过GPU进行用户设备捕获的图像的渲染,虚拟物体(如虚拟导航箭头、虚拟路标、虚拟广告牌、虚拟信息、虚拟事物等)在真实场景的图像中的叠加渲染等。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flexible light-emitting diode,FLED),MiniLED,MicroLED,Micro-OLED,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,用户设备可以包括1个或K个显示屏194,K为大于1的正整数。
在本申请实施例中,用户设备可以通过显示屏194显示用户设备捕获的图像,显示在真实场景中叠加了虚拟物体(如虚拟导航箭头、虚拟路标、虚拟广告牌、虚拟信息、虚拟事物等)的AR图像。
用户设备可以通过ISP,摄像组件193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,固态硬盘等,实现扩展用户设备的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储用户设备使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器121的指令,和/或存储在设置于处理器中的存储器的指令,执行用户设备的各种功能应用以及数据处理。
用户设备可以通过音频模块170,扬声器170A,受话器170B,麦克风170C以及应用处理器等实现音频功能。例如音乐播放,录音等。关于音频模块170,扬声器170A,受话器170B和麦克风170C的具体工作原理和作用,可以参考常规技术中的介绍。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。用户设备可以接收按键输入,产生与用户设备的用户设置以及功能控制有关的键信号输入。
需要说明的是,图1所示用户设备包括的硬件模块只是示例性地描述,并不对用户设备的具体结构做出限定。例如,若用户设备是智能手机,那么用户设备还可以包括用户标识模块(subscriber identity module,SIM)接口。若用户设备是PC,那么用户设备还可以包括键盘、鼠标等部件。
在本申请中，用户设备的操作系统可以包括但不限于安卓等操作系统。
以包括分层架构的安卓系统的用户设备为例，如图2所示，用户设备的软件可以分成若干个层，每一层都有清晰的角色和分工。层与层之间通过软件接口通信。如图2所示，用户设备的软件结构从上至下可以分为：应用程序层(简称应用层)，应用程序框架层(简称框架层)，系统库和安卓运行时，以及内核层(也称为驱动层)。
其中，应用程序层可以包括一系列应用程序包，例如相机，图库，日历，通话，地图，导航，蓝牙，音乐，视频，短信息，AR应用等应用程序。为方便描述，以下将应用程序简称为应用。在本申请实施例中，AR应用可以支持用户设备为用户提供AR场景下的虚实融合体验。例如AR应用可以是AR地图、AR导航、AR广告牌、AR全息信息展示、AR虚实融合拍照、河图等应用。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。如图2所示,应用程序框架层可以包括窗口管理服务器(window manager service,WMS),活动管理服务器(activity manager service,AMS)和输入事件管理服务器(input manager service,IMS)。在一些实施例中,应用程序框架层还可以包括内容提供器,视图系统,电话管理器,资源管理器,通知管理器等(图2中未示出)。
系统库和安卓运行时包含框架层所需要调用的功能函数,Android的核心库,以及Android虚拟机。系统库可以包括多个功能模块。例如:浏览器内核,三维(3dimensional,3D)图形,字体库等。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
内核层是硬件和软件之间的层。内核层可以包含显示驱动,输入/输出设备驱动(例如,键盘、触摸屏、耳机、扬声器、麦克风等),设备节点,摄像头驱动,音频驱动以及传感器驱动等。用户通过输入设备进行输入操作,内核层可以根据输入操作 产生相应的原始输入事件,并存储在设备节点中。输入/输出设备驱动可以检测到用户的输入事件。例如,麦克风可以检测到用户发出的语音。
需要说明的是，图2仅以分层架构的安卓系统为例，介绍一种用户设备的软件结构。本申请不限定用户设备软件系统的具体架构，关于其他架构的软件系统的具体介绍，可以参考常规技术。
在本申请实施例中,对用户设备空间位姿的确定可以由用户设备(如智能手机)完成,也可以由云侧设备(如服务器)完成,还可以由用户设备(如智能手机)和云侧设备(如服务器)共同完成。
作为一种示例,图3示出了本申请实施例提供的一种云侧设备的硬件结构示意图。其中,如图3所示,云侧设备可以包括处理器301,通信线路302,存储器303以及至少一个通信接口(图3中仅是示例性的以包括通信接口304为例进行说明)。
处理器301可以包括一个或多个处理器,其中,处理器可以为CPU,微处理器,特定ASIC,或其它集成电路,不予限制。
通信线路302可包括一通路,用于在上述组件之间传送信息。
通信接口304,用于与其他设备或通信网络通信。
存储器303可以是ROM或RAM,或者EEPROM、CD-ROM,或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。
需要说明的是,存储器可以是独立存在,通过通信线路302与处理器相连接。存储器也可以和处理器集成在一起。
其中,存储器303用于存储计算机程序。处理器301用于执行存储器303中存储的计算机程序,从而实现本申请下述任一方法实施例提供的方法。
需要说明的是,处理器301可以包括一个或多个CPU,例如图3中的CPU0和CPU1。此外,图3仅作为一种云侧设备的示例,并不对云侧设备的具体结构做出限定。例如,云侧设备还可以包括其他功能模块。
以下将结合附图,对本申请实施例提供的一种设备定位方法作具体介绍。
如图4所示，本申请实施例提供的一种设备定位方法可以通过提取用户设备捕获的图像的水平分辨率一致特征，并在全景特征库中进行特征匹配，从而确定用户设备的空间位姿。其中，全景特征库用于通过大量的360°全景特征来表示多个对象，其中全景特征还具有位置属性。对象如建筑(如商场、写字楼等)、公园、桥梁、公路、草坪、广场、路灯、路标、河流、山等，本申请实施例不限定。
其中,用户设备的空间位姿用于表示用户设备的位置和姿态。例如,用户设备的位置可以用用户设备相对于地面的坐标值来表示,用户设备的姿态可以用用户设备相对于地面的角度来表示。
如图4所示,本申请实施例提供的一种设备定位方法可以包括以下三个阶段(阶段1-阶段3):
阶段1:全景特征库构建阶段。
其中，全景特征库中包括用于表示多个对象的多个360°全景特征(以下简称“全景特征”)。每一个对象可以用多个全景特征来表示，其中全景特征还具有位置属性。例如，全景特征库可以以城市为粒度构建，对于这种情况，全景特征库中包括用于表示城市中各种事物(即对象)的多个全景特征，如建筑、公园、桥梁、公路、草坪、广场、路灯、路标、河流、山等。
作为一种示例,在本申请实施例中,可以基于实景三维模型构建得到全景特征库。例如,如图4所示,可以通过对实景三维模型进行语义化分类,提取标志性对象,网格化,逐网格渲染,以及逐网格全景特征编码得到全景特征库。
其中,对实景三维模型进行语义化分类是指将实景三维模型所描述的对象分类。以图5A所示实景三维模型为例,可以理解,实景三维模型用于描述不同类型的对象,如建筑、地面可行区域、绿化区域等。对实景三维模型进行语义化分类即按照实景三维模型所描述对象的类型,将对象分类。
提取标志性对象是指提取实景三维模型中,用于描述预设类型的对象的信息。预设类型如建筑、地面可行区域、绿化区域等。
可以理解,在进行设备定位时,可以通过将用户设备捕获的图像中的特征与全景特征库中的特征进行匹配,以确定用户设备捕获图像时的空间位姿。而进行特征匹配时,具有代表性的特征对于特征匹配和匹配的可靠性贡献比较大,例如建筑等特征。基于此,在本申请实施例中,在进行全景特征库构建时,可以预先设定对象,以确定全景特征库中的一个或多个对象。其中,全景特征库中的一个或多个对象具有标志性。具有标志性是指对于特征匹配参考价值大,对特征匹配结果可靠性贡献比较大。
网格化用于将提取的一个或多个预设类型的对象网格化。
逐网格渲染如逐网格对提取的一个或多个预设类型的对象球面渲染。经过逐网格渲染可以得到渲染图。
在一些实施例中,为了方便后续处理,可以将逐网格渲染得到的渲染图以平面图像的形式表示,得到全景图。其中,全景图用于表征一个或多个标志性对象的多个模态的信息。如图5B所示,图5B示出了将图5A所示公共信息模型(common information model,CIM)逐网格渲染、柱面展开后得到的全景图的模态示意图。
全景图的模态可以包括但不限于纹理、实例和深度等。其中,全景图的纹理信息包括一个或多个对象的表面纹理(即使物体表面呈现凹凸不平的沟纹)信息和表面图案信息。全景图的实例信息用于表征不同的对象,例如全景图的实例信息可以用不同的颜色色调等表示不同的对象。全景图的深度信息用于表征对象的距离,例如,全景图的深度信息可以用颜色亮度、颜色对比度或颜色色调等表示对象的距离。如图5B所示,全景图包括纹理、实例和深度三个模态的信息。
作为一种示例,逐网格全景特征编码用于逐网格提取全景图所描述的一个或多个预设类型的对象的水平分辨率一致全景特征(以下简称“第一特征”),以得到全景特征库。例如,全景特征(即第一特征)的水平分辨率是第一分辨率。其中,水平分辨率一致是指全景特征(即第一特征)的水平方向相邻位置对应的水平视场(field of view,FOV)相同。
阶段2:水平分辨率一致特征提取阶段。
其中，水平分辨率一致特征提取阶段用于提取用户设备捕获的图像(如第一图像)中的水平分辨率一致特征(以下简称“第二特征”)。例如，图像中的水平分辨率一致特征(即第二特征)的水平分辨率是第二分辨率。其中，图像中的水平分辨率一致特征(即第二特征)的水平方向相邻位置对应的水平FOV相同。
用户设备捕获的图像如用户设备拍摄的图像(如图6所示图像)、用户设备拍摄的视频中的图像帧等,本申请不限定。示例性的,用户设备可以对其面前的真实场景(如建筑物等)拍照,得到拍摄的图像。
其中，为了在通过将用户设备捕获的图像的水平分辨率一致特征(即第二特征)与全景特征库中的特征(即第一特征)进行匹配以确定用户设备的空间位姿时，得到更好的匹配效果，上述第二分辨率与第一分辨率相同。
在一些实施例中,为了进一步提高特征匹配的准确性,还可以在提取用户设备捕获的图像中的水平分辨率一致特征之前,对用户设备捕获的图像进行预处理,得到预处理后的图像(即第二图像)。
其中,预处理用于对用户设备捕获的图像做以下处理中的一种或多种:初始化空间姿态、图像亮度调整、图像对比度调整、确定图像所描述的对象的类别、柱面投影。
阶段3:特征匹配阶段。
其中,特征匹配阶段用于通过将用户设备捕获的图像的水平分辨率一致特征(即第二特征)与全景特征库中的特征(即第一特征)进行匹配,以确定用户设备的空间位姿。
作为一种可能的实现方式,可以在全景特征库全库范围内检索,以确定与用户设备捕获的图像的水平分辨率一致特征(即第二特征)匹配的第一特征。
作为一种可能的实现方式,可以在全景特征库的预设范围内检索,以确定与用户设备捕获的图像的水平分辨率一致特征(即第二特征)匹配的第一特征。
示例性的,预设范围可以包括多个优先级的子范围。例如,预设范围包括第一范围和第二范围。其中,第一范围的优先级高于第二范围,在进行特征匹配时,优先在优先级高的子范围内检索。
作为一种示例,可以通过将第二特征在全景特征库中滑窗,计算第二特征与滑窗范围内多个第一特征的匹配相似度,根据第二特征与多个滑窗范围中的多个第一特征的多个匹配相似度,确定与第二特征匹配的第一特征。例如,与第二特征匹配的特征是最高匹配相似度对应的第一特征。
作为一种示例,在本申请实施例中,经过特征匹配确定的用户设备的空间位姿可以用用户设备的6自由度位姿来表示。示例性的,6自由度位姿可以用(x,y,z,θ,ψ,φ)来表示。其中,(x,y,z)用于表示用户设备的空间位置,x,y和z分别是用户设备在预设空间坐标系中的X轴坐标值,Y轴坐标值和Z轴坐标值。(θ,ψ,φ)用于表示用户设备的空间姿态,θ即俯仰角(pitch),ψ即偏航角(yaw),φ即横滚角(roll)。例如,θ,ψ和φ分别是用户设备相对于预设空间坐标系的X轴,Y轴和Z轴的旋转值。
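作为一种便于理解的示意性草图(并非对本申请实施例的限定实现，其中的类名与字段名均为示例性假设)，6自由度位姿可以用如下Python数据结构表示：

from dataclasses import dataclass

@dataclass
class Pose6DoF:
    # 空间位置：预设空间坐标系(如地面坐标系)中的坐标值
    x: float
    y: float
    z: float
    # 空间姿态：俯仰角pitch、偏航角yaw、横滚角roll，分别对应绕X轴、Y轴、Z轴的旋转，单位取度
    pitch: float
    yaw: float
    roll: float

# 示例：用户设备捕获图像时的一个6自由度位姿
pose = Pose6DoF(x=12.5, y=-3.0, z=1.6, pitch=2.0, yaw=135.0, roll=0.5)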
例如,预设空间坐标系可以是地面坐标系。如图7所示,以O为坐标原点的X轴,Y轴和Z轴构成的右手直角坐标系即地面坐标系。其中,坐标原点O可以为空 间中的任意一点;X轴指向水平面内的任一方向;Z轴垂直于X轴所在的平面并指向地心。Y轴垂直于X轴,且垂直于Z轴。
如图7所示,用户设备的空间位置可以用(x,y,z)来表示,即用户设备在预设空间坐标系中的坐标值为(x,y,z)。
需要说明的是,图7仅以预设空间坐标系是地面坐标系作为示例,预设空间坐标系还可以是其他空间坐标系,本申请实施例不作具体限定。
如上文所述,本申请实施例中,对用户设备空间位姿的确定可以由用户设备(如智能手机)完成,也可以由云侧设备(如服务器)完成,还可以由用户设备(如智能手机)和云侧设备(如服务器)共同完成。即,上述阶段1-阶段3的任务可以由用户设备执行,也可以由云侧设备执行,还可以由用户设备和云侧设备分工执行。
以阶段1-阶段3所执行的任务由用户设备和云侧设备分工执行为例,作为一种示例,全景特征库的构建任务(即阶段1的任务)可以由云侧设备(如服务器)执行,水平分辨率一致特征提取任务(即阶段2的任务)和特征匹配任务(即阶段3的任务)可以由用户设备(如智能手机)执行。
以云侧设备基于图5A所示实景三维模型得到全景特征库,用户设备根据图6所示用户设备摄取的图像进行设备定位为例,本申请实施例提供的一种设备定位方法可以包括如图8所示阶段1-阶段3。
以图8所示场景为例,如图9所示,本申请实施例提供的一种设备定位方法具体可以包括S901-S911:
S901、云侧设备获取实景三维模型。
例如,实景三维模型可以包括但不限于航空实景三维模型、卫星实景三维模型或者城市信息模型(如图5A所示公共信息模型(CIM))等中的一种或多种。
需要说明的是,本申请实施例不限定实景三维模型的来源和创建方法等。例如,可以通过对基于城市规划图、城市布局测量(如激光扫描仪测量等)、卫星测量、航空测量(如航拍、无人机航测等)等方法测量得到的信息进行三维模型创建得到实景三维模型。关于实景三维模型的具体介绍,如创建方法等,可以参考常规技术,本申请实施例不做赘述。
S902、云侧设备对实景三维模型进行语义化分类,得到实景三维模型所描述的多个对象的类型。
其中,实景三维模型用于描述多个对象,如建筑(如商场、写字楼等)、地面可行区域(如广场、道路、路灯等)、绿化区域(如树木、草坪等)等对象。
在本申请实施例中,经过对实景三维模型进行语义化分类,可以得到实景三维模型所描述的多个对象的类型,以便于后续全景特征库中全景特征具有较强的可参考性,以及减少全景特征库中信息的冗余(如草坪等进行特征匹配时可参考性较低的冗余信息)。
S903、云侧设备从实景三维模型所描述的多个对象中提取一个或多个预设类型的对象。
其中,预设类型例如建筑、地面可行区域、绿化区域等。预设类型的对象如建筑、山、广场、道路等。
其中,在将用户设备捕获的图像中的特征与全景特征库中的特征进行匹配,以进行设备定位时,如建筑、山、广场、道路等具有标志性的对象对于特征匹配参考价值大,对特征匹配结果可靠性贡献比较大。例如,基于建筑物的外形和纹理等信息,基于山的轮廓和棱角等信息,基于广场的布局和景物等信息,基于道路的尺寸和路标等信息可以相对快速、准确地进行特征匹配结果。
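作为一种示意性草图(其中的预设类型集合与对象表示方式均为示例性假设，并非本申请实施例的限定实现)，从语义化分类结果中提取预设类型的对象可以如下表示：

# 预设类型集合，示例性假设：建筑、地面可行区域、绿化区域
PRESET_TYPES = {"building", "ground", "greenery"}

def extract_preset_objects(classified_objects):
    """classified_objects: [(对象标识, 对象类型, 几何数据), ...]，
    返回属于预设类型的对象，作为后续网格化、渲染和全景特征编码的输入。"""
    return [obj for obj in classified_objects if obj[1] in PRESET_TYPES]

# 示例：车辆不属于预设类型，被过滤掉
objects = [("bldg_01", "building", None), ("lawn_07", "greenery", None), ("car_03", "vehicle", None)]
print(extract_preset_objects(objects))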
S904、云侧设备将提取的一个或多个预设类型的对象网格化。
在本申请实施例中,可以按照固定间隔或者动态间隔将提取的一个或多个预设类型的对象网格化。
在一些实施例中,固定间隔可以由算法或软件开发人员按照经验设置。作为一种示例,在其他条件等同的情况下,固定间隔越小,即采样越密集,则后续得到的全景特征库精度越高。固定间隔如0.5米,1米等,视具体情况而定。
在一些实施例中，动态间隔可以由算法或软件开发人员按照实景三维模型中区域的重要性，和/或对象所属类型的重要程度设置。其中，实景三维模型中区域的重要性和类型的重要程度由算法或软件开发人员按照经验设置，本申请不限定具体的设置依据。例如，单体建筑和地面可行区域相对于绿化区域更重要；在实景三维模型中，城区相对于郊区更重要。示例性的，在本申请实施例中，相对重要的区域或相对重要的类型的对象在进行网格化时，间隔相对较小；相对次要的区域中的对象或相对次要的类型对应的对象在进行网格化时，间隔相对较大。
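作为一种示意性草图(其中网格间隔的具体取值、平面范围与函数均为示例性假设)，按固定间隔或动态间隔在对象所在平面范围内生成网格采样点的过程可以如下表示：

def make_grid(x_min, x_max, y_min, y_max, step):
    """在给定平面范围内，按间隔step生成网格采样点(网格中心)。"""
    points = []
    x = x_min
    while x <= x_max:
        y = y_min
        while y <= y_max:
            points.append((x, y))
            y += step
        x += step
    return points

# 示例：重要区域用较小间隔(如0.5米)，次要区域用较大间隔(如2米)，即"动态间隔"的一种体现
dense_grid = make_grid(0.0, 100.0, 0.0, 100.0, step=0.5)
sparse_grid = make_grid(0.0, 100.0, 0.0, 100.0, step=2.0)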
S905、云侧设备逐网格渲染提取的一个或多个预设类型的对象,得到渲染图。
在一些实施例中,云侧设备可以通过对提取的一个或多个预设类型逐网格执行球面渲染,得到渲染图。
示例性的,逐网格球面渲染具体可以包括:在每个网格上设置一个半径固定(如1米)的虚拟球,将一个或多个预设类型的对象(如单体建筑和/或地面可行区域)投影到球面上。
在一些实施例中,进一步的,云侧设备可以将逐网格渲染得到的渲染图以平面图像的形式表示,得到全景图(如图5B所示)。其中,全景图的模态可以包括但不限于纹理、实例和深度等。
示例性的,云侧设备可以将逐网格渲染得到的渲染图柱面展开,得到全景图。其中,经过柱面展开后的平面图像的水平和竖直分辨率相同,例如均为0.1度等,视具体情况而定。即经过柱面展开后的平面图像竖直或水平相邻一个像素对应虚拟球面的竖直或水平相邻固定角度(如0.1度)位置处的像素。
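作为一种示意性草图(其中角分辨率0.1度等取值为示例性假设，并非本申请实施例限定的渲染与展开实现)，将以网格中心为原点的三维点按方位角与俯仰角映射到柱面展开后的全景图像素的过程可以如下表示，水平方向相邻像素对应相同的水平FOV：

import math

def point_to_panorama_pixel(px, py, pz, deg_per_pixel=0.1):
    """将以网格中心为原点的三维点(px, py, pz)映射到全景图像素(u, v)。
    水平方向按方位角均匀采样，竖直方向按俯仰角均匀采样。"""
    azimuth = math.degrees(math.atan2(py, px)) % 360.0            # 0~360度
    elevation = math.degrees(math.atan2(pz, math.hypot(px, py)))  # -90~90度
    u = int(azimuth / deg_per_pixel)                # 水平像素坐标
    v = int((90.0 - elevation) / deg_per_pixel)     # 竖直像素坐标(自上而下)
    return u, v

print(point_to_panorama_pixel(10.0, 10.0, 3.0))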
S906、云侧设备逐网格对提取的一个或多个预设类型的对象全景特征编码,得到全景特征库。
作为一种示例,云侧设备可以逐网格对全景图中一个或多个预设类型的对象进行全景特征编码,得到网格对应的全景特征。
其中，全景特征(即第一特征)为水平分辨率一致特征，即全景特征(即第一特征)的水平方向相邻位置对应的水平FOV相同。示例性的，全景特征的宽度和高度分别为w_i和h_i。
示例性的，可以利用人工智能(artificial intelligence,AI)模型，如模态编码网络提取第一特征。其中，本申请实施例不限定模态编码网络的具体拓扑结构。例如，模态编码网络可以包括但不限定于深度卷积神经网络、深度残差网络、循环神经网络等。关于深度卷积神经网络、深度残差网络、循环神经网络等模态编码网络的具体介绍，可以参考常规技术，本申请实施例不做赘述。
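作为一种示意性草图(网络结构、通道数等均为示例性假设，并非本申请实施例限定的模态编码网络)，用卷积编码器对多模态全景图提取水平分辨率一致的全景特征可以如下表示；由于卷积在水平方向按均匀步长下采样，特征的水平方向相邻位置仍对应相同的水平FOV：

import torch
import torch.nn as nn

class PanoEncoder(nn.Module):
    """示例性模态编码网络：输入多模态全景图(假设5个通道，如纹理3通道+实例1通道+深度1通道)，
    输出宽w_i、高h_i的全景特征。"""
    def __init__(self, in_channels=5, feat_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, pano):
        return self.net(pano)

# 示例：输入一张360x720的多模态全景图，得到下采样后的全景特征
encoder = PanoEncoder()
feature = encoder(torch.randn(1, 5, 360, 720))
print(feature.shape)  # 形如 (1, 64, 90, 180)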
S907、用户设备对捕获的图像进行预处理。
示例性的,在将用户设备捕获的图像进行预处理之后,得到第二图像。
预处理可以包括但不限于以下中的一种或多种:将俯仰角(pitch)和横滚角(roll)初始化(如校正为0)、将亮度调整为预设亮度、将对比度调整为预设对比度、对图像所描述的对象语义化分类、柱面投影。其中,将俯仰角(pitch)和横滚角(roll)初始化(如校正为0)用于将空间姿态初始化。语义化分类用于确定图像所描述的对象的类别。柱面投影用于通过将平面图像投影到圆柱的曲面上,以满足视觉一致性。经过柱面投影,图像可在水平方向上满足360度环视,具有较好的视觉效果。
请参考图10,图10示出了一种图像预处理效果示例。如图10所示,在将用户设备捕获的图像(即第一图像)的俯仰角(pitch)和横滚角(roll)校正为0之后,得到图10所示的第二图像。
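作为一种示意性草图(预设亮度与预设对比度的取值、实现方式均为示例性假设)，预处理中将图像亮度调整为预设亮度、对比度调整为预设对比度的一种做法可以如下表示：

import numpy as np

def adjust_brightness_contrast(img, target_mean=128.0, target_std=48.0):
    """将图像(灰度, uint8)的亮度调整为预设亮度(均值)、对比度调整为预设对比度(标准差)。"""
    img = img.astype(np.float32)
    mean, std = img.mean(), img.std() + 1e-6
    out = (img - mean) / std * target_std + target_mean
    return np.clip(out, 0, 255).astype(np.uint8)

# 示例
gray = (np.random.rand(480, 640) * 255).astype(np.uint8)
normalized = adjust_brightness_contrast(gray)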
S908、用户设备提取预处理后得到的图像(即第二图像)的水平分辨率一致特征(以下简称“第二特征”)。
作为一种示例,用户设备可以对预处理后得到的图像(即第二图像)所描述的一个或多个对象进行全景特征编码,得到第二图像中的水平分辨率一致(如第二分辨率)特征(即第二特征)。示例性的,第二分辨率与第一分辨率相同。
其中，第二图像中第二特征的水平方向相邻位置对应的水平FOV相同。示例性的，第二特征的宽度和高度分别为w_j和h_j(w_j小于或等于w_i)。
作为一种示例，可以利用AI模型，如模态编码网络提取第二图像中的水平分辨率一致特征(即第二特征)。其中，模态编码网络可以包括但不限定于深度卷积神经网络、深度残差网络、循环神经网络等。
S909、用户设备从云侧设备获取全景特征库。
S910、用户设备通过将第二特征在全景特征库中滑窗,以确定与第二特征匹配的全景特征。
示例性的,用户设备可以将第二特征在全景特征库中滑窗,计算第二特征与多个滑窗范围内多个第一特征的匹配相似度,从而确定与第二特征最匹配的第一特征。
作为一种可能的实现方式,用户设备可以在全景特征库全库范围内进行滑窗,以计算第二特征与全库范围内多个滑窗范围内多个第一特征的匹配相似度。
作为一种可能的实现方式,用户设备可以在全景特征库的预设范围内滑窗,以计算第二特征与预设范围内多个滑窗范围内多个第一特征的匹配相似度。
作为一种示例,预设范围可以由用户设备根据采集到的位置信息,结合预设范围的设置规则确定。例如,位置信息可以由用户设备通过但不限于以下中的一种或多种定位系统采集得到:GPS、GLONASS、BDS、QZSS或SBAS等,本申请不限定。
其中，用户设备采集位置信息时所处的位置与用户设备捕获第一图像时所处的位置相同。作为一种示例，用户设备可以在捕获第一图像的同时，采集位置信息。
在一些实施例中，预设范围可以是以全景特征库中，用户设备捕获第一图像时所处的位置(如图11所示O_0点)为中心，以r(r为正数)为半径的圆形区域，如图11所示。
可以理解，由于GPS、GLONASS、BDS、QZSS或SBAS等定位系统采集的用户设备的位置信息通常用经纬度信息来表示(如(lon,lat))，而全景特征库通常以坐标值来表示位置，基于此，可以将用户设备捕获第一图像时所处的位置由经纬度信息转换为坐标值，如将(lon,lat)转换为(X_0,Y_0)，其中(X_0,Y_0)即全景特征库中，用户设备捕获第一图像时所处的位置。如图11所示，O_0点的坐标值为(X_0,Y_0)。
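作为一种示意性草图(采用简化的等距近似换算，并非本申请实施例限定的经纬度到坐标值的转换方式)，将(lon,lat)换算为局部平面坐标并筛选出以该位置为中心、半径为r的圆形预设范围内的网格可以如下表示：

import math

EARTH_RADIUS = 6371000.0  # 米

def lonlat_to_local_xy(lon, lat, lon0, lat0):
    """以(lon0, lat0)为局部原点，将经纬度近似换算为平面坐标(单位:米)。"""
    x = math.radians(lon - lon0) * EARTH_RADIUS * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS
    return x, y

def grids_in_preset_range(grid_coords, center_xy, r):
    """返回与中心点距离不超过r的网格坐标，即圆形预设范围内的网格。"""
    cx, cy = center_xy
    return [(gx, gy) for gx, gy in grid_coords
            if (gx - cx) ** 2 + (gy - cy) ** 2 <= r ** 2]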
示例性的,预设范围可以包括多个优先级的子范围。例如,预设范围包括第一范围和第二范围。其中,第一范围的优先级高于第二范围,在进行滑窗检索时,优先在优先级高的子范围内检索。
示例性的，请参考图12，图12示出了本申请实施例提供的一种预设范围示意图。如图12所示，预设范围包括第一范围和第二范围。其中，第一范围是以全景特征库中，用户设备捕获第一图像时所处的位置(如图12所示O_0点)为中心，以r1(r1为正数)为半径的圆形区域。第二范围是以全景特征库中，用户设备捕获第一图像时所处的位置(如图12所示O_0点)为中心，以r1(r1为正数)为内径，r2(r2为正数)为外径的圆环区域，其中r1小于r2。
以图11所示预设范围为例，假设滑窗步长为s(s为正数，且s小于或等于w_i)，预设范围内有N个第一特征(其中N为正整数)，第一特征的宽度为w_i，则第二特征与一个第一特征完成一次滑窗可以计算得到⌊w_i/s⌋个相似度得分，第二特征在预设范围内滑窗可以计算得到N·⌊w_i/s⌋个相似度得分。其中，⌊w_i/s⌋指对w_i/s的结果取整。
进一步的，在一些实施例中，用户设备可以根据N·⌊w_i/s⌋个相似度得分确定与第二特征匹配的全景特征。例如，用户设备可以确定N·⌊w_i/s⌋个相似度得分中，最大的相似度得分(如S_max)对应的第一特征与第二特征匹配。又如，若N·⌊w_i/s⌋个相似度得分中，最大的相似度得分(如S_max)大于或等于预设阈值(如α)，则用户设备确定该最大的相似度得分(即S_max)对应的第一特征与第二特征匹配。若N·⌊w_i/s⌋个相似度得分中，最大的相似度得分(如S_max)小于预设阈值(如α)，则用户设备定位失败。
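作为一种示意性草图(其中相似度采用余弦相似度、滑窗按360°全景循环平移，均为示例性假设，并非本申请实施例限定的匹配相似度计算方式)，第二特征在预设范围内N个第一特征上以步长s滑窗、并按阈值α判定是否匹配的过程可以如下表示：

import numpy as np

def similarity(a, b):
    """示例性匹配相似度：展平后的归一化相关(余弦相似度)。"""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def slide_match(second_feat, first_feats, step, alpha):
    """second_feat: 形状(h, w_j)的第二特征; first_feats: N个形状(h, w_i)的第一特征。
    每个第一特征滑窗得到 floor(w_i/step) 个相似度得分，
    返回全部得分中的最大值及其对应的(第一特征索引, 滑窗偏移)；若最大得分小于alpha则视为定位失败。"""
    h, w_j = second_feat.shape
    best = (-1.0, None)
    for idx, first in enumerate(first_feats):
        w_i = first.shape[1]
        for k in range(w_i // step):                    # floor(w_i/step)个滑窗位置
            offset = k * step
            # 360°全景：窗口超出右边界时循环回到左侧
            cols = [(offset + c) % w_i for c in range(w_j)]
            window = first[:, cols]
            s = similarity(second_feat, window)
            if s > best[0]:
                best = (s, (idx, offset))
    if best[0] < alpha:
        return None  # 定位失败
    return best

# 示例：取某个第一特征的一段作为第二特征，应在对应偏移处得到最高相似度
rng = np.random.default_rng(0)
firsts = [rng.random((64, 360)) for _ in range(5)]   # N=5个第一特征, w_i=360
second = firsts[2][:, 100:190]                       # 第二特征, w_j=90
print(slide_match(second, firsts, step=10, alpha=0.9))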
若预设范围包括多个优先级的子范围,则用户设备可以按照高优先级→低优先级的顺序,在全景特征库的多个子范围内滑窗,计算第二特征与多个子范围内多个第一特征的匹配相似度,直到确定与第二特征匹配的第一特征为止。
以图12所示预设范围为例，假设滑窗步长为s(s为正数，且s小于或等于w_i)，第一范围内有N1个第一特征(其中N1为正整数)，第一特征的宽度为w_i，则如图13所示，用户设备进行检索范围确定，确定在第一范围内滑窗，计算得到第二特征在第一范围内滑窗得到的N1·⌊w_i/s⌋个相似度得分。进一步的，作为一种示例，如图13所示，用户设备进行N1·⌊w_i/s⌋个相似度得分排序。若N1·⌊w_i/s⌋个相似度得分中，最大的相似度得分(如S1_max)大于或等于预设阈值(如α)，则用户设备确定该最大的相似度得分(即S1_max)对应的第一特征与第二特征匹配。若N1·⌊w_i/s⌋个相似度得分中，最大的相似度得分(如S1_max)小于预设阈值(如α)，则用户设备扩大检索范围。如图13所示，用户设备确定在第二范围内滑窗。假设第二范围内有N2个第一特征(其中N2为正整数)，第一特征的宽度为w_i，则如图13所示，用户设备在全景特征库的第二范围内滑窗，计算得到N2·⌊w_i/s⌋个相似度得分。进一步的，如图13所示，用户设备对N2·⌊w_i/s⌋个相似度得分进行排序。若N2·⌊w_i/s⌋个相似度得分中，最大的相似度得分(如S2_max)大于或等于预设阈值(如α)，则用户设备确定该最大的相似度得分(即S2_max)对应的第一特征与第二特征匹配。若N2·⌊w_i/s⌋个相似度得分中，最大的相似度得分(如S2_max)仍然小于预设阈值(如α)，且无更低优先级的子范围，则用户设备定位失败。
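作为一种示意性草图(沿用上文示例中的slide_match函数，范围划分与阈值均为示例性假设)，按优先级先在第一范围、再在第二范围内滑窗检索的过程可以如下表示：

def tiered_search(second_feat, first_range_feats, second_range_feats, step, alpha):
    """先在高优先级的第一范围内滑窗检索；若未检索到匹配(最大得分小于alpha)，
    再在低优先级的第二范围内检索；若仍未检索到且无更低优先级的子范围，则定位失败。"""
    result = slide_match(second_feat, first_range_feats, step, alpha)
    if result is not None:
        return result
    result = slide_match(second_feat, second_range_feats, step, alpha)
    return result  # None表示定位失败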
S911、用户设备根据与第二特征匹配的全景特征,确定用户设备的空间位姿。
示例性的,用户设备的空间位姿可以用用户设备的6自由度位姿来表示。例如,6自由度位姿可以用(x,y,z,θ,ψ,φ)来表示。其中,(x,y,z)用于表示用户设备的空间位置。(θ,ψ,φ)用于表示用户设备的空间姿态,θ即俯仰角(pitch),ψ即偏航角(yaw),φ即横滚角(roll)。
作为一种可能的实现方式,用户设备在确定与第二特征匹配的全景特征之后,可以得到(x,y,z)、俯仰角(pitch)θ、偏航角(yaw)ψ和横滚角(roll)φ。其中,与第二特征匹配的全景特征在实景三维模型中对应的位置即为用户设备的空间位置(x,y,z)。与第二特征匹配的全景特征对应的俯仰角、偏航角和横滚角即为用户设备的俯仰角(pitch)θ、偏航角(yaw)ψ和横滚角(roll)φ。然后,用户设备输出用户设备的6自由度位姿(x,y,z,θ,ψ,φ)。
作为另一种可能的实现方式,用户设备在确定与第二特征匹配的全景特征之后,可以得到(x,y,z)和偏航角(yaw)ψ。其中,与第二特征匹配的全景特征在实景三维模型中对应的位置即为用户设备的空间位置(x,y,z)。与第二特征匹配的全景特征对应的偏航角即为用户设备的偏航角(yaw)ψ。然后,用户设备通过微调捕获图像时传感器采集到的俯仰角和横滚角,比较微调后图像水平分辨率一致特征与全景特征匹配的相似度得分,以确定用户设备的俯仰角(pitch)θ和横滚角(roll)φ。例如,若微调后的相似度得分大于微调前的相似度得分,则保留微调后的俯仰角和横滚角。然后,用户设备输出用户设备的6自由度位姿(x,y,z,θ,ψ,φ)。
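作为一种示意性草图(其中extract_feature与score_against为示例性假设的函数，分别表示按给定俯仰角/横滚角重新校正图像并提取第二特征、以及计算该特征与已匹配全景特征的相似度得分)，通过微调俯仰角与横滚角确定θ和φ的过程可以如下表示：

def refine_pitch_roll(image, matched_first_feat, pitch0, roll0,
                      extract_feature, score_against, delta=0.5, steps=2):
    """以传感器采集的(pitch0, roll0)为初值，在±delta*steps范围内微调，
    保留使相似度得分更高的俯仰角/横滚角。"""
    best_pitch, best_roll = pitch0, roll0
    best_score = score_against(extract_feature(image, pitch0, roll0), matched_first_feat)
    for dp in range(-steps, steps + 1):
        for dr in range(-steps, steps + 1):
            pitch, roll = pitch0 + dp * delta, roll0 + dr * delta
            s = score_against(extract_feature(image, pitch, roll), matched_first_feat)
            if s > best_score:
                best_score, best_pitch, best_roll = s, pitch, roll
    return best_pitch, best_roll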
进一步的,在一些AR场景中,如AR地图、AR导航、AR广告牌、AR全息信息展示、AR虚实融合拍照等场景中,在用户设备确定用户设备的空间位姿之后,用户设备可以根据用户设备的空间位姿,将虚拟物体(如虚拟导航箭头、虚拟路标、虚拟广告牌、虚拟信息、虚拟事物等)以准确的位置和朝向叠加在真实场景中,为用户带来更好的AR体验。
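作为一种示意性草图(坐标系约定、旋转组合顺序与相机内参K均为示例性假设，并非本申请实施例限定的叠加渲染实现)，在得到用户设备的6自由度位姿后，将真实场景中某一虚拟物体锚点投影到图像像素坐标、从而确定虚拟物体叠加位置的过程可以如下表示：

import numpy as np

def euler_to_rotation(pitch, yaw, roll):
    """按文中约定，俯仰角绕X轴、偏航角绕Y轴、横滚角绕Z轴(单位:度)；
    组合顺序 R = Rz(roll)·Ry(yaw)·Rx(pitch) 为示例性假设。"""
    t, s, f = np.radians([pitch, yaw, roll])
    Rx = np.array([[1, 0, 0], [0, np.cos(t), -np.sin(t)], [0, np.sin(t), np.cos(t)]])
    Ry = np.array([[np.cos(s), 0, np.sin(s)], [0, 1, 0], [-np.sin(s), 0, np.cos(s)]])
    Rz = np.array([[np.cos(f), -np.sin(f), 0], [np.sin(f), np.cos(f), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project_anchor(anchor_world, device_xyz, R, K):
    """将虚拟物体锚点(世界坐标)变换到设备相机坐标系后，用内参K投影到像素坐标(u, v)。"""
    p_cam = R.T @ (np.asarray(anchor_world, dtype=float) - np.asarray(device_xyz, dtype=float))
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]

# 示例：假设的相机内参与位姿
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
R = euler_to_rotation(pitch=2.0, yaw=135.0, roll=0.5)
print(project_anchor((20.0, 5.0, 1.6), (12.5, -3.0, 1.6), R, K))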
本申请实施例提供的设备定位方法相比于常规技术不依赖于人工设计特征点，不需要预先设置切片式图像检索特征以及额外的PnP处理，可以基于实景三维模型的全景特征库构建技术和滑窗检索技术实现低成本、广覆盖、高精度的用户设备定位。且该方法云上数据量小、端侧运算高效，可以为用户提供轻量级的AR应用服务。
另外,本申请实施例提供的设备定位方法相比于常规技术可以应用于天际线特征不明显的场景。
例如,在图14所示的场景中,左侧为用户设备摄取的照片,如图14中的左图所示,大雾天气将建筑物上方天际线遮挡。若采用常规的基于天际线特征匹配方法确定设备空间位姿,则无法准确定位。而利用基于本申请实施例提供的实景三维模型的滑窗检索视觉定位方法确定的6自由度位姿将虚拟人物渲染到真实场景对应位置的效果,如图14中的右图所示。
又如,在图15所示的场景中,如图15中的左图所示,由于用户设备只拍到建筑物一部分,因此图像中无任何天际线轮廓特征。若采用常规的基于天际线特征匹配方法确定设备空间位姿,则无法准确定位。而利用基于本申请实施例提供的实景三维模型的滑窗检索视觉定位方法确定的6自由度位姿将虚拟人物渲染到真实场景对应位置的效果,如图15中的右图所示。
应理解,本申请实施例的各个方案可以进行合理的组合使用,并且实施例中出现的各个术语的解释或说明可以在各个实施例中互相参考或解释,对此不作限定。
还应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
可以理解的是,电子设备(包括第一设备(如云侧设备)和第二设备(如用户设备))为了实现上述任一个实施例的功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例可以对电子设备(包括第一设备(如云侧设备)和第二设备(如用户设备))进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
比如,以采用集成的方式划分各个功能模块的情况下,如图16所示,为本申请实施例提供的一种用户设备的结构框图。如图16所示,该用户设备可以包括图像捕获单元1610、处理单元1620、位置检测单元1630、显示单元1640、存储单元1650和收发单元1660。
其中,图像捕获单元1610用于支持用户设备捕获图像(如第一图像),例如图像捕获单元1610包括一个或多个摄像头。
处理单元1620用于支持用户设备对图像捕获单元1610捕获的图像进行预处理,确定固定间隔、动态间隔或预设范围等,提取用户设备通过图像捕获单元1610捕获 的图像的水平分辨率一致特征(如第二特征),在全景特征库中检索以确定与图像的水平分辨率一致特征(如第二特征)匹配的全景特征(如第一特征),根据与图像的水平分辨率一致特征(如第二特征)匹配的全景特征(如第一特征),确定用户设备通过图像捕获单元1610捕获图像时的空间位姿,和/或与本申请实施例相关的其他过程。
位置检测单元1630用于支持用户设备在图像捕获单元1610捕获图像时获取用户设备的位置信息,和/或与本申请实施例相关的其他过程。
显示单元1640用于支持用户设备显示图像捕获单元1610捕获的图像,显示在真实场景中叠加了虚拟物体(如虚拟导航箭头、虚拟路标、虚拟广告牌、虚拟信息、虚拟事物等)的AR图像,和/或与本申请实施例相关的其他界面。
存储单元1650用于支持用户设备存储计算机程序和实现本申请实施例提供的方法中的处理数据和/或处理结果等。
收发单元1660用于进行无线电信号的发送和接收。例如,收发单元1660用于支持用户设备从第一设备(如云侧设备)获取全景特征库,和/或与本申请实施例相关的其他过程。
作为一种示例,上述收发单元1660可以包括射频电路。具体的,用户设备可以通过射频电路进行无线信号的接收和发送。通常,射频电路包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器、双工器等。此外,射频电路还可以通过无线通信和其他设备通信。所述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统、通用分组无线服务、码分多址、宽带码分多址、长期演进、电子邮件、短消息服务等。
应理解,电子设备中的各个模块可以通过软件和/或硬件形式实现,对此不作具体限定。换言之,电子设备是以功能模块的形式来呈现。这里的“模块”可以指特定应用集成电路ASIC、电路、执行一个或多个软件或固件程序的处理器和存储器、集成逻辑电路,和/或其他可以提供上述功能的器件。
在一种可选的方式中,当使用软件实现数据传输时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地实现本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线((digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如软盘、硬盘、磁带)、光介质(例如数字化视频光盘(digital video disk,DVD))、或者半导体介质(例如固态硬盘solid state disk(SSD))等。
结合本申请实施例所描述的方法或者算法的步骤可以硬件的方式来实现,也可以 是由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于RAM存储器、闪存、ROM存储器、EPROM存储器、EEPROM存储器、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于电子设备中。当然,处理器和存储介质也可以作为分立组件存在于电子设备中。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。

Claims (30)

  1. 一种设备定位方法,其特征在于,所述方法包括:
    获取全景特征库,所述全景特征库中包括用于表示多个对象的多个第一特征,所述多个第一特征的水平分辨率一致;
    提取用户设备捕获的图像的第二特征,所述第二特征的水平分辨率一致;
    在所述全景特征库中检索,以确定与所述第二特征匹配的第一特征;
    根据与所述第二特征匹配的所述第一特征,确定所述用户设备捕获所述图像时的空间位姿,所述空间位姿用于表示所述用户设备的位置和姿态。
  2. 根据权利要求1所述的方法,其特征在于,所述获取全景特征库,包括:基于实景三维模型,构建得到所述全景特征库;
    其中,所述实景三维模型用于描述多个对象的空间信息。
  3. 根据权利要求2所述的方法,其特征在于,所述基于实景三维模型,构建得到所述全景特征库,包括:
    通过对所述实景三维模型语义化分类,预设类型对象提取、渲染和全景特征编码,得到所述全景特征库。
  4. 根据权利要求3所述的方法,其特征在于,所述基于实景三维模型,构建得到所述全景特征库,包括:
    对所述实景三维模型进行语义化分类,得到所述实景三维模型所描述的多个对象的类型;
    从所述实景三维模型所描述的多个对象中提取一个或多个预设类型的对象;
    将提取的所述一个或多个预设类型的对象网格化;
    逐网格渲染所述一个或多个预设类型的对象,得到渲染图;
    将所述渲染图柱面展开,得到全景图;
    逐网格对所述全景图中的一个或多个预设类型的对象全景特征编码,得到所述全景特征库。
  5. 根据权利要求4所述的方法,其特征在于,所述将提取的所述一个或多个预设类型的对象网格化,包括:
    按照固定间隔或者动态间隔将提取的所述一个或多个预设类型的对象网格化。
  6. 根据权利要求5所述的方法,其特征在于,所述方法还包括:
    按照所述实景三维模型中区域的重要性,和/或所述一个或多个预设类型的对象所属类型的重要程度设置所述动态间隔。
  7. 根据权利要求1-6中任一项所述的方法,其特征在于,所述用户设备捕获的所述图像是第一图像,在所述提取用户设备捕获的图像的第二特征之前,所述方法还包括:
    对所述第一图像进行预处理,得到第二图像;
    其中,所述预处理包括以下中的一种或多种:初始化空间姿态、将图像亮度调整为预设亮度、将图像对比度调整为预设对比度、对图像所描述的对象语义化分类、柱面投影。
  8. 根据权利要求7所述的方法,其特征在于,所述提取用户设备捕获的图像的 第二特征,包括:
    对所述第二图像所描述的一个或多个对象进行全景特征编码,得到所述第二图像的所述第二特征。
  9. 根据权利要求1-8中任一项所述的方法,其特征在于,所述在所述全景特征库中检索,以确定与所述第二特征匹配的第一特征,包括:
    将所述第二特征在所述全景特征库中滑窗,确定所述第二特征与滑窗范围内多个第一特征的匹配相似度;
    根据所述滑窗范围中所述第二特征与所述多个第一特征的多个匹配相似度,确定与所述第二特征匹配的第一特征。
  10. 根据权利要求9所述的方法,其特征在于,与所述第二特征匹配的所述第一特征是所述多个匹配相似度中,最高的匹配相似度对应的第一特征。
  11. 根据权利要求1-10中任一项所述的方法,其特征在于,所述在所述全景特征库中检索,以确定与所述第二特征匹配的第一特征,包括:
    在所述全景特征库全库范围内检索,以确定与所述第二特征匹配的所述第一特征;或者,
    在所述全景特征库的预设范围内检索,以确定与所述第二特征匹配的所述第一特征。
  12. 根据权利要求11所述的方法,其特征在于,所述方法还包括:
    根据所述用户设备捕获所述图像时,所述用户设备所处的位置,结合预设范围的设置规则确定所述预设范围。
  13. 根据权利要求11或12所述的方法,其特征在于,所述预设范围是以所述用户设备捕获所述图像时,所述用户设备所处的位置为中心,以r为半径的圆形区域,其中r为正数。
  14. 根据权利要求11或12所述的方法,其特征在于,所述预设范围包括第一范围和第二范围,所述第一范围的优先级高于所述第二范围;在所述全景特征库的预设范围内检索,以确定与所述第二特征匹配的所述第一特征,包括:
    在所述第一范围内检索;
    若在所述第一范围内未检索到与所述第二特征匹配的所述第一特征,则在所述第二范围内检索,以确定与所述第二特征匹配的所述第一特征。
  15. 根据权利要求14所述的方法,其特征在于,
    所述第一范围是以所述用户设备捕获所述图像时,所述用户设备所处的位置为中心,以r1为半径的圆形区域,其中r1为正数;
    所述第二范围是以所述用户设备捕获所述图像时，所述用户设备所处的位置为中心，以r1为内径，r2为外径的圆环区域，其中r1和r2为正数，且r1小于r2。
  16. 根据权利要求1-15中任一项所述的方法,其特征在于,所述第二特征的水平分辨率与所述第一特征的水平分辨率相同。
  17. 根据权利要求1-16中任一项所述的方法,其特征在于,所述实景三维模型包括以下中的一种或多种:航空实景三维模型、卫星实景三维模型、城市信息模型。
  18. 一种第一设备,其特征在于,所述第一设备包括:
    存储器,用于存储计算机程序;
    收发器,用于进行无线电信号接收和发送;
    处理器,用于执行所述计算机程序,使得所述第一设备基于实景三维模型,构建得到全景特征库;
    其中,所述实景三维模型用于描述多个对象的空间信息;所述全景特征库中包括用于表示多个对象的多个第一特征,所述多个第一特征的水平分辨率一致。
  19. 根据权利要求18所述的设备,其特征在于,所述处理器具体用于:
    执行所述计算机程序,使得所述第一设备对所述实景三维模型进行语义化分类,得到所述实景三维模型所描述的多个对象的类型;
    从所述实景三维模型所描述的多个对象中提取一个或多个预设类型的对象;
    将提取的所述一个或多个预设类型的对象网格化;
    逐网格渲染所述一个或多个预设类型的对象,得到渲染图;
    将所述渲染图柱面展开,得到全景图;以及,
    逐网格对所述全景图中的一个或多个预设类型的对象全景特征编码,得到所述全景特征库。
  20. 根据权利要求19所述的设备,其特征在于,所述将提取的所述一个或多个预设类型的对象网格化,包括:
    按照固定间隔或者动态间隔将提取的所述一个或多个预设类型的对象网格化。
  21. 根据权利要求18-20中任一项所述的设备,其特征在于,所述实景三维模型包括以下中的一种或多种:航空实景三维模型、卫星实景三维模型、城市信息模型。
  22. 一种第二设备,其特征在于,所述第二设备包括:
    存储器,用于存储计算机程序;
    收发器,用于进行无线电信号接收和发送;
    处理器,用于执行所述计算机程序,使得所述第二设备提取所述第二设备捕获的图像的第二特征,所述第二特征的水平分辨率一致;
    在全景特征库中检索，以确定与所述第二特征匹配的第一特征，其中所述全景特征库中包括用于表示多个对象的多个第一特征，所述多个第一特征的水平分辨率一致；以及，
    根据与所述第二特征匹配的所述第一特征，确定所述第二设备捕获所述图像时的空间位姿，所述空间位姿用于表示所述第二设备的位置和姿态。
  23. 根据权利要求22所述的设备,其特征在于,所述处理器还用于:
    执行所述计算机程序,使得所述第二设备对所述第一图像进行预处理,得到第二图像;
    其中,所述预处理包括以下中的一种或多种:初始化空间姿态、将图像亮度调整为预设亮度、将图像对比度调整为预设对比度、对图像所描述的对象语义化分类、柱面投影。
  24. 根据权利要求22或23所述的设备,其特征在于,所述处理器具体用于:
    执行所述计算机程序,使得所述第二设备对所述第二图像所描述的一个或多个对象进行全景特征编码,得到所述第二图像的所述第二特征。
  25. 根据权利要求22-24中任一项所述的设备,其特征在于,所述处理器具体用于:
    执行所述计算机程序,使得所述第二设备将所述第二特征在所述全景特征库中滑窗,计算所述第二特征与滑窗范围内多个第一特征的匹配相似度;以及,
    根据所述第二特征与多个滑窗范围中所述多个第一特征的多个匹配相似度,确定与所述第二特征匹配的第一特征。
  26. 根据权利要求25所述的设备,其特征在于,与所述第二特征匹配的所述第一特征是所述多个匹配相似度中,最高的匹配相似度对应的第一特征。
  27. 根据权利要求22-26中任一项所述的设备,其特征在于,所述处理器具体用于:
    执行所述计算机程序,使得所述第二设备在所述全景特征库全库范围内检索,以确定与所述第二特征匹配的所述第一特征;或者,
    在所述全景特征库的预设范围内检索,以确定与所述第二特征匹配的所述第一特征。
  28. 根据权利要求27所述的设备,其特征在于,所述预设范围包括第一范围和第二范围,所述第一范围的优先级高于所述第二范围;所述处理器具体用于:
    执行所述计算机程序,使得所述第二设备在所述第一范围内检索;以及,
    若在所述第一范围内未检索到与所述第二特征匹配的所述第一特征,则在所述第二范围内检索,以确定与所述第二特征匹配的所述第一特征。
  29. 一种设备定位系统,其特征在于,所述设备定位系统包括:
    如权利要求18-21中任一项所述的第一设备;以及,
    如权利要求22-28中任一项所述的第二设备。
  30. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机程序代码,所述计算机程序代码被处理电路执行时实现如权利要求1-17中任一项所述的方法。
PCT/CN2022/120592 2021-09-30 2022-09-22 一种设备定位方法、设备及系统 WO2023051383A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111166626.6A CN115937722A (zh) 2021-09-30 2021-09-30 一种设备定位方法、设备及系统
CN202111166626.6 2021-09-30

Publications (1)

Publication Number Publication Date
WO2023051383A1 true WO2023051383A1 (zh) 2023-04-06

Family

ID=85781293

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/120592 WO2023051383A1 (zh) 2021-09-30 2022-09-22 一种设备定位方法、设备及系统

Country Status (2)

Country Link
CN (1) CN115937722A (zh)
WO (1) WO2023051383A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116886880A (zh) * 2023-09-08 2023-10-13 中移(杭州)信息技术有限公司 监控视频调整方法、装置、设备及计算机程序产品

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198488A (zh) * 2013-04-16 2013-07-10 北京天睿空间科技有限公司 Ptz监控摄像机实时姿态快速估算方法
WO2015096806A1 (zh) * 2013-12-29 2015-07-02 刘进 智能机姿态测定、全景影像生成及目标识别方法
US20190005718A1 (en) * 2015-12-31 2019-01-03 Tsinghua University Method and device for image positioning based on 3d reconstruction of ray model
CN111652929A (zh) * 2020-06-03 2020-09-11 全球能源互联网研究院有限公司 一种视觉特征的识别定位方法及系统
CN112073640A (zh) * 2020-09-15 2020-12-11 贝壳技术有限公司 全景信息采集位姿获取方法及装置、系统
CN112348887A (zh) * 2019-08-09 2021-02-09 华为技术有限公司 终端位姿确定方法以及相关装置
CN112348885A (zh) * 2019-08-09 2021-02-09 华为技术有限公司 视觉特征库的构建方法、视觉定位方法、装置和存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198488A (zh) * 2013-04-16 2013-07-10 北京天睿空间科技有限公司 Ptz监控摄像机实时姿态快速估算方法
WO2015096806A1 (zh) * 2013-12-29 2015-07-02 刘进 智能机姿态测定、全景影像生成及目标识别方法
US20190005718A1 (en) * 2015-12-31 2019-01-03 Tsinghua University Method and device for image positioning based on 3d reconstruction of ray model
CN112348887A (zh) * 2019-08-09 2021-02-09 华为技术有限公司 终端位姿确定方法以及相关装置
CN112348885A (zh) * 2019-08-09 2021-02-09 华为技术有限公司 视觉特征库的构建方法、视觉定位方法、装置和存储介质
CN111652929A (zh) * 2020-06-03 2020-09-11 全球能源互联网研究院有限公司 一种视觉特征的识别定位方法及系统
CN112073640A (zh) * 2020-09-15 2020-12-11 贝壳技术有限公司 全景信息采集位姿获取方法及装置、系统

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116886880A (zh) * 2023-09-08 2023-10-13 中移(杭州)信息技术有限公司 监控视频调整方法、装置、设备及计算机程序产品
CN116886880B (zh) * 2023-09-08 2023-12-26 中移(杭州)信息技术有限公司 监控视频调整方法、装置、设备及计算机程序产品

Also Published As

Publication number Publication date
CN115937722A (zh) 2023-04-07

Similar Documents

Publication Publication Date Title
US9842282B2 (en) Method and apparatus for classifying objects and clutter removal of some three-dimensional images of the objects in a presentation
WO2019223468A1 (zh) 相机姿态追踪方法、装置、设备及系统
US20170323478A1 (en) Method and apparatus for evaluating environmental structures for in-situ content augmentation
US9129429B2 (en) Augmented reality on wireless mobile devices
US9317133B2 (en) Method and apparatus for generating augmented reality content
CN112966124B (zh) 知识图谱对齐模型的训练方法、对齐方法、装置及设备
WO2023131090A1 (zh) 一种增强现实系统、多设备构建三维地图的方法及设备
WO2023051383A1 (zh) 一种设备定位方法、设备及系统
CN114076970A (zh) 一种定位方法、装置及系统
CN112270709A (zh) 地图构建方法及装置、计算机可读存储介质和电子设备
US20220157032A1 (en) Multi-modality localization of users
WO2021088497A1 (zh) 虚拟物体显示方法、全局地图更新方法以及设备
CN112053360B (zh) 图像分割方法、装置、计算机设备及存储介质
WO2023124948A1 (zh) 一种三维地图的创建方法及电子设备
WO2022252236A1 (zh) 3d地图的编解码方法及装置
US20240104781A1 (en) 3D Map Compression Method and Apparatus, and 3D Map Decompression Method and Apparatus
WO2022252237A1 (zh) 3d地图的编解码方法及装置
US20240095265A1 (en) Method and apparatus for retrieving 3d map
CN116664684B (zh) 定位方法、电子设备及计算机可读存储介质
WO2022252238A1 (zh) 3d地图的压缩、解压缩方法和装置
CN116934884A (zh) 建筑物纹理生成方法及装置
CN116664812A (zh) 一种视觉定位方法、视觉定位系统及电子设备
CN117152338A (zh) 一种建模方法与电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874769

Country of ref document: EP

Kind code of ref document: A1