WO2023131090A1 - An augmented reality system, and a method and device for constructing a three-dimensional map with multiple devices - Google Patents

An augmented reality system, and a method and device for constructing a three-dimensional map with multiple devices

Info

Publication number
WO2023131090A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
image
environment
frame
initial
Prior art date
Application number
PCT/CN2022/144274
Other languages
English (en)
French (fr)
Inventor
温裕祥
何凯文
李江伟
郑亚
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023131090A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality

Definitions

  • the present application relates to the field of augmented reality technology, in particular to an augmented reality system, a method and a device for constructing a three-dimensional map with multiple devices.
  • Augmented reality is a technology that integrates real world information and virtual world information.
  • AR technology can simulate physical information that is difficult to experience in the real world to obtain virtual information, and apply the virtual information to the real world, so that the real environment and virtual objects are superimposed in the same screen or space in real time and perceived by the user simultaneously, achieving a sensory experience beyond reality.
  • the user can simultaneously observe the real world and virtual items in the AR scene displayed on the display screen of the electronic device; for example, the user can observe real-world items such as the ground and a table on the display screen, and can also observe virtual items such as an anime character placed on the ground.
  • the three-dimensional map of the AR scene can be used to represent the environmental information of the real world, and the electronic device can then construct the AR scene based on the three-dimensional map. For example, when the user captures the real world in real time through the camera of the electronic device, the user can add virtual items to the scene displayed on the display screen, the electronic device adds the virtual items to the AR scene based on the 3D map corresponding to the real world, and the user can then observe the real world and the added virtual items in the same screen.
  • the present application provides an augmented reality system, a method and a device for constructing a three-dimensional map with multiple devices.
  • the present application provides an augmented reality system, which includes a first electronic device, a second electronic device, and a distributed mapping system;
  • the first electronic device is configured to send multiple frames of the first environment image and pose information of each frame of the first environment image to the distributed mapping system;
  • the first environment image is obtained by the first electronic device photographing the environment where it is located, and the pose information of each frame of the first environment image is used to represent the position and orientation of the first electronic device in the three-dimensional coordinate system corresponding to the first electronic device when the first environment image is captured;
  • the second electronic device is configured to send multiple frames of the second environment image and initial pose information of each frame of the second environment image to the distributed mapping system; the second environment image is obtained by the second electronic device photographing the environment where it is located, and the initial pose information of each frame of the second environment image is used to represent the position and orientation of the second electronic device in the three-dimensional coordinate system corresponding to the second electronic device when the second environment image is captured;
  • the distributed mapping system is configured to: receive the multiple frames of the first environment image and the pose information of each frame of the first environment image sent by the first electronic device; receive the multiple frames of the second environment image and the initial pose information of each frame of the second environment image sent by the second electronic device; perform pose conversion on the initial pose information of the multiple frames of the second environment image according to the target conversion relationship to obtain the target pose information of each frame of the second environment image, where the target pose information is used to represent the position and orientation of the second electronic device in the three-dimensional coordinate system corresponding to the first electronic device when the second environment image is captured; the target conversion relationship is the conversion relationship between the three-dimensional coordinate system corresponding to the second electronic device and the three-dimensional coordinate system corresponding to the first electronic device; and create a three-dimensional map according to the multiple frames of the first environment image, the pose information of each frame of the first environment image, the multiple frames of the second environment image, and the target pose information of each frame of the second environment image, where the three-dimensional map is used to construct an augmented reality scene.
  • the distributed mapping system can convert the poses of the environment images uploaded by different electronic devices into poses in the same three-dimensional coordinate system, and can then generate a 3D map based on the pose-converted environment images, thereby realizing construction of a 3D map by multiple devices, improving the efficiency of map construction, enhancing the interactivity between multiple devices, and improving the user experience.
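  • As a minimal illustrative sketch (not part of the application), the pose conversion above can be expressed with 4x4 homogeneous transforms; the name `T_2_to_1` for the target conversion relationship is a hypothetical placeholder:

```python
import numpy as np

def convert_pose(T_2_to_1: np.ndarray, initial_pose: np.ndarray) -> np.ndarray:
    """Re-express a camera pose from device 2's coordinate system in device 1's.

    Both arguments are 4x4 homogeneous transforms: `initial_pose` maps camera
    coordinates to device 2's world frame, and `T_2_to_1` (the target
    conversion relationship) maps device 2's world frame to device 1's.
    """
    return T_2_to_1 @ initial_pose

# Illustrative values only: device 2's frame is rotated 90 degrees about Z
# and shifted 1 m along X relative to device 1's frame.
c, s = np.cos(np.pi / 2), np.sin(np.pi / 2)
T_2_to_1 = np.array([[c, -s, 0, 1.0],
                     [s,  c, 0, 0.0],
                     [0,  0, 1, 0.0],
                     [0,  0, 0, 1.0]])
target_pose = convert_pose(T_2_to_1, np.eye(4))  # target pose information
```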
  • the first electronic device is further configured to: before sending the multiple frames of the first environment image and the pose information of each frame of the first environment image to the distributed mapping system, send a multi-device mapping request to the distributed mapping system, where the multi-device mapping request includes multiple frames of the first initial image and positioning information of each frame of the first initial image; the multiple frames of the first initial image are obtained by the first electronic device photographing the environment where it is located;
  • the second electronic device is further configured to: before sending the multiple frames of the second environment image and the initial pose information of each frame of the second environment image to the distributed mapping system, send a request for joining the mapping to the distributed mapping system, where the request includes multiple frames of the second initial image; the multiple frames of the second initial image are obtained by the second electronic device photographing the environment where it is located;
  • the distributed mapping system is further configured to: receive the multi-device mapping request sent by the first electronic device, and generate an initial three-dimensional map according to the multiple frames of the first initial image and the positioning information of each frame of the first initial image in the multi-device mapping request; and receive the request for joining the mapping sent by the second electronic device, and determine the target conversion relationship according to the multiple frames of the second initial image and the initial three-dimensional map.
  • the distributed mapping system can receive the multi-device mapping request sent by the first electronic device and generate an initial three-dimensional map according to the multiple frames of the first initial image in the request; the initial three-dimensional map can be used for locating the second electronic device and can also be used to determine the target conversion relationship.
  • the distributed mapping system can receive the request for joining the mapping sent by the second electronic device, and determine the target conversion relationship according to the multiple frames of the second initial image in the request and the initial three-dimensional map; the target conversion relationship can be used to convert the pose information of images collected by the second electronic device into the three-dimensional coordinate system corresponding to the first electronic device, so that the distributed mapping system can create a three-dimensional map based on environment images in a unified coordinate system.
  • the distributed mapping system is further configured to: after receiving the multiple frames of the second initial image and before determining the target conversion relationship according to the multiple frames of the second initial image and the initial three-dimensional map, perform image processing on the multiple frames of the second initial image and determine that at least one frame of the second initial image contains the same image content as some frame of the first initial image.
  • the second electronic device needs to re-scan the area already scanned by the first electronic device to collect the multiple frames of the second initial image; when at least one frame of the second initial image contains the same image content as some frame of the first initial image, the distributed mapping system can confirm that the second electronic device has joined the multi-device mapping task, and can further determine the target conversion relationship based on the second initial images and the first initial images.
  • the distributed mapping system is specifically configured to: extract the global features and feature points of a target initial image, where the target initial image is any frame of the second initial image; determine at least one frame of the first initial image matching the target initial image according to the global features of the target initial image, determine the three-dimensional points corresponding, in the three-dimensional map, to the feature points in the at least one frame of the first initial image matching the target initial image, and use the determined three-dimensional points as the three-dimensional points corresponding to the feature points of the target initial image; determine the target pose information of the target initial image according to the feature points of the target initial image, the three-dimensional points corresponding to the feature points of the target initial image, and the camera intrinsic parameters of the second electronic device; and determine the target conversion relationship according to the initial pose information of the target initial image and the target pose information of the target initial image.
  • when the distributed mapping system determines the target conversion relationship, it can determine the 3D points in the 3D map corresponding to the feature points of any frame of the second initial image, and then determine the target pose information according to the feature points of the second initial image, the corresponding 3D points, and the camera intrinsic parameters of the second electronic device. The target pose information is the pose information of the second electronic device, when the second initial image was taken, in the 3D coordinate system corresponding to the initial 3D map, and the three-dimensional coordinate system of the initial three-dimensional map is the same as the three-dimensional coordinate system corresponding to the first electronic device; the initial pose information is the pose information in the three-dimensional coordinate system corresponding to the second electronic device. An accurate target conversion relationship between the coordinate system corresponding to the second electronic device and the coordinate system corresponding to the first electronic device can therefore be determined from the target pose information and the initial pose information.
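  • A minimal sketch of this step, assuming an OpenCV-style PnP solver (the application does not name one): the target pose is recovered from the 2D feature points and their matched map points, and the target conversion relationship is then the transform taking the initial (device 2) pose to the target (device 1) pose. All names here are illustrative:

```python
import cv2
import numpy as np

def estimate_conversion(pts2d, pts3d_map, K, T_initial):
    """Recover the target conversion relationship from one second initial image.

    pts2d      : Nx2 feature points of the target initial image
    pts3d_map  : Nx3 matched 3D points from the initial 3D map (device 1 frame)
    K          : 3x3 camera intrinsics of the second electronic device
    T_initial  : 4x4 camera-to-world pose in device 2's coordinate system
    """
    ok, rvec, tvec = cv2.solvePnP(pts3d_map.astype(np.float64),
                                  pts2d.astype(np.float64), K, None)
    assert ok
    R, _ = cv2.Rodrigues(rvec)
    # solvePnP yields world-to-camera; invert it to get the camera-to-world
    # "target pose" in device 1's coordinate system.
    T_target = np.eye(4)
    T_target[:3, :3] = R.T
    T_target[:3, 3] = (-R.T @ tvec).ravel()
    # The conversion relationship maps device 2's frame into device 1's frame.
    return T_target @ np.linalg.inv(T_initial)
```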
  • the distributed mapping system is further configured to: generate a point cloud resource according to the 3D points in the initial 3D map; receive a positioning request sent by the first electronic device, where the positioning request includes an environment image collected by the first electronic device; position the first electronic device according to the environment image in the positioning request and the initial three-dimensional map, and determine the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map; and send the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map and the point cloud resource to the first electronic device;
  • the first electronic device is further configured to: send the positioning request to the distributed mapping system; receive the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map and the point cloud resource sent by the distributed mapping system; and display the environment image collected in real time together with the point cloud resource according to the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map, to indicate that the area covered by the point cloud resource has already been scanned.
  • the distributed mapping system can generate a point cloud resource according to the 3D points in the 3D map and send it to the first electronic device, and the first electronic device can display the environment image collected in real time together with the point cloud resource to indicate the area that the first electronic device has already scanned, guiding the user to continue scanning other areas and improving the user experience.
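  • One plausible way to turn the map's 3D points into a lightweight point cloud resource is voxel-grid downsampling; this is an assumption for illustration, not a detail stated in the application:

```python
import numpy as np

def make_point_cloud_resource(points: np.ndarray, voxel: float = 0.05) -> np.ndarray:
    """Keep one representative 3D point per voxel-sized cell so the device
    can cheaply render the already-scanned area.

    points : Nx3 array of map points; voxel size is in map units (metres).
    """
    keys = np.floor(points / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(idx)]
```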
  • the distributed mapping system is specifically configured to: select one frame of image to be processed from the multiple frames of the first environment image and the multiple frames of the second environment image as a target image, and perform a target processing process on the target image, until the multiple frames of the first environment image and the multiple frames of the second environment image have all been subjected to the target processing process; the target processing process includes the following steps: extracting a first feature point of the target image; acquiring the feature points of at least one frame of image that has undergone the target processing process; and selecting at least one second feature point from the feature points of the at least one frame of image to form a feature matching pair with the first feature point, where the first feature point and the at least one second feature point correspond to the same point in the environment, and the at least one frame of image that has undergone the target processing process includes at least one frame of the first environment image and/or at least one frame of the second environment image; and acquire the multiple feature matching pairs obtained after performing the target processing process on the multiple frames of the first environment image and the multiple frames of the second environment image, and create the three-dimensional map according to the multiple feature matching pairs.
  • when the distributed mapping system creates the 3D map, it can extract the feature points of each frame of environment image and determine feature matching pairs with the feature points of other images for which image processing has been completed; a feature matching pair consists of feature points in different environment images that correspond to the same 3D point, so the 3D point can be determined according to the feature matching pair and the 3D map can then be created, making the 3D map better match the real environment.
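  • A minimal feature-matching sketch, assuming ORB features and OpenCV (the application does not name a specific feature type); each surviving match plays the role of one feature matching pair:

```python
import cv2

orb = cv2.ORB_create(nfeatures=2000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)

def match_features(img_target, img_processed):
    """Extract feature points of the target image and form feature matching
    pairs with an image that has already undergone the target processing."""
    kp1, des1 = orb.detectAndCompute(img_target, None)
    kp2, des2 = orb.detectAndCompute(img_processed, None)
    # Lowe's ratio test keeps only pairs likely to observe the same 3D point.
    pairs = [m for m, n in (p for p in matcher.knnMatch(des1, des2, k=2)
                            if len(p) == 2)
             if m.distance < 0.75 * n.distance]
    return kp1, kp2, pairs
```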
  • the distributed mapping system is specifically configured to: determine, according to the pose information of the multiple frames of the first environment image and the target pose information of the multiple frames of the second environment image, the multiple three-dimensional points corresponding to the multiple feature matching pairs in the three-dimensional coordinate system corresponding to the first electronic device, to obtain the three-dimensional map.
  • the distributed mapping system can determine the 3D point corresponding to each feature matching pair according to the feature matching pairs, the pose information of the multiple frames of the first environment image, and the target pose information of the multiple frames of the second environment image. This method locates the same three-dimensional point in three-dimensional space from matched feature points in different environment images, ensuring that the three-dimensional map is generated from the environment images actually collected by the first electronic device and the second electronic device and can accurately represent the features of the real environment.
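  • As a sketch of how one feature matching pair yields a 3D point given the poses above, a standard two-view triangulation is assumed here for illustration:

```python
import cv2
import numpy as np

def triangulate_pair(K, T_cw1, T_cw2, pt1, pt2):
    """Locate the 3D point of one feature matching pair from two views.

    K            : 3x3 camera intrinsics
    T_cw1, T_cw2 : 3x4 world-to-camera extrinsics [R|t], both already
                   expressed in device 1's coordinate system
    pt1, pt2     : matched pixel coordinates in the two images
    """
    P1, P2 = K @ T_cw1, K @ T_cw2
    X = cv2.triangulatePoints(P1, P2,
                              np.asarray(pt1, float).reshape(2, 1),
                              np.asarray(pt2, float).reshape(2, 1))
    return (X[:3] / X[3]).ravel()  # homogeneous -> Euclidean 3D point
```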
  • the first electronic device is further configured to: send positioning information corresponding to each frame of the first environment image to the distributed mapping system;
  • the second electronic device is further configured to: send positioning information corresponding to each frame of the second environment image to the distributed mapping system;
  • the distributed mapping system is further configured to: receive the positioning information corresponding to each frame of the first environment image sent by the first electronic device; receive the positioning information corresponding to each frame of the second environment image sent by the second electronic device; and adjust the coordinates of the three-dimensional points according to the positioning information corresponding to each frame of the first environment image and the positioning information corresponding to each frame of the second environment image, to obtain a three-dimensional map with the same scale as the real environment.
  • the distributed mapping system can adjust the coordinates of the 3D points in the 3D map based on the positioning information of the first environment images and the positioning information of the second environment images to obtain a 3D map with the same scale as the real environment, so that when an AR scene is constructed based on the 3D map, the virtual world can be displayed fused with the real environment, improving the user experience.
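  • One simple way such a scale adjustment could work (an assumption for illustration; the application only states that coordinates are adjusted using positioning information) is to compare distances travelled in the map frame with distances derived from the positioning information:

```python
import numpy as np

def rescale_map(points, cam_pos_map, cam_pos_gps_m):
    """Adjust 3D point coordinates so the map's scale matches reality.

    points        : Mx3 map points to rescale
    cam_pos_map   : Nx3 camera positions recovered in the map frame
    cam_pos_gps_m : Nx3 the same positions from positioning info, in metres
    """
    d_map = np.linalg.norm(np.diff(cam_pos_map, axis=0), axis=1)
    d_gps = np.linalg.norm(np.diff(cam_pos_gps_m, axis=0), axis=1)
    scale = np.median(d_gps / np.maximum(d_map, 1e-9))  # robust ratio
    return points * scale
```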
  • the first electronic device is further configured to: collect a first depth map corresponding to the multiple frames of the first environment image, and send the first depth map to the distributed mapping system;
  • the second electronic device is further configured to: collect a second depth map corresponding to the multiple frames of the second environment image, and send the second depth map to the distributed mapping system;
  • the distributed mapping system is further configured to: receive the first depth map sent by the first electronic device; receive the second depth map sent by the second electronic device; perform coordinate conversion on the second depth map according to the target conversion relationship, and fuse the first depth map and the coordinate-converted second depth map to obtain a complete depth map; and generate a white film corresponding to the real environment according to the complete depth map, where the white film is used to represent the surfaces of the objects in the real environment.
  • the distributed mapping system is further configured to: determine the depth information of each frame of the first environment image and the depth information of each frame of the second environment image based on a multi-view stereo matching algorithm, and generate a white film corresponding to the real environment according to the depth information of each frame of the first environment image and the depth information of each frame of the second environment image, where the white film is used to represent the surfaces of the objects in the real environment.
  • this application provides multiple ways to determine the depth information of the environment image.
  • in one way, the electronic device can determine the depth map corresponding to the environment image and send it to the distributed mapping system, and the distributed mapping system can convert the coordinates of the depth map according to the target conversion relationship; in another way, the distributed mapping system can determine the depth information of each frame of the environment image based on a multi-view stereo matching algorithm.
  • the distributed mapping system can determine the white film corresponding to the real environment according to the depth map or depth information corresponding to each frame of the environment image.
  • the white film can represent the surfaces of the objects in the real environment, so that when the electronic device displays the augmented reality scene to the user, it can place a 3D digital resource model on an object surface of the real environment according to the white film, improving the realism of the augmented reality scene and enhancing the user experience.
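  • A hedged sketch of the depth-map route: back-project each depth map into 3D points in device 1's frame (with the second device's pose already composed with the target conversion relationship), after which the combined points can be meshed into the white film by any surface-reconstruction method; the function and names are illustrative:

```python
import numpy as np

def depth_to_points(depth, K, T_wc):
    """Back-project one depth map into 3D points in a common frame.

    depth : HxW array of depth values (0 where invalid)
    K     : 3x3 camera intrinsics
    T_wc  : 4x4 camera-to-world pose in device 1's coordinate system
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0
    pix = np.stack([u.ravel()[valid] * z[valid],
                    v.ravel()[valid] * z[valid],
                    z[valid]])
    cam = np.linalg.inv(K) @ pix                  # camera-frame points
    return (T_wc[:3, :3] @ cam).T + T_wc[:3, 3]   # into device 1's frame
```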
  • the distributed mapping system is further configured to: send the three-dimensional map to the first electronic device and the second electronic device; receive a positioning request sent by the first electronic device, determine the first pose of the first electronic device in the three-dimensional coordinate system of the three-dimensional map, and send the first pose to the first electronic device and the second electronic device; and receive a positioning request sent by the second electronic device, determine the second pose of the second electronic device in the three-dimensional coordinate system of the three-dimensional map, and send the second pose to the first electronic device and the second electronic device;
  • the first electronic device is further configured to: receive the three-dimensional map sent by the distributed mapping system; send a positioning request to the distributed mapping system, and receive the first pose and the second pose sent by the distributed mapping system; and display the augmented reality scene according to the first pose and the three-dimensional map, and display the captured image of the user using the second electronic device and the three-dimensional digital resource model corresponding to the second electronic device according to the second pose.
  • when multiple users operate multiple electronic devices to display the target scene, they can interact with each other. For example, the first user operates the first electronic device and the second user operates the second electronic device; if the second user enters the shooting range of the first electronic device with the second electronic device, the first electronic device can display the image of the second user captured in real time and display the three-dimensional digital resource model corresponding to the second electronic device, realizing interaction between the devices, improving the interaction between users in the augmented reality scene, and enhancing the user experience.
  • the present application provides a method for constructing a three-dimensional map with multiple devices, which is applied to an augmented reality system.
  • the augmented reality system includes a first electronic device, a second electronic device, and a distributed mapping system.
  • the method includes:
  • the first electronic device sends multiple frames of the first environment image and the pose information of each frame of the first environment image to the distributed mapping system; the first environment image is obtained by the first electronic device photographing the environment where it is located, and the pose information of each frame of the first environment image is used to represent the position and orientation of the first electronic device in the three-dimensional coordinate system corresponding to the first electronic device when the first environment image is captured;
  • the second electronic device sends multiple frames of the second environment image and the initial pose information of each frame of the second environment image to the distributed mapping system; the second environment image is obtained by the second electronic device photographing the environment where it is located, and the initial pose information of each frame of the second environment image is used to represent the position and orientation of the second electronic device in the three-dimensional coordinate system corresponding to the second electronic device when the second environment image is captured;
  • the distributed mapping system performs pose conversion on the initial pose information of the multiple frames of the second environment image according to the target conversion relationship to obtain the target pose information of each frame of the second environment image, where the target pose information is used to represent the position and orientation of the second electronic device in the three-dimensional coordinate system corresponding to the first electronic device when the second environment image is captured, and the target conversion relationship is the conversion relationship between the three-dimensional coordinate system corresponding to the second electronic device and the three-dimensional coordinate system corresponding to the first electronic device; and the distributed mapping system creates a three-dimensional map according to the multiple frames of the first environment image, the pose information of each frame of the first environment image, the multiple frames of the second environment image, and the target pose information of each frame of the second environment image, where the three-dimensional map is used to construct an augmented reality scene.
  • the method further includes: before the first electronic device sends the multiple frames of the first environment image and the pose information of each frame of the first environment image to the distributed mapping system, sending a multi-device mapping request to the distributed mapping system, where the multi-device mapping request includes multiple frames of the first initial image and positioning information of each frame of the first initial image, and the multiple frames of the first initial image are obtained by the first electronic device photographing the environment where it is located; and before the second electronic device sends the multiple frames of the second environment image and the initial pose information of each frame of the second environment image to the distributed mapping system, sending a request for joining the mapping to the distributed mapping system, where the request includes multiple frames of the second initial image, and the multiple frames of the second initial image are obtained by the second electronic device photographing the environment where it is located.
  • the distributed mapping system receives the multi-device mapping request sent by the first electronic device, and generates an initial three-dimensional map according to the multiple frames of the first initial image and the positioning information of each frame of the first initial image in the multi-device mapping request; the distributed mapping system receives the request for joining the mapping sent by the second electronic device, and determines the target conversion relationship according to the multiple frames of the second initial image and the initial three-dimensional map.
  • the The method further includes: the distributed mapping system performs image processing on the multiple frames of the second initial image, and determines that at least one frame of the second initial image contains the same image content as any frame of the first initial image.
  • the distributed mapping system determining the target conversion relationship according to the multiple frames of the second initial image and the initial three-dimensional map includes: the distributed mapping system extracts the global features and feature points of a target initial image, where the target initial image is any frame of the second initial image; the distributed mapping system determines at least one frame of the first initial image matching the target initial image according to the global features of the target initial image, determines the three-dimensional points corresponding, in the three-dimensional map, to the feature points in the at least one frame of the first initial image matching the target initial image, and uses the determined three-dimensional points as the three-dimensional points corresponding to the feature points of the target initial image; the distributed mapping system determines the target pose information of the target initial image according to the feature points of the target initial image, the three-dimensional points corresponding to the feature points of the target initial image, and the camera intrinsic parameters of the second electronic device; and the distributed mapping system determines the target conversion relationship according to the initial pose information of the target initial image and the target pose information of the target initial image.
  • the method further includes: the distributed mapping system generates a point cloud resource according to the 3D points in the initial 3D map; the distributed mapping system receives a positioning request sent by the first electronic device, where the positioning request includes an environment image collected by the first electronic device, positions the first electronic device according to the environment image in the positioning request and the initial three-dimensional map, and determines the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map; the distributed mapping system sends the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map and the point cloud resource to the first electronic device; and the first electronic device displays the environment image collected in real time together with the point cloud resource, to indicate that the area covered by the point cloud resource has already been scanned.
  • the distributed mapping system creating a three-dimensional map according to the multiple frames of the first environment image, the pose information of each frame of the first environment image, the multiple frames of the second environment image, and the target pose information of each frame of the second environment image includes: the distributed mapping system selects one frame of image to be processed from the multiple frames of the first environment image and the multiple frames of the second environment image as a target image, and performs a target processing process on the target image, until the multiple frames of the first environment image and the multiple frames of the second environment image have all been subjected to the target processing process;
  • the target processing process includes the following steps: extracting a first feature point of the target image; acquiring the feature points of at least one frame of image that has undergone the target processing process; and selecting at least one second feature point from the feature points of the at least one frame of image to form a feature matching pair with the first feature point, where the first feature point and the at least one second feature point correspond to the same point in the environment, and the at least one frame of image that has undergone the target processing process includes at least one frame of the first environment image and/or at least one frame of the second environment image;
  • the distributed mapping system acquires the multiple feature matching pairs obtained after performing the target processing process on the multiple frames of the first environment image and the multiple frames of the second environment image, and creates the three-dimensional map according to the multiple feature matching pairs.
  • the distributed mapping system creating the three-dimensional map according to the multiple feature matching pairs includes: the distributed mapping system determines, according to the pose information of the multiple frames of the first environment image and the target pose information of the multiple frames of the second environment image, the multiple three-dimensional points corresponding to the multiple feature matching pairs in the three-dimensional coordinate system corresponding to the first electronic device, to obtain the three-dimensional map.
  • the method further includes: the first electronic device sending positioning information corresponding to each frame of the first environment image to the distributed mapping system;
  • the second electronic device sends the positioning information corresponding to each frame of the second environment image to the distributed mapping system; and the distributed mapping system adjusts the coordinates of the three-dimensional points according to the positioning information corresponding to each frame of the first environment image and the positioning information corresponding to each frame of the second environment image, to obtain a three-dimensional map with the same scale as the real environment.
  • the method further includes: the first electronic device collects a first depth map corresponding to the multiple frames of the first environment image, and sends the first depth map to the distributed mapping system; the second electronic device collects a second depth map corresponding to the multiple frames of the second environment image, and sends the second depth map to the distributed mapping system; the distributed mapping system performs coordinate conversion on the second depth map according to the target conversion relationship, and fuses the first depth map and the coordinate-converted second depth map to obtain a complete depth map; and the distributed mapping system generates a white film corresponding to the real environment according to the complete depth map, where the white film is used to represent the surfaces of the objects in the real environment.
  • the method further includes: the distributed mapping system determines the depth information of each frame of the first environment image and the depth information of each frame of the second environment image based on a multi-view stereo matching algorithm, and generates a white film corresponding to the real environment according to the depth information of each frame of the first environment image and the depth information of each frame of the second environment image, where the white film is used to represent the surfaces of the objects in the real environment.
  • the method further includes: the distributed mapping system sends the three-dimensional map to the first electronic device and the second electronic device; the distributed mapping system receives the positioning request sent by the first electronic device, determines the first pose of the first electronic device in the three-dimensional coordinate system of the three-dimensional map, and sends the first pose to the first electronic device and the second electronic device; the distributed mapping system receives the positioning request sent by the second electronic device, determines the second pose of the second electronic device in the three-dimensional coordinate system of the three-dimensional map, and sends the second pose to the first electronic device and the second electronic device; and the first electronic device displays an augmented reality scene according to the first pose and the three-dimensional map, and displays the captured image of the user using the second electronic device and the three-dimensional digital resource model corresponding to the second electronic device according to the second pose.
  • the present application provides a method for constructing a three-dimensional map with multiple devices, which is applied to a distributed mapping system, and the method includes: receiving multiple frames of the first environment image and the pose information of each frame of the first environment image sent by the first electronic device, where the pose information of each frame of the first environment image is used to represent the position and orientation of the first electronic device in the three-dimensional coordinate system corresponding to the first electronic device when the first environment image is captured; receiving multiple frames of the second environment image and the initial pose information of each frame of the second environment image sent by the second electronic device; performing pose conversion on the initial pose information of the multiple frames of the second environment image according to the target conversion relationship to obtain the target pose information of each frame of the second environment image, where the target conversion relationship is the conversion relationship between the three-dimensional coordinate system corresponding to the second electronic device and the three-dimensional coordinate system corresponding to the first electronic device; and creating a three-dimensional map according to the multiple frames of the first environment image, the pose information of each frame of the first environment image, the multiple frames of the second environment image, and the target pose information of each frame of the second environment image, where the three-dimensional map is used to construct an augmented reality scene.
  • the method further includes: before receiving the multiple frames of the first environment image and the pose information of each frame of the first environment image sent by the first electronic device, receiving a multi-device mapping request sent by the first electronic device, where the multi-device mapping request includes multiple frames of the first initial image and positioning information of each frame of the first initial image, and the multiple frames of the first initial image are obtained by the first electronic device photographing the environment where it is located; before receiving the multiple frames of the second environment image and the initial pose information of each frame of the second environment image sent by the second electronic device, receiving a request for joining the mapping sent by the second electronic device, where the request includes multiple frames of the second initial image, and the multiple frames of the second initial image are obtained by the second electronic device photographing the environment where it is located; generating an initial three-dimensional map according to the multiple frames of the first initial image and the positioning information of each frame of the first initial image in the multi-device mapping request; and determining the target conversion relationship according to the multiple frames of the second initial image and the initial three-dimensional map.
  • the method further includes: performing image processing on the multiple frames of the second initial image, and determining that at least one frame of the second initial image contains the same image content as some frame of the first initial image.
  • the determining the target conversion relationship according to the multiple frames of the second initial image and the initial three-dimensional map includes: extracting the global features and feature points of a target initial image, where the target initial image is any frame of the second initial image; determining at least one frame of the first initial image matching the target initial image according to the global features of the target initial image, determining the three-dimensional points corresponding, in the three-dimensional map, to the feature points in the at least one frame of the first initial image matching the target initial image, and using the determined three-dimensional points as the three-dimensional points corresponding to the feature points of the target initial image; determining the target pose information of the target initial image according to the feature points of the target initial image, the three-dimensional points corresponding to the feature points of the target initial image, and the camera intrinsic parameters of the second electronic device; and determining the target conversion relationship according to the initial pose information of the target initial image and the target pose information of the target initial image.
  • the method further includes: generating a point cloud resource according to the 3D points in the initial 3D map; receiving a positioning request sent by the first electronic device, where the positioning request includes an environment image collected by the first electronic device; positioning the first electronic device according to the environment image in the positioning request and the initial three-dimensional map, and determining the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map; and sending the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map and the point cloud resource to the first electronic device, so that the first electronic device displays the environment image collected in real time and the point cloud resource according to the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map, to indicate that the area covered by the point cloud resource has already been scanned.
  • the creating a three-dimensional map according to the multiple frames of the first environment image, the pose information of each frame of the first environment image, the multiple frames of the second environment image, and the target pose information of each frame of the second environment image includes: selecting one frame of image to be processed from the multiple frames of the first environment image and the multiple frames of the second environment image as a target image, and performing a target processing process on the target image, until the multiple frames of the first environment image and the multiple frames of the second environment image have all been subjected to the target processing process;
  • the target processing process includes the following steps: extracting a first feature point of the target image; acquiring the feature points of at least one frame of image that has undergone the target processing process; and selecting at least one second feature point from the feature points of the at least one frame of image to form a feature matching pair with the first feature point, where the first feature point and the at least one second feature point correspond to the same point in the environment, and the at least one frame of image that has undergone the target processing process includes at least one frame of the first environment image and/or at least one frame of the second environment image;
  • the creating the three-dimensional map according to the multiple feature matching pairs includes: determining, according to the pose information of the multiple frames of the first environment image and the target pose information of the multiple frames of the second environment image, the multiple three-dimensional points corresponding to the multiple feature matching pairs in the three-dimensional coordinate system corresponding to the first electronic device, to obtain the three-dimensional map.
  • the method further includes: receiving the positioning information corresponding to each frame of the first environment image sent by the first electronic device; receiving the positioning information corresponding to each frame of the second environment image sent by the second electronic device; and adjusting the coordinates of the 3D points according to the positioning information corresponding to each frame of the first environment image and the positioning information corresponding to each frame of the second environment image, to obtain a 3D map with the same scale as the real environment.
  • the method further includes: receiving the first depth map corresponding to the multiple frames of the first environment image sent by the first electronic device; receiving the second depth map corresponding to the multiple frames of the second environment image sent by the second electronic device; performing coordinate conversion on the second depth map according to the target conversion relationship, and fusing the first depth map and the coordinate-converted second depth map to obtain a complete depth map; and generating a white film corresponding to the real environment according to the complete depth map, where the white film is used to represent the surfaces of the objects in the real environment.
  • the method further includes: determining the depth information of each frame of the first environment image and the depth information of each frame of the second environment image based on a multi-view stereo matching algorithm, and generating a white film corresponding to the real environment according to the depth information of each frame of the first environment image and the depth information of each frame of the second environment image, where the white film is used to represent the surfaces of the objects in the real environment.
  • the method further includes: sending the three-dimensional map to the first electronic device and the second electronic device; receiving a positioning request sent by the first electronic device, determining the first pose of the first electronic device in the three-dimensional coordinate system of the three-dimensional map, and sending the first pose to the first electronic device and the second electronic device; and receiving a positioning request sent by the second electronic device, determining the second pose of the second electronic device in the three-dimensional coordinate system of the three-dimensional map, and sending the second pose to the first electronic device and the second electronic device, so that the first electronic device displays an augmented reality scene according to the first pose and the three-dimensional map, and displays the captured image of the user using the second electronic device and the three-dimensional digital resource model corresponding to the second electronic device according to the second pose.
  • the present application provides an electronic device, where the electronic device includes multiple functional modules; the multiple functional modules interact to realize the method executed by the first electronic device or the second electronic device in any one of the above aspects and its implementation manners. The multiple functional modules can be implemented based on software, hardware, or a combination of software and hardware, and can be combined or divided arbitrarily based on the specific implementation.
  • the present application provides an electronic device, including at least one processor and at least one memory, where the at least one memory stores computer program instructions, and when the electronic device runs, the at least one processor executes the method executed by the first electronic device or the second electronic device in any one of the above aspects and its various implementation manners.
  • the embodiment of the present application provides a distributed mapping system, where the distributed mapping system includes multiple computing nodes that process data in parallel/serially, and each computing node is used to execute the method executed by the distributed mapping system in any one of the above aspects and its various implementation manners.
  • the present application also provides a computer program; when the computer program runs on a computer, the computer executes the method executed by the first electronic device, the second electronic device, or the distributed mapping system in any one of the above aspects and its various implementation manners.
  • the present application also provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a computer, the computer executes the method executed by the first electronic device, the second electronic device, or the distributed mapping system in any one of the above aspects and its various implementation manners.
  • the present application also provides a chip, where the chip is used to read the computer program stored in a memory and execute the method executed by the first electronic device, the second electronic device, or the distributed mapping system in any one of the above aspects and its various implementation manners.
  • the present application also provides a chip system, where the chip system includes a processor and is used to support a computer device in implementing the method executed by the first electronic device, the second electronic device, or the distributed mapping system in any one of the above aspects and its various implementation manners. The chip system further includes a memory, and the memory is used to store the necessary programs and data of the computer device.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of an AR scene provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an augmented reality system provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 4 is a software structural block diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a distributed mapping system provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a user operating an electronic device to trigger a mapping initialization instruction provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a scanning interface displayed by a first electronic device provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of initialization of a multi-person mapping provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a multi-person mapping interface provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of multiple users holding electronic devices to scan and photograph the current environment during multi-person mapping provided by an embodiment of the present application;
  • FIG. 11 is a schematic diagram of a display interface of an electronic device when scanning and photographing the current environment provided by the embodiment of the present application;
  • FIG. 12 is a schematic diagram of a grid provided by the embodiment of the present application.
  • FIG. 13 is a schematic diagram of a scanning progress interface displayed by a first electronic device according to an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a three-dimensional map provided by an embodiment of the present application.
  • FIG. 15 is a schematic flowchart of a method for constructing a three-dimensional map with multiple devices provided in an embodiment of the present application.
  • FIG. 16a is a schematic diagram of an AR scene interface including material of a three-dimensional digital resource model provided by an embodiment of the present application;
  • FIG. 16b is a schematic diagram of an AR scene interface after adding a three-dimensional digital resource model provided by the embodiment of the present application;
  • FIG. 17 is a schematic diagram of an electronic device displaying interactive play by multiple users in an AR scene provided by an embodiment of the present application.
  • FIG. 18 is a schematic diagram of a created three-dimensional map list displayed by an electronic device provided in an embodiment of the present application.
  • FIG. 19 is a schematic diagram of a user-triggered extended three-dimensional map provided by an embodiment of the present application.
  • FIG. 20 is a flowchart of a method for creating a three-dimensional map with multiple devices according to an embodiment of the present application.
  • "At least one" in the embodiments of the present application refers to one or more, and "multiple" refers to two or more.
  • “And/or” describes the association relationship of associated objects, indicating that there may be three types of relationships, for example, A and/or B, which can mean: A exists alone, A and B exist simultaneously, and B exists alone, where A, B can be singular or plural.
  • the character “/” generally indicates that the contextual objects are an “or” relationship.
  • "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of a single item or multiple items.
  • for example, at least one item (unit) of a, b, or c can represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c can each be single or multiple.
  • Augmented reality is a technology that integrates real world information and virtual world information.
  • AR technology can simulate physical information that is difficult to experience in the real world to obtain virtual information, and apply the virtual information to the real world, so that the real environment and virtual objects are superimposed in the same screen or space in real time and perceived by the user simultaneously, achieving a sensory experience beyond reality.
  • the three-dimensional map of the AR scene can be used to represent the environmental information of the real world, and the electronic device can then construct the AR scene based on the three-dimensional map. For example, when the user captures the real world in real time through the camera of the electronic device, the user can add virtual items to the scene displayed on the display screen, the electronic device adds the virtual items to the AR scene based on the 3D map corresponding to the real world, and the user can then observe the real world and the added virtual items in the same screen.
  • FIG. 1 is a schematic diagram of an AR scene provided by an embodiment of the present application. Referring to FIG. 1, the ground and the road in FIG. 1 are real-world pictures actually captured by the camera of the electronic device, while the cartoon character on the road is a virtual item added by the user in the current AR scene; the user can simultaneously observe the real-world ground and road and the virtual cartoon character on the display screen.
  • multiple electronic devices may jointly create a three-dimensional map.
  • each electronic device can collect environmental images for the same area in the real world, and generate a 3D point cloud corresponding to the area.
  • the 3D point cloud generated by the first electronic device is the target point cloud
  • the 3D point cloud generated by the second electronic device is the reference point cloud.
  • the first electronic device may pre-register the target point cloud and the reference point cloud based on a principal-direction fitting method, calculate the curvature of each point in the target point cloud and the reference point cloud, obtain feature matching point pairs according to curvature similarity, and use the feature matching point pairs to achieve accurate registration of the target point cloud and the reference point cloud, and then generate a 3D map.
  • in the above manner, the 3D point clouds obtained from environment images collected by different electronic devices can be combined to realize the joint creation of a 3D map by multiple electronic devices, but this method requires a high overlap rate between the environment images collected by the multiple electronic devices, which leads to low efficiency in generating the 3D map.
  • in view of this, an embodiment of the present application provides an efficient method for collaboratively creating a 3D map with multiple devices.
  • Fig. 2 is a schematic diagram of an augmented reality system provided by an embodiment of the present application.
  • the augmented reality system includes a first electronic device, at least one second electronic device and a distributed mapping system.
  • the first electronic device is a master device
  • the second electronic device is a slave device.
  • the distributed mapping system can be deployed in at least one cloud server.
  • FIG. 2 takes one first electronic device, one second electronic device, and a distributed mapping system deployed in a cloud server as an example.
  • the first electronic device can start the multi-device map building task, and the second electronic device can join the multi-device map building task.
  • the first electronic device and the second electronic device can simultaneously collect real-world environment images; the first electronic device uploads the collected multiple frames of first environment images to the distributed mapping system, and the second electronic device uploads the collected multiple frames of second environment images to the distributed mapping system.
  • the distributed mapping system can determine the target conversion relationship corresponding to multiple frames of the second environment image, and the target conversion relationship represents the conversion relationship between the three-dimensional coordinate system corresponding to the second environment image and the three-dimensional coordinate system corresponding to the first environment image.
•   the distributed mapping system performs pose conversion on the pose information of the second environment image according to the target conversion relationship to obtain the target pose information of the second environment image.
•   the distributed mapping system can generate a three-dimensional map according to the first environment image and the pose-converted second environment image, and the three-dimensional map can be used to construct an AR scene.
•   In other words, the distributed mapping system can convert the poses of the environment images uploaded by different electronic devices into poses in the same three-dimensional coordinate system, and can then generate a three-dimensional map based on the converted environment images, thereby realizing multi-device construction of a three-dimensional map.
•   the electronic device of the embodiment of the present application may have a camera and a display device. For example, the electronic device may be a tablet computer, a mobile phone, a vehicle-mounted device, an augmented reality (AR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a wearable device, etc.; the embodiments of the present application do not impose any restrictions on the specific type of the electronic device.
  • FIG. 3 is a schematic structural diagram of an electronic device 100 provided in an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charging management module 140, a power management module 141, and a battery 142 , antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone jack 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193 , a display screen 194, and a subscriber identification module (subscriber identification module, SIM) card interface 195, etc.
•   the processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors. The controller may be the nerve center and command center of the electronic device 100; it can generate operation control signals according to instruction opcodes and timing signals, and complete the control of instruction fetching and execution.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
•   the memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory, which avoids repeated access and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
  • the USB interface 130 is an interface conforming to the USB standard specification, specifically, it can be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100 , and can also be used to transmit data between the electronic device 100 and peripheral devices.
  • the charging management module 140 is configured to receive charging input from the charger.
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives the input from the battery 142 and/or the charging management module 140 to provide power for the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the wireless communication function of the electronic device 100 can be realized by the antenna 1 , the antenna 2 , the mobile communication module 150 , the wireless communication module 160 , a modem processor, a baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signals modulated by the modem processor, and convert them into electromagnetic waves and radiate them through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be set in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be set in the same device.
•   the wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, etc.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
•   the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc.
•   the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
  • the display screen 194 is used for displaying a display interface of an application, for example, displaying a display page of an application installed on the electronic device 100 .
  • the display screen 194 includes a display panel.
•   the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194 , where N is a positive integer greater than 1.
  • the display screen 194 may be used to display an AR scene, and the AR scene displayed on the display screen 194 may include images captured by the camera 193 in real time and virtual items placed by the user in the AR scene.
  • Camera 193 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other image signals.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
•   the camera 193 can collect images for building a three-dimensional map of the AR scene, and the camera 193 can also be used to shoot a panorama corresponding to the location of the electronic device 100.
  • the internal memory 121 may be used to store computer-executable program codes including instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 .
  • the internal memory 121 may include an area for storing programs and an area for storing data. Wherein, the storage program area can store an operating system, software codes of at least one application program, and the like.
  • the data storage area can store data generated during use of the electronic device 100 (such as captured images, recorded videos, etc.) and the like.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (universal flash storage, UFS) and the like.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, save pictures, videos and other files in the external memory card.
•   the electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
  • the sensor module 180 may include a pressure sensor 180A, an acceleration sensor 180B, a touch sensor 180C and the like.
  • the pressure sensor 180A is used to sense the pressure signal and convert the pressure signal into an electrical signal.
  • pressure sensor 180A may be disposed on display screen 194 .
  • the touch sensor 180C is also called “touch panel”.
  • the touch sensor 180C can be disposed on the display screen 194, and the touch sensor 180C and the display screen 194 form a touch screen, also called “touch screen”.
  • the touch sensor 180C is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to the touch operation can be provided through the display screen 194 .
  • the touch sensor 180C may also be disposed on the surface of the electronic device 100 , which is different from the position of the display screen 194 .
  • the keys 190 include a power key, a volume key and the like.
•   the key 190 may be a mechanical key or a touch key.
  • the electronic device 100 may receive key input and generate key signal input related to user settings and function control of the electronic device 100 .
  • the motor 191 can generate a vibrating reminder.
  • the motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback. For example, touch operations applied to different applications (such as taking pictures, playing audio, etc.) may correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, and can be used to indicate charging status, power change, and can also be used to indicate messages, missed calls, notifications, and the like.
  • the SIM card interface 195 is used for connecting a SIM card. The SIM card can be connected and separated from the electronic device 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195 .
•   The structure illustrated in FIG. 3 does not constitute a specific limitation on the electronic device 100; the electronic device may include more or fewer components than shown in the figure, combine some components, split some components, or use a different arrangement of components.
  • the combination/connection relationship between the components in FIG. 3 can also be adjusted and modified.
  • FIG. 4 is a software structural block diagram of an electronic device provided by an embodiment of the present application.
  • the software structure of the electronic device may be a layered architecture, for example, the software may be divided into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
•   the operating system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer (framework, FWK), the runtime and system libraries, and the kernel layer.
•   the application layer can include a series of application packages. As shown in FIG. 4, the application layer may include camera, settings, a skin module, user interface (UI), third-party applications, and the like. The third-party applications may include gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, and so on.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer can include some predefined functions. As shown in Figure 4, the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, and a notification manager.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications. Said data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build applications.
  • a display interface can consist of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of electronic devices. For example, the management of call status (including connected, hung up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
•   the notification manager can also present notifications that appear in the top status bar of the system in the form of charts or scroll bar text (such as notifications of applications running in the background), or notifications that appear on the screen in the form of a dialog window, for example by prompting text information in the status bar, issuing a prompt sound, vibrating the electronic device, flashing the indicator light, etc.
  • the runtime includes the core library and virtual machine.
  • the runtime is responsible for the scheduling and management of the operating system.
•   the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of the operating system.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application program layer and the application program framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • a system library can include multiple function modules. For example: surface manager (surface manager), media library (media libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of various commonly used audio and video formats, as well as still image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing, etc.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • the hardware layer may include various types of sensors, such as acceleration sensors, gyroscope sensors, touch sensors, and the like.
  • FIG. 5 is a schematic structural diagram of a distributed mapping system provided by an embodiment of the present application.
  • the distributed mapping system in the embodiment of the present application may include multiple computing nodes (computing nodes 1 to computing nodes N as shown in FIG. 5, where N is a positive integer), at least one storage node, a task queue node, At least one scheduling node and positioning node.
  • the distributed mapping system can be deployed in the cloud, for example, multiple nodes of the distributed mapping system can be deployed in one cloud server, or multiple nodes can be deployed in multiple cloud servers.
  • the functions of each node in the distributed mapping system shown in Figure 5 are introduced below:
  • the computing node is configured to create a three-dimensional map corresponding to the environment based on a distributed processing method based on multiple frames of images uploaded by multiple electronic devices and the positioning parameters corresponding to each frame of the image.
  • different computing nodes can perform different processing tasks in the process of creating a 3D map, and N computing nodes jointly complete the entire process of creating a 3D map.
  • different computing nodes can perform the same type of processing on different images, so that the processing tasks of multiple frames of images are distributed to multiple computing nodes to perform synchronously, thereby speeding up image processing.
  • N computing nodes may include CPU algorithm components and GPU algorithm components shown in FIG. 5 .
•   There may be multiple CPU algorithm components in the distributed mapping system, and there may also be multiple GPU algorithm components.
•   the GPU algorithm component can be used for image processing (such as feature extraction, matching, retrieval, etc.), and the CPU algorithm component can be used to convert the poses of images into the same three-dimensional coordinate system and to generate a three-dimensional map according to the image processing results of the GPU algorithm component.
•   the GPU algorithm components and the CPU algorithm components can queue map construction instructions in the message middleware and perform automatic algorithm processing.
•   the computing node may also include a white model processing service, which is used to simplify the mesh uploaded by the electronic device and generate a white model according to the simplified mesh.
•   the white model processing service can also generate a complete white model corresponding to the AR scene based on the depth maps uploaded by multiple electronic devices.
  • the computing node may also be implemented by other types of algorithm processing components, which are not specifically limited in this embodiment of the present application.
  • the task queue node is used to cache the processing tasks in the process of creating the 3D map by queue.
  • Each computing node can read the tasks to be executed from the task queue node and perform corresponding processing, so as to realize the distributed and sequential execution of multi-processing tasks.
  • the task queue node can be implemented by using the queue-shaped message middleware shown in FIG. 5 .
•   the queue-shaped message middleware can be used to asynchronously cache 3D map creation instructions from multiple electronic devices, instructions for processing tasks in the 3D map creation process, and the like, which can be shared by or assigned to the N computing nodes, so that the N computing nodes share the execution tasks and the system load is balanced. A minimal worker loop in this style is sketched below.
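•   As a hedged sketch of how a computing node might consume this queue (Redis stands in for the unspecified queue-shaped message middleware; the task fields and handler names are hypothetical):

```python
import json

import redis  # a Redis list as a stand-in for the queue-shaped message middleware

r = redis.Redis(host="localhost", port=6379)

def process_image(image_url: str, map_id: str) -> None:
    ...  # placeholder: feature extraction / serialization for one frame

def build_map(map_id: str) -> None:
    ...  # placeholder: assemble the 3D map from the processed frames

def worker_loop() -> None:
    """Loop run by each of the N computing nodes: pop one cached task,
    process it, repeat; tasks are thus shared across nodes and the
    system load is balanced."""
    while True:
        _key, raw = r.blpop("mapping_tasks")  # blocks until a task is queued
        task = json.loads(raw)                # task format is hypothetical
        if task["type"] == "image_processing":
            process_image(task["image_url"], task["map_id"])
        elif task["type"] == "build_map":
            build_map(task["map_id"])
```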
  • At least one storage node is used for temporarily or permanently storing data related to the three-dimensional map creation process.
  • at least one storage node may store multiple frames of images, intermediate data and result data processed by multiple computing nodes, and the like.
  • the storage nodes may include cloud databases, object storage services, elastic file services, cache-shaped message middleware, and the like.
  • the cloud database can be used to store user information on the electronic device side, instruction information on task processing during the process of creating a three-dimensional map, modification information on the three-dimensional map, and other serialized content that takes up a small storage space.
  • the object storage service can be used to store non-serialized content such as 3D models, high-definition pictures, videos, and animations involved in electronic devices that takes up a large storage space.
  • the elastic file service can be used to store map data of a 3D map generated by a 3D map creation algorithm, and data such as intermediate variables of an algorithm that takes up a large storage space.
  • Cache-shaped message middleware can be used for data such as intermediate variables that can be serialized and occupy less storage space during the processing of asynchronous cache algorithms, and can be shared with N computing nodes.
  • At least one scheduling node is used for overall management of the scheduling of some or all of the N computing nodes, task queue nodes, and at least one storage node.
  • the scheduling nodes in the distributed mapping system may include a cloud scheduling center and an algorithm scheduling center.
•   the cloud scheduling center can manage and schedule the algorithm scheduling center, storage nodes, task queue nodes, and other nodes, can exchange information and data with electronic devices, and can serve as an efficient message processing and distribution node; for example, the cloud scheduling center can provide the upload addresses of multi-frame pictures to the electronic device, schedule requests from the electronic device side, and issue requests to and receive returns from the cloud database.
  • the algorithm scheduling center is used to manage and schedule N computing nodes, and can also manage and schedule other algorithm services.
  • the positioning node is configured to locate the electronic device according to the image uploaded by the electronic device, so as to determine the relative position of the electronic device in the coordinate system of the three-dimensional map.
  • the positioning node may include a global visual positioning system (global visual positioning system, GVPS) service and a vector retrieval system (vector retrieval system, VRS) service.
  • GVPS global visual positioning system
  • VRS vector retrieval system
  • the GVPS service can be used for spatial positioning, and determine the 6-degree-of-freedom coordinates of the corresponding position of the current position of the electronic device in the created three-dimensional map.
  • the VRS service is used for vector searches.
  • GVPS service and VRS service can be used as sub-services of computing nodes.
•   The distributed mapping system shown in FIG. 5 is only an exemplary description of the distributed mapping system provided by the embodiment of the present application, and does not limit the architecture of the distributed mapping system to which the solution provided by the embodiment of the present application applies. Compared with the structure shown in FIG. 5, the distributed mapping system to which the solution applies may add, delete, or adjust some nodes, which is not specifically limited in this embodiment of the present application.
  • the execution process of the solution provided by the embodiment of the present application includes at least three stages of mapping initialization, data collection, and mapping. After the three-dimensional map is created based on the method of these three stages, stages such as positioning and adding digital resources can be further included. The method of each stage will be described in detail below.
  • the first electronic device may send a three-dimensional map building initialization instruction to the distributed mapping system.
  • the first user may operate the first electronic device to trigger the first electronic device to start creating a three-dimensional map, and the first electronic device sends a map creation initialization instruction to the scheduling node of the distributed mapping system.
•   After the scheduling node of the distributed mapping system receives the mapping initialization instruction, it can assign a map identifier (identity document, ID) to the electronic device for the 3D map to be created by the current mapping task.
•   By assigning map IDs and indicating them to electronic devices, the distributed mapping system can manage different 3D maps in a unified manner and can synchronize 3D map information with electronic devices, avoiding problems in subsequent map processing or use caused by information inconsistency between the electronic devices and the distributed mapping system. A minimal sketch of this assignment is given below.
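•   A minimal sketch of map-ID assignment and synchronization at mapping initialization (the message shapes and the in-memory registry are assumptions for illustration, not the patent's wire format):

```python
import uuid

map_registry = {}  # map_id -> task info, kept consistent with the devices

def handle_mapping_init(device_id: str) -> dict:
    """Scheduling node: assign a map ID for the 3D map the current task will create."""
    map_id = uuid.uuid4().hex
    map_registry[map_id] = {"owner": device_id, "members": [device_id]}
    return {"map_id": map_id}  # returned to the first electronic device

def handle_join(map_id: str, device_id: str) -> dict:
    """Later, a second device that passes the image-overlap check joins the task."""
    map_registry[map_id]["members"].append(device_id)
    return {"map_id": map_id}  # the same map ID is indicated to the second device
```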
  • FIG. 6 is a schematic diagram of a user operating an electronic device to trigger a mapping initialization instruction according to an embodiment of the present application.
•   the electronic device displays an initialization control interface on the display screen, which shows a control for triggering the mapping process and may also show prompt information indicating how to trigger mapping, such as "click the button to start recording"; the first user triggers the map building initialization instruction by clicking the control displayed on the display screen of the electronic device according to the prompt information.
•   After receiving the map identifier sent by the scheduling node, the first electronic device sends a multi-device map building request to the distributed mapping system.
  • the multi-device mapping request includes multiple frames of the first initial image collected by the first electronic device and positioning information of each frame of the first initial image.
•   the first electronic device can photograph the current environment through its camera device and send the captured multiple frames of first initial images to the distributed mapping system, and the computing nodes of the distributed mapping system can generate an initial three-dimensional map from the multiple frames of first initial images and the positioning information of each frame of first initial image.
  • the initial three-dimensional map can be used to locate the first electronic device and the second electronic device.
•   The manner in which the distributed mapping system generates the initial three-dimensional map based on the multiple frames of first initial images is the same as the manner in which the distributed mapping system generates the target three-dimensional map based on the multiple frames of first environment images collected by the first electronic device and the multiple frames of second environment images collected by the second electronic device, and is therefore not described in detail here.
•   After the distributed mapping system generates the initial three-dimensional map, it can send the initial three-dimensional map to the first electronic device.
•   After the first electronic device receives the initial three-dimensional map, it can send a positioning request to the distributed mapping system.
•   the positioning request includes an environment image collected by the camera device of the first electronic device; the distributed mapping system can locate the first electronic device according to the environment image in the request and the initial three-dimensional map, and determine the pose of the first electronic device in the initial three-dimensional map.
  • the distributed mapping system returns the pose of the first electronic device in the initial three-dimensional map and point cloud resources corresponding to the environment image to the first electronic device.
  • the point cloud resources are three-dimensional points corresponding to feature points in the first initial images of multiple frames determined by the distributed mapping system during the process of generating the three-dimensional map.
  • the first electronic device may superimpose and display the point cloud resource on the display screen in the environment image captured in real time, so as to represent the currently scanned area of the first electronic device.
  • FIG. 7 is a schematic diagram of a scanning interface displayed by a first electronic device according to an embodiment of the present application.
•   After receiving the point cloud resource sent by the distributed mapping system, the first electronic device superimposes and displays the point cloud resource at the corresponding position in the image currently displayed on the display screen.
•   When the first electronic device displays an environment image superimposed with point cloud resources, the first electronic device may obtain a video stream containing the point cloud resources and the environment image, and send the video stream to the second electronic device. After receiving the video stream, the second electronic device may play the corresponding video to the second user, where the second user is the user who operates the second electronic device. After viewing the video, the second user knows which area the first electronic device has scanned; alternatively, the second user can look directly at the first user's first electronic device to learn the scanned area.
  • the second electronic device may send a request for joining the mapping to the distributed mapping system, and the request for joining the mapping includes multiple frames of second initial images collected by the second electronic device and positioning information of each frame of the second initial image.
  • at least one frame of the second initial image contains the image content of any frame of the first initial image. That is to say, the second user may operate the second electronic device to capture images of the environment captured by the first electronic device, so as to join the multi-device mapping task.
•   After the computing node performs image processing on the multiple frames of second initial images and determines that at least one frame of second initial image and any frame of first initial image contain the same image content, the distributed mapping system sends the map identifier to the second electronic device, and the second electronic device joins the mapping task of the first electronic device.
  • FIG. 8 is a schematic diagram of initialization of multi-person mapping provided by the embodiment of the present application.
  • the first electronic device collects multiple frames of the first initial image, for example, one frame of the first initial image is obtained by the first user operating the first electronic device to capture a target item in the current environment.
  • the second user may operate the second electronic device to photograph the target item to obtain a second initial image.
  • the first electronic device sends multiple frames of first initial images to the distributed mapping system, and the distributed mapping system generates an initial three-dimensional map according to the multiple frames of first initial images.
•   the second electronic device sends multiple frames of second initial images to the distributed mapping system, the distributed mapping system performs image processing on the multiple frames of second initial images, and when it determines that a second initial image contains the target item, the distributed mapping system sends the map identifier to the second electronic device.
•   the distributed mapping system can determine the target pose information of the multiple frames of second initial images relative to the initial three-dimensional map, and determine, according to the initial pose information and the target pose information of the multiple frames of second initial images, the target conversion relationship between the three-dimensional coordinate system corresponding to the second electronic device and the three-dimensional coordinate system of the initial three-dimensional map.
  • the three-dimensional coordinate system corresponding to the second electronic device may be the three-dimensional coordinate system created when the second electronic device runs SLAM to determine the pose information of the image; the three-dimensional coordinate system of the initial three-dimensional map is the three-dimensional coordinate system corresponding to the first electronic device, It may be a three-dimensional coordinate system created when the first electronic device runs SLAM to determine the pose information of the image.
•   The pose indicated by the initial pose information is the pose of the second electronic device in the three-dimensional coordinate system corresponding to the second electronic device when the second initial image was taken, and the pose indicated by the target pose information is the pose of the second electronic device in the three-dimensional coordinate system of the initial three-dimensional map when the second initial image was taken.
•   the distributed mapping system can convert the poses of the images uploaded by the second electronic device into poses in the three-dimensional coordinate system corresponding to the first electronic device according to the target conversion relationship, and can then construct a three-dimensional map from the images uploaded by the second electronic device together with the images uploaded by the first electronic device.
•   the computing nodes of the distributed mapping system perform image processing on each frame of second initial image: they extract feature vectors from the multi-scale grayscale features of each region in the image to obtain the local features of the image, and extract the feature points in the image.
  • the feature vector can be used to represent the texture feature of the local area in the image.
•   When the computing node determines the feature point in the first initial image corresponding to a feature point in the second initial image, and determines the three-dimensional point corresponding to that feature point in the first initial image according to the initial three-dimensional map, it can determine that this three-dimensional point is also the three-dimensional point corresponding to the feature point in the second initial image.
•   the computing node may calculate the target pose information of the second initial image according to the feature points in the second initial image, the three-dimensional points corresponding to the feature points, and the camera intrinsic parameters of the camera of the second electronic device, as sketched below.
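•   This 2D-3D pose computation can be illustrated with a standard PnP solver (a hedged sketch: the patent does not name a solver, so OpenCV's solvePnPRansac and the variable names here are assumptions):

```python
import cv2
import numpy as np

def solve_target_pose(points_2d, points_3d, K):
    """points_2d: N x 2 feature points in the second initial image;
    points_3d: N x 3 corresponding 3D points from the initial map;
    K: 3 x 3 camera intrinsic matrix of the second device's camera."""
    ok, rvec, tvec, _inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)      # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)                   # 4x4 pose (map frame -> camera frame)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T
```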
  • the calculation node may determine the conversion relationship between the three-dimensional coordinate system corresponding to the second initial image and the coordinate system of the initial three-dimensional map according to the initial pose information of multiple frames of the second initial image and the target pose information.
•   the computing node can determine a transformation matrix according to the initial pose information and the target pose information of each frame of second initial image; after determining multiple transformation matrices from the multiple frames of second initial images, it calculates the average of the multiple transformation matrices to obtain a target transformation matrix, where the target transformation matrix is the target conversion relationship between the three-dimensional coordinate system corresponding to the second initial images and the three-dimensional coordinate system of the initial three-dimensional map. One reading of this averaging is sketched below.
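•   A hedged sketch of this step: since rotation matrices averaged element-wise generally stop being valid rotations, the sketch below averages the translations directly and takes a proper rotation mean, which is one reasonable interpretation of "calculate the average of the multiple transformation matrices":

```python
import numpy as np
from scipy.spatial.transform import Rotation

def target_transform(init_poses, target_poses):
    """init_poses / target_poses: per-frame 4x4 poses of the second initial
    images in the device's SLAM frame and in the initial map's frame.
    Returns the 4x4 target transformation matrix between the two frames."""
    Ts = [tgt @ np.linalg.inv(ini) for ini, tgt in zip(init_poses, target_poses)]
    # Chordal mean of the rotations, plus the arithmetic mean of translations.
    R_mean = Rotation.from_matrix([T[:3, :3] for T in Ts]).mean().as_matrix()
    t_mean = np.mean([T[:3, 3] for T in Ts], axis=0)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R_mean, t_mean
    return T
```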
•   the second electronic device may also send a positioning request to the distributed mapping system, where the positioning request includes an environment image collected by the second electronic device.
•   the distributed mapping system can perform pose conversion for the environment image according to the target conversion relationship, locate the second electronic device according to the converted environment image, and determine the pose of the second electronic device in the initial three-dimensional map.
•   the distributed mapping system returns to the second electronic device the pose of the second electronic device in the initial three-dimensional map and the point cloud resources corresponding to the environment image, where the point cloud resources are the three-dimensional points corresponding to the feature points obtained after the distributed mapping system performs feature extraction on the environment image uploaded by the second electronic device.
  • the second electronic device can superimpose and display the point cloud resource on the display screen in the real-time captured environment image, so as to represent the currently scanned area.
•   the first electronic device and the second electronic device can display a multi-person mapping interface, which may show the user IDs of the users who have joined the current mapping task.
•   Through the above process, the distributed mapping system can determine the conversion relationship between the three-dimensional coordinate system corresponding to the images collected by the second electronic device and the three-dimensional coordinate system corresponding to the images collected by the first electronic device, so that the poses of images collected by different electronic devices can be converted into poses in the same three-dimensional coordinate system, enabling the distributed mapping system to construct a three-dimensional map based on the images collected by multiple electronic devices.
•   the first electronic device can collect multiple frames of first environment images of the environment and the positioning information corresponding to each frame of first environment image, and upload the collected first environment images to the distributed mapping system.
  • the second electronic device can also collect multiple frames of second environment images of the environment and positioning information corresponding to each frame of the second environment images, and upload the collected second environment images to the distributed mapping system.
  • the distributed mapping system can respectively perform image processing on the multiple frames of the first environment image uploaded by the first electronic device and the multiple frames of the second environment image uploaded by the second electronic device, and determine the characteristic information of each frame of the image, so as to further create the A 3D map of the environment.
  • This process mainly includes the following steps 1 to 4:
  • Step 1 The first electronic device and the second electronic device respectively scan and shoot the video of the environment where they are located.
  • the first electronic device and the second electronic device can respectively scan and photograph the environment where they are located, display the environmental image currently captured by the camera device on the display screen in real time, and display prompt information for prompting the user to continue scanning .
•   When the user operates the electronic device to scan, the user can move the electronic device according to the prompt information on the display screen to continuously scan the current environment, and the electronic device stops scanning after receiving the end-scanning instruction triggered by the user.
•   FIG. 10 is a schematic diagram of multiple users holding electronic devices to scan and capture the current environment in multi-person mapping, provided by an embodiment of the present application. Referring to FIG. 10, two users can scan and shoot the current environment with different electronic devices.
  • FIG. 11 is a schematic diagram of a display interface of an electronic device when scanning and photographing the current environment provided by an embodiment of the present application.
  • the electronic device displays the environmental image captured by the camera device in real time on the display screen.
•   When the user decides to stop scanning, the user can click the stop-scanning control shown in FIG. 11; the electronic device then stops scanning and generates a video file from the scanned content.
•   the first electronic device and the second electronic device can scan and shoot different areas of the same environment, thereby realizing multi-device collaborative collection of environment images, increasing the interaction between users, and improving the collection efficiency of environment images.
•   Step 2: The first electronic device extracts multiple frames of first environment images that meet the key frame requirements from the captured video and uploads them to the distributed mapping system; the second electronic device extracts multiple frames of second environment images that meet the key frame requirements from its captured video and uploads them to the distributed mapping system.
•   In step 2, after the first electronic device and the second electronic device capture videos of the current environment, they extract the environment images that meet the key frame requirements from the videos and upload these environment images to the distributed mapping system.
  • the functions performed by the first electronic device and the second electronic device are the same.
  • the specific content of step 2 is introduced below taking the first electronic device as an example:
  • the first electronic device can acquire pose information of each frame of the first environment image by running a SLAM algorithm.
•   The pose information of each frame of first environment image is the pose of the first electronic device in the three-dimensional coordinate system of the target three-dimensional map when the frame was shot, where the target three-dimensional map is the three-dimensional map generated by the distributed mapping system from the environment images collected by the first electronic device and the second electronic device.
  • the 3D coordinate system of the target 3D map is the same as the 3D coordinate system of the initial 3D map generated by the distributed mapping system in the mapping initialization.
  • the first electronic device may select a first environment image from the video that meets the key frame requirements in any of the following ways:
•   Way 1: for each frame of image in the video, the first electronic device obtains the pose information recorded when the frame was collected and compares it with the pose information of the previous frame that met the key frame requirements. If the offset between the poses indicated by the two pieces of pose information is greater than a set offset threshold, the frame is determined to be a first environment image that meets the key frame requirements; otherwise, the frame is determined not to meet the key frame requirements and the next frame is judged, until it has been determined for all images in the video whether they meet the key frame requirements, so that the first environment images meeting the key frame requirements are selected. A sketch of this check follows.
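•   A minimal sketch of Way 1, assuming 4x4 pose matrices and illustrative thresholds:

```python
import numpy as np

def is_keyframe(pose, last_kf_pose, trans_thresh=0.3, rot_thresh_deg=15.0):
    """A frame meets the key frame requirements when its pose has drifted far
    enough from the last keyframe's pose (threshold values are illustrative)."""
    dT = np.linalg.inv(last_kf_pose) @ pose
    trans_offset = np.linalg.norm(dT[:3, 3])
    # Rotation angle recovered from the trace of the relative rotation.
    cos_a = (np.trace(dT[:3, :3]) - 1.0) / 2.0
    rot_offset = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    return trans_offset > trans_thresh or rot_offset > rot_thresh_deg
```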
•   Way 2: the first electronic device can extract the local features of each frame of image, determine the feature points in each frame according to the extracted local features, and then track the feature points with an optical flow tracking method, selecting the images that meet the key frame requirements according to how the feature points are tracked.
  • the optical flow tracking method can be used to determine whether the feature points in the current frame image exist in the next frame image, so based on the optical flow tracking method, the number of identical feature points contained in the two frames of images can be judged.
•   For each frame of image in the video, after the first electronic device extracts the feature points in the frame, it determines the number of identical feature points contained in this frame and the previous frame that met the key frame requirements. If this number is less than a set quantity threshold, or the ratio of this number to the number of all feature points in the frame is less than a set ratio threshold, the frame is determined to be a first environment image that meets the key frame requirements; otherwise, the frame is determined not to meet the key frame requirements and the next frame is judged, until it has been determined for all images in the video whether they meet the key frame requirements, so that the first environment images meeting the key frame requirements are selected. A sketch of this check follows.
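•   A minimal sketch of Way 2 with pyramidal Lucas-Kanade optical flow (the ratio threshold and the direction of the test are an interpretation of the description above):

```python
import cv2
import numpy as np

def is_keyframe_by_tracking(prev_gray, cur_gray, prev_pts, ratio_thresh=0.4):
    """prev_pts: N x 1 x 2 float32 feature points of the previous keyframe.
    Track them into the current frame; when the fraction of successfully
    tracked (i.e. identical) feature points falls below the threshold, the
    current frame is taken as a new keyframe (threshold is illustrative)."""
    _next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, prev_pts, None)
    tracked = int(status.sum())          # points found again in the current frame
    return tracked / len(prev_pts) < ratio_thresh
```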
•   The first electronic device may use the first frame of the video as the first first environment image that meets the key frame requirements, and continue to select subsequent images that meet the key frame requirements based on this image.
•   After the first electronic device selects multiple frames of first environment images that meet the key frame requirements from the video, it uploads them to the distributed mapping system, and the distributed mapping system can store and manage the multiple frames of first environment images and perform image processing on them.
  • the first electronic device sends an image transmission request to the cloud scheduling center after selecting a first environment image that meets the key frame requirements, requesting to upload the image.
•   After the cloud scheduling center receives the image transmission request, it returns the URL for uploading the image to the electronic device, and the electronic device then uploads the first environment image to the object storage service according to the URL for storage.
•   When the first electronic device uploads the first environment images that meet the key frame requirements, it may upload them frame by frame; that is, each time the electronic device selects a frame of first environment image that meets the key frame requirements, it uploads that frame to the distributed mapping system while continuing to select the next image that meets the key frame requirements.
•   Alternatively, the electronic device can first obtain all the first environment images that meet the key frame requirements, and then upload these first environment images to the distributed mapping system together.
  • Step 3 The first electronic device collects the positioning information corresponding to each frame of the first environmental image and uploads it to the distributed mapping system; the second electronic device collects the positioning information corresponding to each frame of the second environmental image and uploads it to the distributed mapping system .
•   When the first electronic device uploads a first environment image to the distributed mapping system, it can also obtain the positioning information corresponding to the first environment image and upload that positioning information to the distributed mapping system.
  • the functions performed by the first electronic device and the second electronic device are the same.
•   the specific content of step 3 is introduced below taking the first electronic device as an example:
•   The positioning information corresponding to each frame of first environment image includes the pose information, global positioning system (GPS) information, and inertial measurement unit (IMU) information obtained when the first electronic device collects that frame of first environment image.
  • the pose information is measured by using the SLAM algorithm when the frame of the first environment image is captured by the first electronic device.
  • the GPS information is used to indicate the position determined by GPS positioning when the first electronic device captures the frame of the first environment image.
  • the IMU information is used to indicate the posture characteristics of the first electronic device measured based on the IMU sensor when the first electronic device captures the frame of the first environment image.
•   the first electronic device may upload the collected positioning information to the distributed mapping system in the form of metadata, for example as sketched below.
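•   A hypothetical shape for this per-frame metadata (field names and units are illustrative, not the patent's actual schema):

```python
metadata = {
    "map_id": "a1b2c3",                       # assigned at mapping initialization
    "frame_id": 42,
    "pose": {                                 # SLAM pose when the frame was captured
        "rotation": [0.0, 0.0, 0.0, 1.0],     # quaternion (x, y, z, w)
        "translation": [0.0, 0.0, 0.0],       # in the SLAM coordinate frame
    },
    "gps": {"lat": 0.0, "lon": 0.0, "alt": 0.0},
    "imu": {"accel": [0.0, 0.0, 9.8], "gyro": [0.0, 0.0, 0.0]},
}
```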
•   the electronic device can send the metadata to the cloud scheduling center, and after receiving the metadata, the cloud scheduling center sends it to the cache-shaped message middleware for caching, for use by the CPU algorithm components or GPU algorithm components.
  • the cloud scheduling center can store metadata to elastic file services.
  • Step 4 The distributed mapping system performs image processing on each frame of the first environment image uploaded by the first electronic device and each frame of the second environment image uploaded by the second electronic device.
•   each computing node can select an unprocessed frame from the multiple frames of first environment images or second environment images and perform image processing on it; after the processing is completed, it continues to select the next unprocessed image for image processing, until it is determined that all first environment images and second environment images have been processed.
•   When the computing node selects one frame from the multiple frames of first environment images or second environment images, the selection can be random, or can follow the order of the multiple frames (such as the order in which the images were uploaded to the distributed mapping system).
•   When the computing node performs image processing on a first environment image, it may directly perform feature extraction and serialization processing on the first environment image. Before the computing node performs image processing on a second environment image, it can first convert the pose information corresponding to the second environment image according to the target conversion relationship determined in the mapping initialization process, so as to determine the target pose information corresponding to the second environment image, which is the pose of the second electronic device relative to the three-dimensional coordinate system of the target three-dimensional map when the second environment image was captured.
•   In this way, the pose information corresponding to the second environment image and the pose information corresponding to the first environment image are in the same coordinate system.
  • the process that the computing node converts the pose information corresponding to the second environment image to obtain the target pose information corresponding to the second environment image may also be referred to as registering the second environment image into the image sequence to which the first environment image belongs.
  • the image processing process performed by the computing node on the first environment image or the second environment image is introduced below.
  • the image processing procedures performed by the computing node on the first environment image and the second environment image are the same.
  • the computing node performs image processing on the first environment image as an example.
  • the image processing process includes the following steps A1-A2:
• A1 Feature extraction: the computing node extracts the features of the first environment image.
  • the computing node may perform local feature extraction and global feature extraction on the first environment image.
• when performing local feature extraction, the computing node can extract feature vectors from the multi-scale grayscale features of each region in the first environment image to obtain the local features of the first environment image, and extract the feature points in the first environment image.
  • the feature vector may be used to represent the texture feature of the local area in the first environment image.
• for global feature extraction, the computing node can use a trained network model to cluster the local features of image regions with good feature invariance (such as regions meeting set requirements), and compute the weighted residual sum between each local feature and its cluster center to obtain the global feature of the first environment image.
  • the global feature can be used to characterize the overall structural feature of the first environment image.
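• As a hedged illustration of the clustering-and-weighted-residual aggregation just described, the following sketch computes a VLAD-style global descriptor from local features; the cluster centers would come from the trained network model, and the weighting scheme here is an assumption.

```python
# A minimal sketch of aggregating local features into one global descriptor by
# summing weighted residuals against cluster centers (a VLAD-style scheme).
import numpy as np

def vlad_global_feature(local_feats, centers, weights=None):
    """local_feats: (N, D) local descriptors; centers: (K, D) cluster centers."""
    if weights is None:
        weights = np.ones(len(local_feats))
    # Assign each local feature to its nearest cluster center.
    d2 = ((local_feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, K)
    assign = d2.argmin(axis=1)
    K, D = centers.shape
    desc = np.zeros((K, D))
    for i, k in enumerate(assign):
        # Weighted residual of the feature against its cluster center.
        desc[k] += weights[i] * (local_feats[i] - centers[k])
    desc = desc.ravel()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc  # L2-normalized global feature
```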
• A2 Serialization processing: the computing node selects images matching the first environment image from the already-processed images according to the global feature of the first environment image.
  • This step includes feature retrieval, feature matching and feature verification.
• feature retrieval means that the computing node searches the global features of the processed images (that is, the images that have already undergone the above image processing, including both first environment images and second environment images) to retrieve a set number of global features closest to the global feature of the current first environment image; the images corresponding to the retrieved global features are used as candidate frame images.
• in addition, a set number of first environment images whose collection time is earlier than, and closest to, the collection time of the current first environment image may also be used as candidate frame images.
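• A minimal sketch of the feature retrieval step, assuming global features are stored as plain vectors: find the set number of processed images whose global features are closest to the current image's global feature.

```python
# Retrieve the top_k processed images closest to a query global feature.
import numpy as np

def retrieve_candidates(query: np.ndarray, gallery: np.ndarray, top_k: int = 5):
    """query: (D,) global feature; gallery: (M, D) global features of the
    processed images. Returns indices of the top_k closest gallery images."""
    # L2 distance between the query and every processed image's global feature.
    dists = np.linalg.norm(gallery - query[None, :], axis=1)
    return np.argsort(dists)[:top_k]
```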
• feature matching means that the computing node matches the local features of the candidate frame images with the local features of the first environment image, and selects N matching pairs satisfying a certain threshold condition.
• for example, the computing node can use the k-nearest neighbor (KNN) matching algorithm to select, from the local feature points of a candidate frame image, feature points matching the local feature points in the first environment image, and pair each matched feature point with the corresponding local feature point in the first environment image to form matching pairs.
• the computing node can also train a deep learning model and use it to perform the matching and select the matching pairs.
• feature verification means that the computing node filters out incorrect matches from the results of the feature matching processing.
• for example, the computing node can use algorithms such as random sample consensus (RANSAC) verification to perform the feature verification processing.
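• The following sketch combines the KNN matching and RANSAC verification steps above, using OpenCV for brevity; the embodiment does not mandate a particular library, and the ratio and thresholds are illustrative assumptions.

```python
# A sketch of KNN matching with a ratio test, followed by RANSAC verification.
import cv2
import numpy as np

def match_and_verify(kp1, des1, kp2, des2, ratio=0.8):
    """Return geometrically verified matching pairs between two images.
    kp1/kp2 are keypoint lists, des1/des2 their local descriptors."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    raw = matcher.knnMatch(des1, des2, k=2)
    # Lowe's ratio test drops ambiguous nearest-neighbor matches.
    good = [p[0] for p in raw
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    if len(good) < 8:
        return []
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    # Random sample consensus (RANSAC) filters out incorrectly matched pairs.
    _, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
    if mask is None:
        return []
    return [m for m, keep in zip(good, mask.ravel()) if keep]
```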
• each computing node can determine the matching relationship (matched or not matched) between the image it processes and the other processed images; therefore, after all the first environment images and second environment images have gone through the processing, the matching relationships among all images are obtained.
• the feature points of the same matching pair correspond to the same three-dimensional point in the three-dimensional map.
• the cloud scheduling center can send a message for each uploaded image to the queue-type message middleware; the message indicates the image-processing task corresponding to each frame of image, and the queue-type message middleware caches the information of these tasks.
• each GPU algorithm component reads an image-processing task from the queue-type message middleware, reads the first environment image corresponding to the task from the object storage service, performs the above image processing on the read image, saves the processing result (that is, the matching information of the image) to the elastic file service, and at the same time sends a processing-complete identifier and the intermediate results of the processing (such as the global feature of the image) to the cache-type message middleware for caching. Subsequent GPU algorithm nodes can then read the global features of the processed images from the cache-type message middleware for serialization processing.
  • the execution sequence of some of the above steps does not have strict timing requirements, and can be adjusted according to actual conditions.
• the execution of the above steps 3 and 4 depends on the images selected in the above step 2, but steps 3 and 4 themselves may be performed in any order; that is, when executing steps 3 and 4, the computing node may perform one step first and then the other, or may perform both steps at the same time.
  • each computing node executes the above steps 1 to 4 independently of other computing nodes, and any two computing nodes do not interfere with each other.
  • the first electronic device and the second electronic device may also display a grid covering the outline of the environment on the display screen to prompt and guide the user to complete the scanning process.
• the electronic device may use a time-of-flight (TOF) method to collect depth maps for the images meeting the key frame requirements, or use multi-view stereo (MVS) matching on the selected images meeting the key frame requirements to obtain the corresponding depth maps.
• after the electronic device obtains the depth map of each frame of image, it can use algorithms based on the truncated signed distance function (TSDF) to extract voxels from each frame of image and determine the depth value of each voxel in the three-dimensional voxel space. After obtaining the voxels, the electronic device can convert the voxels into a grid using the marching cubes algorithm according to the depth value of each voxel, render the grid, and display it in the corresponding areas of the environment interface shown on the display screen.
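• A compact sketch of the TSDF fusion and marching cubes steps just described, assuming pinhole intrinsics K, camera-to-world poses, and illustrative voxel and truncation parameters:

```python
# A sketch of TSDF fusion over a voxel grid, followed by marching cubes mesh
# extraction. Grid origin, voxel size, and truncation distance are assumptions.
import numpy as np
from skimage import measure  # provides marching_cubes

def fuse_depth_tsdf(tsdf, weight, depth, K, T_wc, origin,
                    voxel=0.02, trunc=0.1):
    """Update a TSDF volume (and its weight grid) in place with one depth frame."""
    nx, ny, nz = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                             indexing="ij")
    pts_w = origin + voxel * np.stack([ii, jj, kk], -1).reshape(-1, 3)
    T_cw = np.linalg.inv(T_wc)                       # world -> camera
    pts_c = pts_w @ T_cw[:3, :3].T + T_cw[:3, 3]
    z = pts_c[:, 2]
    with np.errstate(divide="ignore", invalid="ignore"):
        u = np.round(K[0, 0] * pts_c[:, 0] / z + K[0, 2]).astype(int)
        v = np.round(K[1, 1] * pts_c[:, 1] / z + K[1, 2]).astype(int)
    h, w = depth.shape
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    sdf = np.full(len(pts_w), -np.inf)
    sdf[ok] = depth[v[ok], u[ok]] - z[ok]            # signed distance along ray
    upd = ok & (sdf > -trunc)                        # skip far-occluded voxels
    new = np.clip(sdf / trunc, -1.0, 1.0)            # truncation to [-1, 1]
    t, wt = tsdf.reshape(-1), weight.reshape(-1)
    t[upd] = (t[upd] * wt[upd] + new[upd]) / (wt[upd] + 1)  # running average
    wt[upd] += 1

# After fusing all depth frames, extract the grid (mesh) at the zero level set:
# verts, faces, normals, values = measure.marching_cubes(tsdf, level=0.0)
```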
  • the electronic device may perform voxel extraction and grid conversion on the depth map of the image corresponding to the interface shown in FIG. 11 to obtain a grid.
  • the electronic device can display the interface shown in Figure 12.
• the area covered by the grid is the area that has already been scanned, and the area not covered by the grid is the area still to be scanned or an area for which a corresponding grid cannot be generated.
• the electronic device can present the scanned and unscanned areas to the user in real time while the user operates the electronic device to scan the environment space, guiding the user to continue scanning according to the grid prompts so that the grid covers as many three-dimensional objects in the real environment space as possible. This completes the scanning process simply and quickly, reduces the operational difficulty of collecting environment images, and improves user experience.
  • the first electronic device and the second electronic device may upload the depth map obtained during the scanning process to the distributed mapping system.
• after the distributed mapping system receives the first depth map uploaded by the first electronic device and the second depth map uploaded by the second electronic device, it can convert the coordinate system of the second depth map according to the target conversion relationship determined during mapping initialization, and fuse the converted second depth map with the first depth map to obtain a complete depth map corresponding to the current environment.
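• A sketch of this conversion and fusion, modeling the target conversion relationship as an assumed 4x4 matrix T_first_from_second: depth pixels are back-projected to 3D points, transformed into the first device's coordinate system, and merged. For simplicity the second depth map's points are assumed to already be expressed in the second device's map coordinate system.

```python
# Depth-map coordinate conversion and fusion; K is the pinhole intrinsic
# matrix, and T_first_from_second the (assumed) 4x4 homogeneous transform.
import numpy as np

def depth_to_points(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project a depth map (H, W) into 3D points (N, 3)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=1)
    return pts[z > 0]  # keep pixels with valid depth only

def convert_and_fuse(pts_first, depth_second, K_second, T_first_from_second):
    """Transform the second device's depth points into the first device's
    coordinate system and concatenate the two point sets."""
    pts2 = depth_to_points(depth_second, K_second)
    R, t = T_first_from_second[:3, :3], T_first_from_second[:3, 3]
    return np.concatenate([pts_first, pts2 @ R.T + t], axis=0)
```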
• the distributed mapping system can generate the white model (an untextured mesh) corresponding to the current environment according to the complete depth map corresponding to the current environment; the white model corresponding to the current environment can be used to represent the surfaces of the objects in the current environment.
  • the distributed mapping system can generate a white model corresponding to the current environment through algorithms such as plane extraction, intersection calculation, polyhedron topology construction, and surface optimization.
• when the first electronic device uploads the first depth map, it can send a depth map upload request to the cloud scheduling center; the cloud scheduling center receives the depth map upload request and returns the depth map upload address to the first electronic device.
• the first electronic device can upload the first depth map to the object storage service for storage according to the depth map upload address. After uploading the first depth map, it can send a notification message that the depth map upload is complete to the cloud scheduling center.
• similarly, the second electronic device can upload the second depth map to the object storage service for storage, and send a notification message of completed depth map upload to the cloud scheduling center.
• the cloud scheduling center can send a white model creation task to the queue-type message middleware.
• the white model processing service in the CPU algorithm component monitors the white model creation tasks corresponding to the map identifiers cached in the queue-type message middleware. After receiving a task, the CPU algorithm component performs coordinate conversion on the depth map uploaded by the second electronic device according to the target conversion relationship, fuses the converted second depth map with the first depth map to obtain the complete depth map corresponding to the current environment, and then generates the white model corresponding to the current scene based on the complete depth map.
• the white model processing service can send the results of executing the white model creation task to the elastic file service for storage, and at the same time send a white-model-creation-completed notification message to the queue-type message middleware.
• when the cloud scheduling center hears the notification message that the white model has been created, it can obtain the white model corresponding to the current scene from the elastic file service, send the white model to the object storage service for storage, and at the same time store the white model in the cloud database.
• alternatively, the computing nodes can use the multi-view stereo (MVS) matching algorithm to determine the depth information of each frame of the first environment image and each frame of the second environment image, where the depth information of each frame includes the depth value of each pixel in that frame.
• the distributed mapping system can then generate the white model corresponding to the current environment according to the depth information of each frame of the first environment image and each frame of the second environment image.
• after the distributed mapping system completes the construction task of the 3D map, when the user operates the electronic device to play in the AR scene and places virtual items in the AR scene displayed on the screen of the electronic device, the distributed mapping system can, based on the white model corresponding to the AR scene, place the virtual items on the surfaces of the objects in the environment displayed by the electronic device, making the AR scene displayed by the electronic device more realistic and more interactive.
  • the first electronic device may send a mapping instruction to the distributed mapping system.
  • the distributed mapping system can obtain the scanning progress of the second electronic device and send it to the first electronic device.
  • the first electronic device may display the scanning progress of the second electronic device on the display screen. The user may wait for the second electronic device to end scanning, or the first electronic device may send an end scanning instruction to the second electronic device, and the second electronic device ends scanning after receiving the end scanning instruction.
  • FIG. 13 is a schematic diagram of a scanning progress interface displayed by the first electronic device according to an embodiment of the present application.
  • the first electronic device can view the scanning progress of the second electronic device on the scanning progress interface shown in FIG. 13 .
• Figure 13 includes the scan progress and status of the first electronic device (device A) and two second electronic devices (device B and device C); the user can click the end-scan control in the scan progress interface to forcibly end the scanning of device B and device C.
• after the second electronic device finishes scanning, it can send a mapping instruction to the distributed mapping system.
• after the distributed mapping system receives the mapping instruction sent by the first electronic device and the mapping instruction sent by the second electronic device, it can create a three-dimensional map using the first environment images uploaded by the first electronic device, the second environment images uploaded by the second electronic device, and the positioning information corresponding to each frame of image.
• after the computing nodes of the distributed mapping system perform the image processing described in the above step 4 on all the images uploaded by the first electronic device and the second electronic device, the three-dimensional map can be created according to the following steps B1-B4:
• B1 The computing node generates a scene matching relationship graph (scene graph) according to the multiple frames of the first environment image and the multiple frames of the second environment image, where the scene matching relationship graph is used to represent the matching relationships among the multiple frames of images.
• the computing node can determine the co-visibility relationships of the multi-frame images according to the matching relationships among them, and obtain the scene matching relationship graph after optimizing the co-visibility relationships, where the multi-frame images include the multiple frames of the first environment image and the multiple frames of the second environment image.
• the scene matching relationship graph can be regarded as an abstract network composed of "vertices" and "edges". Each vertex in the network represents a frame of image, and each edge represents a set of matched feature-point pairs between two images. Different "vertices" can be connected through "edges", meaning that two vertices connected by an edge have an association relationship, that is, a matching relationship between the two frames of images they represent.
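• A minimal sketch of such a scene matching relationship graph, with frames as vertices and verified matching pairs as edges:

```python
# Vertices are frame ids; edges carry the verified feature matching pairs.
from collections import defaultdict

class SceneGraph:
    def __init__(self):
        self.edges = defaultdict(dict)  # frame_id -> {frame_id: matches}

    def add_matches(self, frame_a: int, frame_b: int, matches: list):
        """matches: list of (feature_idx_in_a, feature_idx_in_b) pairs."""
        if matches:  # only connect frames that actually share content
            self.edges[frame_a][frame_b] = matches
            self.edges[frame_b][frame_a] = [(j, i) for i, j in matches]

    def covisible(self, frame_id: int):
        """Frames sharing a matching (co-view) relationship with frame_id."""
        return list(self.edges[frame_id].keys())
```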
• B2 The computing node determines, according to the scene matching relationship graph, the three-dimensional point in three-dimensional space corresponding to each feature point in the multi-frame images.
• after generating the scene matching relationship graph, the computing node can determine the three-dimensional point in three-dimensional space corresponding to each feature point in the multiple frames of the first environment image and the multiple frames of the second environment image.
• the three-dimensional coordinate system of this three-dimensional space is the three-dimensional coordinate system corresponding to the first electronic device, which is consistent with the coordinate system of the initial three-dimensional map generated during mapping initialization.
• the computing node can use algorithms such as direct linear transformation (DLT), combining the pose information of the environment images and the intrinsic parameters of the camera, to solve for the position of each feature point in three-dimensional space (that is, triangulation), and take the point at that position as the three-dimensional point corresponding to the feature point.
• after the computing node determines the three-dimensional points corresponding to all the feature points in the scene matching relationship graph, a three-dimensional map composed of these three-dimensional points is obtained; this three-dimensional map is a three-dimensional point cloud map.
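• A sketch of two-view DLT triangulation as described above, given (assumed) 3x4 projection matrices P = K[R|t] built from each image's pose and camera intrinsics:

```python
# Linear (DLT) triangulation of one matched feature observed in two images.
import numpy as np

def triangulate_dlt(P1: np.ndarray, P2: np.ndarray,
                    uv1: np.ndarray, uv2: np.ndarray) -> np.ndarray:
    """P1, P2: (3, 4) projection matrices; uv1, uv2: (2,) pixel coordinates."""
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The 3D point is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize
```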
  • B3 The computing node optimizes the coordinates of 3D points in 3D space.
• the computing node can perform bundle adjustment (BA) optimization on the 3D points obtained above; that is, using the reprojection error obtained by projecting the 3D points back onto the images according to the camera model, it jointly optimizes the pose information of the environment images, the 3D point positions, and the camera intrinsic matrix of the electronic device, so as to obtain accurate pose information of the environment images, camera intrinsic parameters, and coordinates of the 3D points, and thereby an optimized 3D map.
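• A minimal sketch of the bundle adjustment residual, minimizing the reprojection error over poses and 3D points with a generic least-squares solver; a full implementation would also refine the camera intrinsic matrix, which is held fixed here for brevity.

```python
# Pack camera poses (Rodrigues rotation + translation) and 3D points into one
# parameter vector and minimize the reprojection error.
import numpy as np
import cv2
from scipy.optimize import least_squares

def reprojection_residuals(params, n_cams, n_pts, K, cam_idx, pt_idx, uv_obs):
    """cam_idx[i], pt_idx[i] say which camera/point observation i belongs to;
    uv_obs[i] is the observed pixel position of that feature point."""
    poses = params[:n_cams * 6].reshape(n_cams, 6)
    points = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for c, p, uv in zip(cam_idx, pt_idx, uv_obs):
        rvec, tvec = poses[c, :3], poses[c, 3:]
        proj, _ = cv2.projectPoints(points[p].reshape(1, 3), rvec, tvec, K, None)
        res.append(proj.ravel() - uv)  # back-projection (reprojection) error
    return np.concatenate(res)

# x0 stacks the initial poses and points; solved e.g. as:
# result = least_squares(reprojection_residuals, x0,
#                        args=(n_cams, n_pts, K, cam_idx, pt_idx, uv_obs))
```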
• B4 The computing node generates the 3D map based on the optimized 3D points.
• the computing node can combine the camera poses with the GPS information and IMU information corresponding to each frame of the first environment image and each frame of the second environment image, performing smoothing and denoising to obtain the real-world camera pose corresponding to each frame of image.
  • the camera pose corresponding to each frame of image may be the position and orientation in the real environment when the electronic device captures the frame of image.
• the computing node aligns the coordinates of the 3D points in the 3D space with real-world coordinates, adjusting the coordinate system of the 3D space to be consistent with the coordinate system of the real environment space, and thereby obtains a three-dimensional map at the same scale as the real environment; this three-dimensional map is the point cloud map corresponding to the real environment scene.
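• One standard way to realize such an alignment is a similarity transform solved with the Umeyama method from corresponding points (reconstructed positions versus real-world positions smoothed from GPS/IMU); the sketch below is an assumption about the alignment step, not the embodiment's mandated algorithm.

```python
# Umeyama similarity alignment: find scale s, rotation R, translation t
# minimizing ||dst - (s * R @ src + t)||^2 over corresponding points.
import numpy as np

def umeyama_alignment(src: np.ndarray, dst: np.ndarray):
    """src, dst: (N, 3) corresponding points (reconstruction vs. real world)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1  # avoid reflections
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / src_c.var(0).sum()  # isotropic scale
    t = mu_d - s * R @ mu_s
    return s, R, t  # apply as: aligned = s * (R @ p) + t
```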
  • FIG. 14 is a schematic diagram of a three-dimensional map provided in the embodiment of the present application.
  • the three-dimensional points in the three-dimensional map shown in FIG. 14 respectively correspond to the three-dimensional points in the real environment scanned by the electronic device.
• the position of each three-dimensional point in space is used to characterize the position of the corresponding point in the real environment.
• when the first electronic device triggers mapping, it can send a mapping instruction, the number of scanned images for this mapping (that is, the number of first environment images meeting the key frame requirements), and other information to the cloud scheduling center.
• when the second electronic device triggers mapping, it can likewise send the mapping instruction, the number of scanned images for this mapping (that is, the number of second environment images meeting the key frame requirements), and other information to the cloud scheduling center.
• the cloud scheduling center sends the mapping task to the queue-type message middleware, and at the same time saves the basic information and the user attribute information of this mapping in the cloud database.
• each CPU algorithm component can monitor and process the mapping tasks in the queue-type message middleware, and finally generate a 3D map file and store it in the elastic file service.
• each CPU algorithm component can send information such as the mapping progress, mapping success or failure, and the map alignment matrix (the transformation matrix between the SLAM coordinate system and the real-world coordinate system) to the queue-type message middleware, and save the mapping result (that is, the created 3D map) to the elastic file service.
• the cloud scheduling center can monitor the mapping progress information in the queue-type message middleware to obtain information such as the processing progress, status, and map alignment matrix of the current mapping task, and store this information in the cloud database.
• in this way, multiple electronic devices can collect environment images at the same time, and the distributed mapping system can convert the environment images collected by the multiple electronic devices into the same three-dimensional coordinate system and generate a three-dimensional map from the converted environment images, thereby realizing multi-device mapping, improving the efficiency of creating 3D maps of AR scenes, and at the same time enhancing the interaction among multiple users during the mapping process to improve user experience.
  • FIG. 15 is a schematic flowchart of a method for constructing a three-dimensional map with multiple devices according to an embodiment of the present application.
  • the method can be executed by the first electronic device, the second electronic device and the distributed mapping system in the augmented reality system shown in FIG. 2 .
  • the method comprises the following steps:
• S1501 The first electronic device scans the real-world scene, collects multiple frames of the first initial image, and runs the SLAM algorithm to extract the positioning information corresponding to each frame of the first initial image.
  • the positioning information includes pose information, GPS information, and IMU information.
  • S1502 The first electronic device sends a multi-person mapping request to the distributed mapping system.
  • the multi-person mapping request includes multiple frames of the first initial image and positioning information of each frame of the first initial image.
  • S1503 The computing nodes in the distributed mapping system construct an initial three-dimensional map according to multiple frames of the first initial image and the positioning information of each frame of the first initial image.
• S1504 The second electronic device scans the real-world scene, collects multiple frames of the second initial image, and runs the SLAM algorithm to extract the positioning information corresponding to each frame of the second initial image.
  • S1505 The second electronic device sends a request for joining the mapping to the distributed mapping system.
• the join-mapping request includes multiple frames of the second initial image and the positioning information of each frame of the second initial image.
• S1506 The computing nodes in the distributed mapping system perform image processing on the multiple frames of the second initial image, and determine that at least one frame of the second initial image contains the same image content as some frame of the first initial image.
  • S1507 The calculation node in the distributed mapping system determines the target conversion relationship according to the initial three-dimensional map and multiple frames of second initial images.
  • the target conversion relationship is a conversion relationship between the three-dimensional coordinate system corresponding to the second initial image and the three-dimensional coordinate system of the initial three-dimensional map.
• S1508 The first electronic device scans video of the real-world scene while moving, selects the first environment images that meet the key frame requirements, and runs the SLAM algorithm to extract the positioning information corresponding to each frame of the first environment image.
  • the positioning information includes pose information, GPS information, and IMU information.
  • the moving process of the first electronic device is controlled by the first user.
  • S1509 The first electronic device uploads the multiple frames of the first environment image meeting the key frame requirements to the distributed mapping system respectively.
• S1510 The second electronic device scans video of the real-world scene while moving, selects the second environment images that meet the key frame requirements, and runs the SLAM algorithm to extract the positioning information corresponding to each frame of the second environment image.
  • the positioning information includes pose information, GPS information, and IMU information.
  • the moving process of the second electronic device is controlled by the second user.
  • S1511 The second electronic device uploads multiple frames of second environment images meeting key frame requirements to the distributed mapping system.
• S1512 The computing nodes in the distributed mapping system convert the pose information of the second environment images according to the target conversion relationship (a sketch of this conversion follows the step list below).
  • S1513 The calculation node in the distributed mapping system generates the target three-dimensional map according to the multiple frames of the first environment image and the multiple frames of the pose-converted second environment image.
  • S1514 The distributed mapping system sends the target three-dimensional map to the first electronic device and the second electronic device.
  • S1515 The first electronic device acquires first depth maps corresponding to multiple frames of first environment images.
  • S1516 The first electronic device sends the first depth map to the distributed mapping system.
• S1517 The second electronic device acquires second depth maps corresponding to the multiple frames of the second environment image.
  • S1518 The second electronic device sends the second depth map to the distributed mapping system.
  • S1519 The computing nodes of the distributed mapping system transform the coordinate system corresponding to the second depth map according to the target transformation matrix.
  • S1520 The computing nodes of the distributed mapping system perform fusion processing on the first depth map and the second depth map after the coordinate system conversion, to obtain a complete depth map corresponding to the current environment.
• S1521 The computing nodes of the distributed mapping system generate the white model corresponding to the current environment according to the complete depth map.
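• The sketch below illustrates the coordinate conversions used in S1512 and S1519 above, modeling the target conversion relationship as an assumed 4x4 homogeneous matrix:

```python
# Converting a pose or a point set from the second device's coordinate system
# into the first device's coordinate system via T_first_from_second.
import numpy as np

def convert_pose(T_second_pose: np.ndarray,
                 T_first_from_second: np.ndarray) -> np.ndarray:
    """T_second_pose: 4x4 camera-to-world pose in the second device's system.
    Returns the same pose expressed in the first device's system."""
    return T_first_from_second @ T_second_pose

def convert_points(pts: np.ndarray,
                   T_first_from_second: np.ndarray) -> np.ndarray:
    """pts: (N, 3) points (e.g. from a depth map) in the second device's system."""
    R, t = T_first_from_second[:3, :3], T_first_from_second[:3, 3]
    return pts @ R.T + t
```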
  • the first electronic device and the second electronic device can display the AR scene on the display screen, and the user can view the AR scene for playing.
  • the distributed mapping system may locate the first electronic device and the second electronic device according to the three-dimensional map.
  • the method for locating the first electronic device by the distributed mapping system is the same as the method for locating the second electronic device.
• the following takes the distributed mapping system locating the first electronic device as an example to introduce the positioning method provided by the embodiments of this application:
• the first electronic device may, in response to the user's operation of selecting the three-dimensional map, collect at least one frame of a first user image of the current environment and upload it to the distributed mapping system.
  • the distributed mapping system uses the GVPS method to determine the relative position of the first electronic device in the three-dimensional map according to at least one frame of the first user image and the previously created three-dimensional map corresponding to the environment.
  • the distributed mapping system sends the determined relative position to the first electronic device.
  • the first electronic device may display an AR scene on the display screen based on the first relative position.
• the AR scene may include the environment image currently captured by the camera of the first electronic device and the virtual items corresponding to the digital resources added by the user in the AR scene.
• specifically, the first electronic device sends a positioning request and at least one currently scanned first user image to the cloud scheduling center, and the cloud scheduling center sends the received positioning request and the at least one frame of the first user image to the GVPS positioning service.
• the GVPS positioning service reads the map data of the three-dimensional map stored in the elastic file service, determines the relative position of the current pose of the first electronic device in the three-dimensional map according to the map data and the at least one frame of the first user image, and sends the relative position information to the cloud scheduling center.
• after the cloud scheduling center queries the point of interest (POI) information related to the current map from the cloud database, it sends the POI information and the relative position from the GVPS service to the first electronic device.
  • the first electronic device may download the three-dimensional digital resource model from the object storage service according to the received POI information, render the model, and add it to the AR scene displayed by the first electronic device.
• the user can view the AR scene on the display screen of the first electronic device and add digital resources to the AR scene.
• in the AR scene displayed on the first electronic device, the user observes the environment images captured by the camera in real time and the 3D digital resource models added by users.
  • the electronic device displays materials containing three-dimensional digital resource models in the AR scene interface shown in FIG. 16a, from which the user can select materials and add them to the digital world scene shown in FIG. 16a.
• the electronic device can display the AR scene interface after the 3D digital resource model has been added, as shown in Figure 16b; the AR scene interface includes the image of the real environment scene and the 3D digital resource model added by the user, realizing the integrated display of real-world scenes and virtual digital resources.
  • the first electronic device may display a corresponding white model in the area, so as to guide the user to select a suitable area to place the material.
  • the first electronic device may determine the corresponding position of the placed area in the three-dimensional map, and send the identification and position of the digital resource to the distributed storage system.
  • the distributed storage system stores the identification and location of digital resources, so that the next time the user holds an electronic device to view the AR scene, the virtual items corresponding to the digital resources added by the user can still be displayed.
  • the first electronic device may request the cloud scheduling center for a list of digital resources corresponding to the three-dimensional digital resource model.
  • the cloud dispatching center obtains the list of digital resources corresponding to the current user by querying the cloud database, and sends the list to the first electronic device.
  • the first electronic device downloads the three-dimensional digital resource model from the object storage service through the URL, and adds it to the AR scene.
• the user can click the save control displayed on the first electronic device to trigger it to upload information such as the size and pose of the current digital resource model to the cloud scheduling center, and the cloud scheduling center saves the information to the cloud database.
  • the second electronic device may also display an AR scene on the display screen for the user to play, and specific implementation may refer to the above-mentioned embodiments, and repeated descriptions will not be repeated.
  • multiple users can operate different electronic devices to play in the same AR scene at the same time.
• the first user operating the first electronic device and the second user operating the second electronic device can display the AR scene and play interactively. Alternatively, after the distributed mapping system creates the initial 3D map, it can also send the initial 3D map to the first electronic device and the second electronic device; at this point the second electronic device can choose to scan environment images or to directly display the AR scene constructed from the initial three-dimensional map for playing.
• for example, the second electronic device can directly display the AR scene constructed from the initial 3D map for the second user to play, and the first electronic device can likewise display the AR scene constructed from the initial 3D map for the first user to play.
  • the first electronic device can display the AR scene on the display screen, and the second electronic device can also display the AR scene on the display screen.
  • the second electronic device can display a three-dimensional digital resource model related to the first user on the image of the first user, thereby realizing interactive play between the second electronic device and the first electronic device.
  • FIG. 17 is a schematic diagram of multi-user interactive play in an AR scene displayed on an electronic device according to an embodiment of the present application.
• Fig. 17 shows the content displayed on the display screen of the first electronic device: the first electronic device displays an AR scene, and the AR scene includes an environment image captured by the first electronic device in real time and a three-dimensional digital resource model.
• the second user holds the second electronic device and enters the area that the camera of the first electronic device can capture; the display screen of the first electronic device displays the captured image of the second user, and displays above the second user the "life value" three-dimensional digital resource model corresponding to the second user.
• specifically, the second electronic device can run the SLAM algorithm on the images it collects in real time to determine its pose information, and send the pose information to the distributed mapping system; the distributed mapping system can determine the target pose information of the second electronic device according to the 3D map of the AR scene or the initial 3D map, where the target pose information is the pose of the second electronic device in the three-dimensional coordinate system of the 3D map.
• the distributed mapping system sends the target pose information of the second electronic device to the first electronic device, and the first electronic device determines, based on this target pose information, the position in the AR scene of the "life value" 3D digital resource model corresponding to the second user, and displays that model on the display screen at the determined position.
• similarly, the display screen of the first electronic device displays the captured image of the third user, and displays above the third user the "life value" three-dimensional digital resource model corresponding to the third user; the first user, the second user, and the third user can operate their respective electronic devices to play interactively.
• the first electronic device and the second electronic device can provide the user with functions for using and editing the created 3D map, and at the same time allow the user to add 3D digital resource models to the created 3D map, realizing the fused application of real environment scenes and virtual digital resources.
  • the multi-device mapping method provided in the embodiment of the present application is not only applicable to the multi-device scanning environment scene shown in FIG. 10 , but also can be applied to merging multiple 3D maps, expanding 3D maps, and other scenarios.
  • the following introduces the implementation manners when the multi-device mapping method provided in the embodiment of the present application is applied to two scenarios of merging multiple 3D maps and extending the 3D map:
  • FIG. 18 is a schematic diagram of a created three-dimensional map list displayed by an electronic device provided in an embodiment of the present application. Users can click to select multiple 3D maps to merge.
  • the electronic device After receiving the map merging instruction triggered by the user, the electronic device sends a map merging request to the distributed mapping system, and the map merging request includes identifiers of multiple three-dimensional maps selected by the user.
• after receiving the map merging request, the distributed mapping system acquires the multiple frames of environment images corresponding to each 3D map according to the identifiers of the multiple 3D maps in the request. Taking two 3D maps selected by the user (a first map and a second map) as an example:
• the distributed mapping system determines the matching image pairs among the multi-frame first environment images corresponding to the first map and the multi-frame second environment images corresponding to the second map according to their global features, where two frames match when their global features are similar (for example, the similarity between the global features of the two frames is greater than a preset threshold).
  • the distributed mapping system determines target pose information corresponding to the second environment image according to the matching image pair, where the target pose information is the pose information of the second environment image relative to the three-dimensional coordinate system corresponding to the first environment image.
• the distributed mapping system determines the target conversion relationship corresponding to the second environment image according to the initial pose information and the target pose information of the second environment image; the target conversion relationship is the conversion relationship between the three-dimensional coordinate system corresponding to the second environment image and the three-dimensional coordinate system corresponding to the first environment image.
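• The target conversion relationship can be read off from the two poses of one matched image, as the sketch below shows for 4x4 camera-to-world pose matrices (an assumed representation):

```python
# Derive the conversion between two map coordinate systems from one image's
# initial pose (in the second map's system) and target pose (in the first
# map's system).
import numpy as np

def target_conversion(T_initial_pose: np.ndarray,
                      T_target_pose: np.ndarray) -> np.ndarray:
    """Returns T_first_from_second such that, for any camera-frame point x,
    T_target_pose @ x == T_first_from_second @ T_initial_pose @ x."""
    return T_target_pose @ np.linalg.inv(T_initial_pose)

# Applying this matrix to the initial poses of all second environment images
# registers them into the first map's coordinate system.
```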
  • the distributed mapping system can perform pose conversion on multiple frames of the second environment image according to the target conversion relationship.
• the distributed mapping system can generate a target three-dimensional map based on the multiple frames of the first environment image and the multiple frames of the pose-converted second environment image; the target three-dimensional map can be regarded as the three-dimensional map obtained by merging the first map and the second map.
• for the specific implementation of the distributed mapping system generating the target three-dimensional map based on the multiple frames of the first environment image and the multiple frames of the second environment image, refer to the content of the data acquisition and mapping phase in the embodiments of this application; repeated details are not described again.
  • FIG. 19 is a schematic diagram of a user-triggered extended three-dimensional map provided in an embodiment of the present application. Referring to FIG. 19 , the user can click the expansion control in the display interface shown in FIG. 19 to trigger the electronic device to expand the three-dimensional map.
  • the electronic device After receiving the user-triggered instruction to extend the three-dimensional map, the electronic device sends a request message for extending the three-dimensional map to the distributed mapping system, where the request message includes the identifier of the three-dimensional map and the image of the first user.
  • the first user image is an environmental image collected by the electronic device for the current environment after receiving an instruction to expand the three-dimensional map triggered by the user.
  • the computing nodes in the distributed mapping system perform image processing on the first user image to obtain global features and local features of the first user image.
  • the calculation node obtains the multi-frame environment images corresponding to the three-dimensional map according to the identification of the three-dimensional map, and determines the environment image matching the first user image according to the global features of the multi-frame environment images and the global features of the first user image.
  • the environment image matched with the first user image may be an environment image whose global features are similar to those of the first user image.
• the computing node determines the target pose information of the first user image relative to the three-dimensional coordinate system corresponding to the matched environment image according to the first user image and the environment image matching it, and determines the target conversion relationship corresponding to the first user image according to the initial pose information and the target pose information of the first user image; this target conversion relationship is the conversion relationship between the three-dimensional coordinate system corresponding to the first environment image and the three-dimensional coordinate system corresponding to the environment image.
  • the electronic device may display information that guides the user to scan the current environment, and the user holds the electronic device to scan the current environment.
  • the electronic device uploads the first environment image meeting the key frame requirement in the scanned video to the distributed mapping system.
• the distributed mapping system converts the pose information of the first environment images according to the target conversion relationship to obtain their target pose information, thereby registering the multiple frames of first environment images uploaded by the electronic device into the image sequence composed of the multi-frame environment images corresponding to the three-dimensional map.
• the distributed mapping system can generate a target 3D map based on the multi-frame environment images corresponding to the original 3D map and the multi-frame first environment images after pose conversion.
• the target 3D map can be regarded as an expanded version of the original 3D map selected by the user.
• for the specific implementation of the distributed mapping system generating the target 3D map from the multi-frame environment images corresponding to the original 3D map and the pose-converted multi-frame first environment images, refer to the data acquisition and mapping phase in the embodiments of this application; the details are not repeated here.
  • the present application further provides a method for creating a three-dimensional map with multiple devices.
• the method can be executed by the first electronic device, the second electronic device, and the distributed mapping system in the augmented reality system shown in FIG. 2; the first electronic device and the second electronic device may have the structure shown in FIG. 4, and the distributed mapping system may have the structure shown in FIG. 5, in the embodiments of the present application.
  • FIG. 20 is a flowchart of a method for creating a three-dimensional map with multiple devices according to an embodiment of the present application. Referring to Figure 20, the method comprises the following steps:
• S2001 The first electronic device sends multiple frames of the first environment image and the pose information of each frame of the first environment image to the distributed mapping system.
• the first environment image is obtained by the first electronic device photographing its environment, and the pose information of each frame of the first environment image is used to indicate the position and orientation of the first electronic device in the three-dimensional coordinate system corresponding to the first electronic device when it captures that frame.
  • S2002 The second electronic device sends multiple frames of the second environment image and initial pose information of each frame of the second environment image to the distributed mapping system.
• the second environment image is obtained by the second electronic device photographing its environment, and the initial pose information of each frame of the second environment image is used to indicate the position and orientation of the second electronic device in the three-dimensional coordinate system corresponding to the second electronic device when it captures that frame.
• S2003 The distributed mapping system performs pose conversion on the initial pose information of the multiple frames of the second environment image according to the target conversion relationship, to obtain the target pose information of each frame of the second environment image.
• the target pose information is used to represent the position and orientation of the second electronic device in the three-dimensional coordinate system corresponding to the first electronic device when capturing the second environment image.
• the target conversion relationship is the conversion relationship between the three-dimensional coordinate system corresponding to the second electronic device and the three-dimensional coordinate system corresponding to the first electronic device.
• S2004 The distributed mapping system creates a three-dimensional map according to the multiple frames of the first environment image, the pose information of each frame of the first environment image, the multiple frames of the second environment image, and the target pose information of each frame of the second environment image.
• the three-dimensional map can be used to construct an augmented reality scene.
• the present application also provides an electronic device, which includes multiple functional modules; the multiple functional modules interact to implement the functions performed by the first electronic device or the second electronic device in the methods described in the embodiments of the present application, for example, executing S1501-S1502, S1508-S1509, and S1515-S1516 performed by the first electronic device in the embodiment shown in Figure 15, or executing S1504-S1505, S1510-S1511, and S1517-S1518 performed by the second electronic device.
  • the multiple functional modules can be implemented based on software, hardware or a combination of software and hardware, and the multiple functional modules can be combined or divided arbitrarily based on specific implementations.
• the present application also provides an electronic device, which includes at least one processor and at least one memory, where computer program instructions are stored in the at least one memory; when the electronic device runs, the at least one processor executes the functions performed by the first electronic device or the second electronic device in the methods described in the embodiments of the present application.
  • the present application further provides a distributed mapping system, where the distributed mapping system includes multiple computing nodes, and the multiple computing nodes process data in parallel/serially.
  • Each computing node is configured to perform the functions performed by the distributed mapping system in the methods described in the embodiments of the present application. For example, S1503, S1506-S1507, S1512-S1513, and S1519-S1521 in the embodiment shown in FIG. 15 are executed.
  • multiple computing nodes may include multiple CPU algorithm components and multiple GPU algorithm components, where the GPU algorithm components may perform processes such as image processing, and the CPU algorithm components may perform processes such as data processing.
  • the GPU algorithm component may execute S1506 in the embodiment shown in FIG. 15 ; the CPU algorithm component may execute steps S1507 and S1512 in the embodiment shown in FIG. 15 .
  • the present application also provides a computer program that, when the computer program is run on a computer, causes the computer to execute the methods described in the embodiments of the present application.
  • the present application also provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a computer, the computer executes the computer program described in the embodiments of the present application. methods described.
  • the present application also provides a chip, the chip is used to read the computer program stored in the memory, and implement the methods described in the embodiments of the present application.
  • the present application provides a system-on-a-chip, where the system-on-a-chip includes a processor, configured to support a computer device to implement the methods described in the embodiments of the present application.
  • the chip system further includes a memory, and the memory is used to store necessary programs and data of the computer device.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
• These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, where the instruction means implements the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present application provides an augmented reality system, and a method and device for constructing a three-dimensional map with multiple devices. In the method, a distributed mapping system receives multiple frames of a first environment image and the pose information of each frame of the first environment image sent by a first electronic device, and receives multiple frames of a second environment image and the initial pose information of each frame of the second environment image sent by a second electronic device; it determines, according to a target conversion relationship, the target pose information of each frame of the second environment image in the three-dimensional coordinate system corresponding to the first electronic device, and creates a three-dimensional map according to each frame of the first environment image and its pose information and each frame of the second environment image and its target pose information. With this solution, the distributed mapping system can convert the poses of environment images uploaded by different electronic devices into poses in the same three-dimensional coordinate system, and then construct a three-dimensional map from the environment images uploaded by multiple devices, improving mapping efficiency while enhancing interaction among multiple devices and improving user experience.

Description

An augmented reality system, and a method and device for constructing a three-dimensional map with multiple devices
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 202210010556.3, filed with the Intellectual Property Administration of the People's Republic of China on January 6, 2022 and entitled "一种增强现实系统、多设备构建三维地图的方法及设备" ("An augmented reality system, and a method and device for constructing a three-dimensional map with multiple devices"), which is incorporated herein by reference in its entirety.
SUMMARY
The present application provides an augmented reality system, and a method and device for constructing a three-dimensional map with multiple devices.
In a first aspect, the present application provides an augmented reality system, where the augmented reality system includes a first electronic device, a second electronic device, and a distributed mapping system;
the first electronic device is configured to send multiple frames of a first environment image and the pose information of each frame of the first environment image to the distributed mapping system, where the first environment image is obtained by the first electronic device photographing its environment, and the pose information of each frame of the first environment image is used to represent the position and orientation of the first electronic device in the three-dimensional coordinate system corresponding to the first electronic device when capturing the first environment image;
the second electronic device is configured to send multiple frames of a second environment image and the initial pose information of each frame of the second environment image to the distributed mapping system, where the second environment image is obtained by the second electronic device photographing its environment, and the initial pose information of each frame of the second environment image is used to represent the position and orientation of the second electronic device in the three-dimensional coordinate system corresponding to the second electronic device when capturing the second environment image;
the distributed mapping system is configured to: receive the multiple frames of the first environment image and the pose information of each frame of the first environment image sent by the first electronic device; receive the multiple frames of the second environment image and the initial pose information of each frame of the second environment image sent by the second electronic device; perform pose conversion on the initial pose information of the multiple frames of the second environment image according to a target conversion relationship, to obtain the target pose information of each frame of the second environment image, where the target pose information is used to represent the position and orientation of the second electronic device in the three-dimensional coordinate system corresponding to the first electronic device when capturing the second environment image, and the target conversion relationship is the conversion relationship between the three-dimensional coordinate system corresponding to the second electronic device and the three-dimensional coordinate system corresponding to the first electronic device; and create a three-dimensional map according to the multiple frames of the first environment image, the pose information of each frame of the first environment image, the multiple frames of the second environment image, and the target pose information of each frame of the second environment image, where the three-dimensional map is used to construct an augmented reality scene.
Based on the above augmented reality system, after receiving environment images uploaded by multiple electronic devices, the distributed mapping system can convert the poses of the environment images uploaded by different electronic devices into poses in the same three-dimensional coordinate system, and generate a three-dimensional map from the pose-converted environment images, thereby realizing multi-device construction of a three-dimensional map, improving mapping efficiency, and at the same time enhancing interaction among multiple devices and improving user experience.
In a possible design, the first electronic device is further configured to: before sending the multiple frames of the first environment image and the pose information of each frame of the first environment image to the distributed mapping system, send a multi-device mapping request to the distributed mapping system, where the multi-device mapping request includes multiple frames of a first initial image and the positioning information of each frame of the first initial image, and the multiple frames of the first initial image are obtained by the first electronic device photographing its environment;
the second electronic device is further configured to: before sending the multiple frames of the second environment image and the initial pose information of each frame of the second environment image to the distributed mapping system, send a join-mapping request to the distributed mapping system, where the join-mapping request includes multiple frames of a second initial image, and the multiple frames of the second initial image are obtained by the second electronic device photographing its environment;
the distributed mapping system is further configured to: receive the multi-device mapping request sent by the first electronic device, and generate an initial three-dimensional map according to the multiple frames of the first initial image and the positioning information of each frame of the first initial image in the multi-device mapping request; and receive the join-mapping request sent by the second electronic device, and determine the target conversion relationship according to the multiple frames of the second initial image and the initial three-dimensional map.
With this design, the distributed mapping system can receive the multi-device mapping request sent by the first electronic device and generate an initial three-dimensional map according to the multiple frames of the first initial image in the request; the initial three-dimensional map can be used to locate the second electronic device and to determine the target conversion relationship. The distributed mapping system can receive the join-mapping request sent by the second electronic device and determine the target conversion relationship according to the multiple frames of the second initial image in the request and the initial three-dimensional map. The target conversion relationship can be used to convert the pose information of images collected by the second electronic device into the three-dimensional coordinate system corresponding to the first electronic device, so that the distributed mapping system can create the three-dimensional map based on environment images in a unified coordinate system.
In a possible design, the distributed mapping system is further configured to: after receiving the multiple frames of the second initial image and before determining the target conversion relationship according to the multiple frames of the second initial image and the initial three-dimensional map, perform image processing on the multiple frames of the second initial image, and determine that at least one frame of the second initial image contains the same image content as some frame of the first initial image.
With this design, the second electronic device needs to rescan an area already scanned by the first electronic device in order to collect the multiple frames of the second initial image. When at least one frame of the second initial image contains the image content of some frame of the first initial image, the distributed mapping system can confirm that the second electronic device has joined the multi-device mapping task, and can further determine the target conversion relationship based on the second initial images and the first initial images.
In a possible design, the distributed mapping system is specifically configured to: extract the global feature and feature points of a target initial image, where the target initial image is any frame of the second initial image; determine, according to the global feature of the target initial image, at least one frame of the first initial image matching the target initial image, determine the three-dimensional points in the three-dimensional map corresponding to the feature points in the matched at least one frame of the first initial image, and take the determined three-dimensional points as the three-dimensional points corresponding to the feature points of the target initial image; determine the target pose information of the target initial image according to the feature points of the target initial image, the three-dimensional points corresponding to those feature points, and the camera intrinsic parameters of the second electronic device; and determine the target conversion relationship according to the initial pose information of the target initial image and the target pose information of the target initial image.
With this design, when determining the target conversion relationship, the distributed mapping system can determine the three-dimensional points in the three-dimensional map corresponding to the feature points of any frame of the second initial image, and then determine the target pose information according to the feature points of the second initial image, the corresponding three-dimensional points, and the camera intrinsic parameters of the second electronic device. The target pose information is the pose of the second electronic device in the three-dimensional coordinate system of the initial three-dimensional map when it captures the second initial image, and the three-dimensional coordinate system of the initial three-dimensional map is the same as the three-dimensional coordinate system corresponding to the first electronic device, while the initial pose information is the pose in the three-dimensional coordinate system corresponding to the second electronic device. Therefore, an accurate target conversion relationship between the coordinate system corresponding to the second electronic device and the coordinate system corresponding to the first electronic device can be determined from the target pose information and the initial pose information.
In a possible design, the distributed mapping system is further configured to: generate a point cloud resource according to the three-dimensional points in the initial three-dimensional map; receive a positioning request sent by the first electronic device, where the positioning request includes an environment image collected by the first electronic device; locate the first electronic device according to the environment image in the positioning request and the initial three-dimensional map, and determine the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map; and send the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map and the point cloud resource to the first electronic device;
the first electronic device is further configured to: send the positioning request to the distributed mapping system; receive the pose of the first electronic device in the three-dimensional coordinate system of the initial three-dimensional map and the point cloud resource sent by the distributed mapping system; and display, according to that pose, the environment images collected by the first electronic device in real time together with the point cloud resource, to indicate that the areas covered by the point cloud resource have been scanned.
With this design, the distributed mapping system can generate a point cloud resource from the three-dimensional points in the three-dimensional map and send it to the first electronic device; the first electronic device can display the environment images collected in real time together with the point cloud resource to indicate the areas the first electronic device has already scanned, guiding the user to continue scanning other areas and improving user experience.
In a possible design, the distributed mapping system is specifically configured to: select one frame to be processed from the multiple frames of the first environment image and the multiple frames of the second environment image as a target image, and perform a target processing procedure on the target image, until the target processing procedure has been performed on all of the multiple frames of the first environment image and the multiple frames of the second environment image. The target processing procedure includes the following steps: extracting first feature points of the target image; obtaining the feature points of at least one frame of image on which the target processing procedure has been performed; and selecting, from the feature points of the at least one frame of image, at least one second feature point to form a feature matching pair with a first feature point, where the first feature point and the at least one second feature point correspond to the same point in the environment, and the at least one frame of image on which the target processing procedure has been performed includes at least one frame of the first environment image and/or at least one frame of the second environment image. The distributed mapping system obtains the multiple feature matching pairs resulting from performing the target processing procedure on the multiple frames of the first environment image and the multiple frames of the second environment image, and creates the three-dimensional map according to the multiple feature matching pairs.
通过以上设计,分布式建图系统在创建三维地图时,可以提取每帧环境图像的特征点,并确定与其它已完成图像处理的图像的特征点之间组成的特征匹配对,特征匹配对为同一三维点对应的不同环境图像中的特征点,根据特征匹配对可以确定三维点,进而创建三维地图,使得三维地图与真实环境更加贴合。
在一个可能的设计中,所述分布式建图系统具体用于:根据所述多帧第一环境图像的位姿信息和所述多帧第二环境图像的目标位姿信息,确定所述多个特征匹配对在所述第一电子设备对应的三维坐标系中对应的多个三维点,得到所述三维地图。
通过以上设计,分布式建图系统可以根据特征匹配对、多帧第一环境图像的位姿信息和多帧第二环境图像的目标位姿信息,确定特征匹配对所对应的三维点。该方法是基于不同环境图像中的特征点组成的特征匹配对对应三维空间中的同一三维点,实现对三维点的位置进行定位,进而确定出三维点,保证三维地图是根据第一电子设备和第二电子设备实际采集到的环境图像生成的,可以准确表示真实环境中的特征。
在一个可能的设计中,所述第一电子设备还用于:向所述分布式建图系统发送每帧第一环境图像对应的定位信息;
所述第二电子设备还用于:向所述分布式建图系统发送每帧第二环境图像对应的定位信息;
所述分布式建图系统还用于:接收所述第一电子设备发送的每帧第一环境图像对应的定位信息;接收所述第二电子设备发送的每帧第二环境图像对应的定位信息;根据每帧第一环境图像对应的定位信息和每帧第二环境图像对应的定位信息对所述三维点的坐标进行调整,得到与真实环境等比例的三维地图。
通过该设计,分布式建图系统可以基于第一环境图像的定位信息和第二环境图像的定位信息调整三维地图中三维点的坐标,得到与真实环境等比例的三维地图,根据该三维地图向用户显示增强现实场景时,虚拟世界可以与真实环境融合显示,提升用户体验。
在一个可能的设计中,所述第一电子设备还用于:采集所述多帧第一环境图像对应的第一深度图,将所述第一深度图发送给所述分布式建图系统;
所述第二电子设备还用于:采集所述多帧第二环境图像对应的第二深度图,将所述第二深度图发送给所述分布式建图系统;
所述分布式建图系统还用于:接收所述第一电子设备发送的所述第一深度图;接收所述第二电子设备发送的所述第二深度图;根据所述目标转换关系对所述第二深度图进行坐标转换,对所述第一深度图和坐标转换后的第二深度图进行融合处理,得到完整深度图;根据所述完整深度图生成真实环境对应的白膜;其中,所述白膜用于表示所述真实环境中的各物体的表面。
在一个可能的设计中,所述分布式建图系统还用于:基于多视图立体匹配算法确定每帧第一环境图像的深度信息和每帧第二环境图像的深度信息,根据每帧第一环境图像的深度信息和每帧第二环境图像的深度信息生成真实环境对应的白膜;其中,所述白膜用于表示所述真实环境中的各物体的表面。
通过以上设计,本申请提供多种确定环境图像的深度信息的方式,一种方式中,电子设备可以确定环境图像对应的深度图并发送给分布式建图系统,分布式建图系统可以根据目标转换关系对深度图的坐标进行转换;另一种方式中,分布式建图系统可以基于多视图立体匹配算法确定每帧环境图像的深度信息。分布式建图系统可以根据每帧环境图像对应的深度图或深度信息确定真实环境对应的白膜,白膜可以表示真实环境中各物体的表面,从而当电子设备向用户显示增强现实场景时,用户选择添加三维数字资源模型时,电子设备可以根据白膜将三维数字资源模型放置在真实环境的物体表面,提高增强现实场景的真实性,提升用户体验。
在一个可能的设计中,所述分布式建图系统还用于:将所述三维地图发送给所述第一电子设备和所述第二电子设备;接收所述第一电子设备发送的定位请求,确定所述第一电子设备在所述三维地图的三维坐标系中的第一位姿,将所述第一位姿发送给所述第一电子设备和所述第二电子设备;接收所述第二电子设备发送的定位请求,确定所述第二电子设备在所述三维地图的三维坐标系中的第二位姿,将所述第二位姿发送给所述第一电子设备和所述第二电子设备;
所述第一电子设备还用于:接收所述分布式建图系统发送的所述三维地图;向所述分布式建图系统发送定位请求,接收所述分布式建图系统发送的所述第一位姿和所述第二位姿,根据所述第一位姿和所述三维地图显示增强现实场景,并根据所述第二位姿显示拍摄到的使用所述第二电子设备的用户的图像以及所述第二电子设备对应的三维数字资源模型。
通过该设计,多个用户操作多个电子设备显示目标场景时,可以进行交互游玩。如第一用户操作第一电子设备,第二用户操作第二电子设备,若第二用户持第二电子设备进入第一电子设备的可拍摄范围时,第一电子设备可以显示实时拍摄到的第二用户的图像,并显示第二电子设备对应的三维数字资源模型,以实现设备之间的交互游玩,提升增强现实场景用户之间的互动性,提升用户体验。
第二方面,本申请提供一种多设备构建三维地图的方法,应用于增强现实系统,所述增强现实系统包括第一电子设备、第二电子设备和分布式建图系统,该方法包括:
所述第一电子设备向所述分布式建图系统发送多帧第一环境图像和每帧第一环境图像的位姿信息;所述第一环境图像为所述第一电子设备对所处环境拍摄得到的,每帧第一环境图像的位姿信息用于表示所述第一电子设备在拍摄第一环境图像时在所述第一电子设备对应的三维坐标系中的位置和朝向;所述第二电子设备向所述分布式建图系统发送多帧第二环境图像和每帧第二环境图像的初始位姿信息,所述第二环境图像为所述第二电子设备对所处环境拍摄得到的,每帧第二环境图像的初始位姿信息用于表示所述第二电子设备在拍摄第二环境图像时在所述第二电子设备对应的三维坐标系中的位置和朝向;所述分布式建图系统根据目标转换关系对所述多帧第二环境图像的初始位姿信息进行位姿转换,得到每帧第二环境图像的目标位姿信息,所述目标位姿信息用于表示所述第二电子设备在拍摄第二环境图像时在所述第一电子设备对应的三维坐标系中的位置和朝向;所述目标转换关系为所述第二电子设备对应的三维坐标系与所述第一电子设备对应的三维坐标系之间的转换关系;所述分布式建图系统根据所述多帧第一环境图像、每帧第一环境图像的位姿信息、所述多帧第二环境图像以及每帧第二环境图像的目标位姿信息创建三维地图,所述三维地图用于构建增强现实场景。
在一个可能的设计中,所述方法还包括:所述第一电子设备在向所述分布式建图系统发送所述多帧第一环境图像和每帧第一环境图像的位姿信息之前,向所述分布式建图系统发送多设备建图请求,所述多设备建图请求中包括多帧第一初始图像以及每帧第一初始图像的定位信息;所述多帧第一初始图像为所述第一电子设备对所处环境拍摄得到的;所述第二电子设备在向所述分布式建图系统发送所述多帧第二环境图像和每帧第二环境图像的初始位姿信息之前,向所述分布式建图系统发送加入建图请求,所述加入建图请求中包括多帧第二初始图像;所述多帧第二初始图像为所述第二电子设备对所处环境拍摄得到的;所述分布式建图系统接收所述第一电子设备发送的所述多设备建图请求,根据所述多设备建图请求中的多帧第一初始图像以及每帧第一初始图像的定位信息生成初始三维地图;所述分布式建图系统接收所述第二电子设备发送的加入建图请求,根据所述多帧第二初始图像和所述初始三维地图确定所述目标转换关系。
在一个可能的设计中,在所述分布式建图系统接收到所述多帧第二初始图像之后,根据所述多帧第二初始图像和所述初始三维地图确定目标转换关系之前,所述方法还包括:所述分布式建图系统对所述多帧第二初始图像进行图像处理,确定至少一帧第二初始图像与任一帧第一初始图像包含相同的图像内容。
在一个可能的设计中,所述分布式建图系统根据所述多帧第二初始图像和所述初始三维地图确定所述目标转换关系,包括:所述分布式建图系统提取目标初始图像的全局特征和特征点,所述目标初始图像为任一帧第二初始图像;所述分布式建图系统根据目标初始图像的全局特征确定与目标初始图像匹配的至少一帧第一初始图像,并确定与目标初始图像匹配的至少一帧第一初始图像中的特征点在三维地图中对应的三维点,将确定出的三维点作为目标初始图像的特征点对应的三维点;所述分布式建图系统根据目标初始图像的特征点、目标初始图像的特征点对应的三维点以及第二电子设备的相机内参确定所述目标初始图像的目标位姿信息;所述分布式建图系统根据所述目标初始图像的初始位姿信息和所述目标初始图像的目标位姿信息确定所述目标转换关系。
在一个可能的设计中,所述方法还包括:所述分布式建图系统根据所述初始三维地图中的三维点生成点云资源;所述分布式建图系统接收所述第一电子设备发送的定位请求,所述定位请求中包括所述第一电子设备采集到的环境图像;根据所述定位请求中的环境图像和所述初始三维地图对所述第一电子设备进行定位,确定所述第一电子设备在所述初始三维地图的三维坐标系中的位姿;所述分布式建图系统向所述第一电子设备发送所述第一电子设备在所述初始三维地图的三维坐标系中的位姿和所述点云资源;所述第一电子设备根据所述第一电子设备在所述初始三维地图的三维坐标系中的位姿显示所述第一电子设备实时采集的环境图像和所述点云资源,以表示所述点云资源覆盖的区域已完成扫描。
在一个可能的设计中,所述分布式建图系统根据所述多帧第一环境图像、每帧第一环境图像的位姿信息、所述多帧第二环境图像以及每帧第二环境图像的目标位姿信息创建三维地图,包括:所述分布式建图系统从所述多帧第一环境图像和所述多帧第二环境图像中选择待处理的一帧图像作为目标图像,并对所述目标图像进行目标处理过程,至所述多帧第一环境图像和所述多帧第二环境图像均已进行所述目标处理过程;
所述目标处理过程包括以下步骤:提取所述目标图像的第一特征点;获取已进行所述目标处理过程的至少一帧图像的特征点;在所述至少一帧图像的特征点中选择至少一个第二特征点与所述第一特征点组成特征匹配对;其中,所述第一特征点和所述至少一个第二特征点对应所述环境中的同一点;所述已进行所述目标处理过程的至少一帧图像包括至少一帧第一环境图像和/或至少一帧第二环境图像;
所述分布式建图系统获取对所述多帧第一环境图像和所述多帧第二环境图像进行目标处理过程后得到的多个特征匹配对,并根据所述多个特征匹配对创建所述三维地图。
在一个可能的设计中,所述分布式建图系统根据所述多个特征匹配对创建所述三维地图,包括:所述分布式建图系统根据所述多帧第一环境图像的位姿信息和所述多帧第二环境图像的目标位姿信息,确定所述多个特征匹配对在所述第一电子设备对应的三维坐标系 中对应的多个三维点,得到所述三维地图。
在一个可能的设计中,所述方法还包括:所述第一电子设备向所述分布式建图系统发送每帧第一环境图像对应的定位信息;所述第二电子设备向所述分布式建图系统发送每帧第二环境图像对应的定位信息;所述分布式建图系统根据每帧第一环境图像对应的定位信息和每帧第二环境图像对应的定位信息对所述三维点的坐标进行调整,得到与真实环境等比例的三维地图。
在一个可能的设计中,所述方法还包括:所述第一电子设备采集所述多帧第一环境图像对应的第一深度图,将所述第一深度图发送给所述分布式建图系统;所述第二电子设备采集所述多帧第二环境图像对应的第二深度图,将所述第二深度图发送给所述分布式建图系统;所述分布式建图系统根据所述目标转换关系对所述第二深度图进行坐标转换,对所述第一深度图和坐标转换后的第二深度图进行融合处理,得到完整深度图;所述分布式建图系统根据所述完整深度图生成真实环境对应的白膜;其中,所述白膜用于表示所述真实环境中的各物体的表面。
在一个可能的设计中,所述方法还包括:所述分布式建图系统基于多视图立体匹配算法确定每帧第一环境图像的深度信息和每帧第二环境图像的深度信息,根据每帧第一环境图像的深度信息和每帧第二环境图像的深度信息生成真实环境对应的白膜;其中,所述白膜用于表示所述真实环境中的各物体的表面。
在一个可能的设计中,所述方法还包括:所述分布式建图系统将所述三维地图发送给所述第一电子设备和所述第二电子设备;所述分布式建图系统接收所述第一电子设备发送的定位请求,确定所述第一电子设备在所述三维地图的三维坐标系中的第一位姿,将所述第一位姿发送给所述第一电子设备和所述第二电子设备;所述分布式建图系统接收所述第二电子设备发送的定位请求,确定所述第二电子设备在所述三维地图的三维坐标系中的第二位姿,将所述第二位姿发送给所述第一电子设备和所述第二电子设备;所述第一电子设备根据所述第一位姿和所述三维地图显示增强现实场景,并根据所述第二位姿显示拍摄到的使用所述第二电子设备的用户的图像以及所述第二电子设备对应的三维数字资源模型。
第三方面,本申请提供一种多设备构建三维地图的方法,应用于分布式建图系统,该方法包括:
接收第一电子设备发送的多帧第一环境图像和每帧第一环境图像的位姿信息;所述第一环境图像为所述第一电子设备对所处环境拍摄得到的,每帧第一环境图像的位姿信息用于表示所述第一电子设备在拍摄第一环境图像时在所述第一电子设备对应的三维坐标系中的位置和朝向;接收第二电子设备发送的多帧第二环境图像和每帧第二环境图像的初始位姿信息,所述第二环境图像为所述第二电子设备对所处环境拍摄得到的,每帧第二环境图像的初始位姿信息用于表示所述第二电子设备在拍摄第二环境图像时在所述第二电子设备对应的三维坐标系中的位置和朝向;根据目标转换关系对所述多帧第二环境图像的初始位姿信息进行位姿转换,得到每帧第二环境图像的目标位姿信息,所述目标位姿信息用于表示所述第二电子设备在拍摄第二环境图像时在所述第一电子设备对应的三维坐标系中的位置和朝向;所述目标转换关系为所述第二电子设备对应的三维坐标系与所述第一电子设备对应的三维坐标系之间的转换关系;根据所述多帧第一环境图像、每帧第一环境图像的位姿信息、所述多帧第二环境图像以及每帧第二环境图像的目标位姿信息创建三维地图,所述三维地图用于构建增强现实场景。
在一个可能的设计中,所述方法还包括:在接收所述第一电子设备发送的所述多帧第一环境图像和每帧第一环境图像的位姿信息之前,接收所述第一电子设备发送的多设备建图请求,所述多设备建图请求中包括多帧第一初始图像以及每帧第一初始图像的定位信息,所述多帧第一初始图像为所述第一电子设备对所处环境拍摄得到的;在接收所述第二电子设备发送的所述多帧第二环境图像和每帧第二环境图像的初始位姿信息之前,接收所述第二电子设备发送的加入建图请求,所述加入建图请求中包括多帧第二初始图像,所述多帧第二初始图像为所述第二电子设备对所处环境拍摄得到的;根据所述多设备建图请求中的多帧第一初始图像以及每帧第一初始图像的定位信息生成初始三维地图;根据所述多帧第二初始图像和所述初始三维地图确定所述目标转换关系。
在一个可能的设计中,在接收到所述多帧第二初始图像之后,根据所述多帧第二初始图像和所述初始三维地图确定目标转换关系之前,所述方法还包括:对所述多帧第二初始图像进行图像处理,确定至少一帧第二初始图像与任一帧第一初始图像包含相同的图像内容。
在一个可能的设计中,所述根据所述多帧第二初始图像和所述初始三维地图确定所述目标转换关系,包括:提取目标初始图像的全局特征和特征点,所述目标初始图像为任一帧第二初始图像;根据目标初始图像的全局特征确定与目标初始图像匹配的至少一帧第一初始图像,并确定与目标初始图像匹配的至少一帧第一初始图像中的特征点在三维地图中对应的三维点,将确定出的三维点作为目标初始图像的特征点对应的三维点;根据目标初始图像的特征点、目标初始图像的特征点对应的三维点以及第二电子设备的相机内参确定所述目标初始图像的目标位姿信息;根据所述目标初始图像的初始位姿信息和所述目标初始图像的目标位姿信息确定所述目标转换关系。
在一个可能的设计中,所述方法还包括:根据所述初始三维地图中的三维点生成点云资源;接收所述第一电子设备发送的定位请求,所述定位请求中包括所述第一电子设备采集到的环境图像;根据所述定位请求中的环境图像和所述初始三维地图对所述第一电子设备进行定位,确定所述第一电子设备在所述初始三维地图的三维坐标系中的位姿;向所述第一电子设备发送所述第一电子设备在所述初始三维地图的三维坐标系中的位姿和所述点云资源,以使第一电子设备根据所述第一电子设备在所述初始三维地图的三维坐标系中的位姿显示所述第一电子设备实时采集的环境图像和所述点云资源,以表示所述点云资源覆盖的区域已完成扫描。
在一个可能的设计中,所述根据所述多帧第一环境图像、每帧第一环境图像的位姿信息、所述多帧第二环境图像以及每帧第二环境图像的目标位姿信息创建三维地图,包括:从所述多帧第一环境图像和所述多帧第二环境图像中选择待处理的一帧图像作为目标图像,并对所述目标图像进行目标处理过程,至所述多帧第一环境图像和所述多帧第二环境图像均已进行所述目标处理过程;
所述目标处理过程包括以下步骤:提取所述目标图像的第一特征点;获取已进行所述目标处理过程的至少一帧图像的特征点;在所述至少一帧图像的特征点中选择至少一个第二特征点与所述第一特征点组成特征匹配对;其中,所述第一特征点和所述至少一个第二特征点对应所述环境中的同一点;所述已进行所述目标处理过程的至少一帧图像包括至少一帧第一环境图像和/或至少一帧第二环境图像;
获取对所述多帧第一环境图像和所述多帧第二环境图像进行目标处理过程后得到的多个特征匹配对,并根据所述多个特征匹配对创建所述三维地图。
在一个可能的设计中,所述根据所述多个特征匹配对创建所述三维地图,包括:根据所述多帧第一环境图像的位姿信息和所述多帧第二环境图像的目标位姿信息,确定所述多个特征匹配对在所述第一电子设备对应的三维坐标系中对应的多个三维点,得到所述三维地图。
在一个可能的设计中,所述方法还包括:接收所述第一电子设备发送的每帧第一环境图像对应的定位信息;接收所述第二电子设备发送的每帧第二环境图像对应的定位信息;根据每帧第一环境图像对应的定位信息和每帧第二环境图像对应的定位信息对所述三维点的坐标进行调整,得到与真实环境等比例的三维地图。
在一个可能的设计中,所述方法还包括:接收所述第一电子设备发送的所述多帧第一环境图像对应的第一深度图;接收所述第二电子设备发送的所述多帧第二环境图像对应的第二深度图;根据所述目标转换关系对所述第二深度图进行坐标转换,对所述第一深度图和坐标转换后的第二深度图进行融合处理,得到完整深度图;根据所述完整深度图生成真实环境对应的白膜;其中,所述白膜用于表示所述真实环境中的各物体的表面。
在一个可能的设计中,所述方法还包括:基于多视图立体匹配算法确定每帧第一环境图像的深度信息和每帧第二环境图像的深度信息,根据每帧第一环境图像的深度信息和每帧第二环境图像的深度信息生成真实环境对应的白膜;其中,所述白膜用于表示所述真实环境中的各物体的表面。
在一个可能的设计中,所述方法还包括:将所述三维地图发送给所述第一电子设备和所述第二电子设备;接收所述第一电子设备发送的定位请求,确定所述第一电子设备在所述三维地图的三维坐标系中的第一位姿,将所述第一位姿发送给所述第一电子设备和所述第二电子设备;接收所述第二电子设备发送的定位请求,确定所述第二电子设备在所述三维地图的三维坐标系中的第二位姿,将所述第二位姿发送给所述第一电子设备和所述第二电子设备,以使第一电子设备根据所述第一位姿和所述三维地图显示增强现实场景,并根据所述第二位姿显示拍摄到的使用所述第二电子设备的用户的图像以及所述第二电子设备对应的三维数字资源模型。
第四方面,本申请提供一种电子设备,所述电子设备包括多个功能模块;所述多个功能模块相互作用,实现上述任一方面及其各实施方式中第一电子设备或第二电子设备所执行的方法。所述多个功能模块可以基于软件、硬件或软件和硬件的结合实现,且所述多个功能模块可以基于具体实现进行任意组合或分割。
第五方面,本申请提供一种电子设备,包括至少一个处理器和至少一个存储器,所述至少一个存储器中存储计算机程序指令,所述电子设备运行时,所述至少一个处理器执行上述任一方面及其各实施方式中第一电子设备或第二电子设备执行的方法。
第六方面,本申请实施例提供一种分布式建图系统,所述分布式建图系统包括多个计算节点,所述多个计算节点并行/串行处理数据,每个计算节点用于执行上述任一方面及其各实施方式中分布式建图系统所执行的方法。
第七方面,本申请还提供一种计算机程序,当所述计算机程序在计算机上运行时,使得所述计算机执行上述任一方面及其各实施方式中第一电子设备、第二电子设备或分布式建图系统执行的方法。
第八方面,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当所述计算机程序被计算机执行时,使得所述计算机执行上述任一方面及其各实施方式中第一电子设备、第二电子设备或分布式建图系统执行的方法。
第九方面,本申请还提供一种芯片,所述芯片用于读取存储器中存储的计算机程序,执行上述任一方面及其各实施方式中第一电子设备、第二电子设备或分布式建图系统执行的方法。
第十方面,本申请还提供一种芯片系统,该芯片系统包括处理器,用于支持计算机装置实现上述任一方面及其各实施方式中第一电子设备、第二电子设备或分布式建图系统执行的方法。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器用于保存该计算机装置必要的程序和数据。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
附图说明
图1为本申请实施例提供的一种AR场景示意图;
图2为本申请实施例提供的一种增强现实系统的示意图;
图3为本申请实施例提供的一种电子设备的结构示意图;
图4为本申请实施例提供的一种电子设备的软件结构框图;
图5为本申请实施例提供的一种分布式建图系统的结构示意图;
图6为本申请实施例提供的一种用户操作电子设备触发建图初始化指令的示意图;
图7为本申请实施例提供的一种第一电子设备显示的扫描界面示意图;
图8为本申请实施例提供的一种多人建图初始化的示意图;
图9为本申请实施例提供的一种多人建图界面的示意图;
图10为本申请实施例提供的一种多人建图中多个用户持电子设备扫描拍摄当前所处环境的示意图;
图11为本申请实施例提供的一种电子设备在扫描拍摄当前所处环境时的显示界面示意图;
图12为本申请实施例提供的一种网格示意图;
图13为本申请实施例提供的一种第一电子设备显示的扫描进度界面的示意图;
图14为本申请实施例提供的一种三维地图的示意图;
图15为本申请实施例提供的一种多设备构建三维地图方法的流程示意图;
图16a为本申请实施例提供的一种包含三维数字资源模型的素材的AR场景界面示意图;
图16b为本申请实施例提供的一种添加三维数字资源模型后的AR场景界面的示意图;
图17为本申请实施例提供的一种电子设备显示的多用户在AR场景中交互游玩的示意图;
图18为本申请实施例提供的一种电子设备显示的已创建的三维地图列表的示意图;
图19为本申请实施例提供的一种用户触发扩展三维地图的示意图;
图20为本申请实施例提供的一种多设备创建三维地图的方法流程图。
具体实施方式
为了使本申请实施例的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施例作进一步地详细描述。其中,在本申请实施例的描述中,以下,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。
应理解,本申请实施例中“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A、B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一(项)个”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a、b或c中的至少一项(个),可以表示:a,b,c,a和b,a和c,b和c,或a、b和c,其中a、b、c可以是单个,也可以是多个。
增强现实(augmented reality,AR)是一种将真实世界信息和虚拟世界信息集成显示的技术。AR技术可以将原本在现实世界难以体验的实体信息进行模拟仿真得到虚拟信息,并将虚拟信息应用到真实世界,以使真实环境和虚拟物体实时叠加到同一个画面或空间同时被用户感知,以达到超越现实的感官体验。
AR场景的三维地图可以用于表示真实世界的环境信息,进而电子设备可以基于三维地图构建AR场景。例如,当用户通过电子设备的摄像装置实时拍摄真实世界时,用户可以在电子设备的显示屏中显示的电子场景中添加虚拟物品,电子设备基于真实世界对应的三维地图将虚拟物体添加到AR场景中,用户可以在同一个画面中观察到真实世界和用户添加的虚拟物品。例如,图1为本申请实施例提供的一种AR场景示意图。参考图1,图1中地面、道路为电子设备的摄像装置实际拍摄到的真实世界的画面,道路上的卡通人物则为用户在当前的AR场景中添加的虚拟物品,用户可以在电子设备的显示屏上同时观察到真实世界中的地面、道路和虚拟的卡通人物。
一种可选的实施方式中,可以由多个电子设备共同创建三维地图。具体的,每个电子设备都可以对真实世界的同一区域采集环境图像,并生成该区域对应的三维点云,如第一电子设备生成的三维点云为目标点云,第二电子设备生成的三维点云为参考点云。第一电子设备可以基于主方向贴合法预配准目标点云和参考点云,求取目标点云和参考点云中各点的曲率,并根据曲率相似分别得到特征匹配点对。利用特征匹配点对,实现目标点云和参考点云的精确配准,进而生成三维地图。通过该方式,可以将不同电子设备采集的环境图像得到的三维点云进行合并,实现多个电子设备共同创建三维地图,但该方法要求多个电子设备采集的环境图像的重叠率较高,进而导致生成三维地图的效率较低。
基于以上问题,本申请实施例提供一种三维地图创建方法,用于实现高效的多设备协同创建三维地图。
图2为本申请实施例提供的一种增强现实系统的示意图。参考图2,该增强现实系统中包括第一电子设备、至少一个第二电子设备和分布式建图系统。其中,第一电子设备为主设备,第二电子设备为从设备。分布式建图系统可以部署在至少一个云端服务器中。作为示例而非限定,图2中以一个第一电子设备、一个第二电子设备以及部署在一个云端服务器中的分布式建图系统为例示出。
第一电子设备可以启动多设备构建地图任务,第二电子设备可以加入多设备构建地图任务。第一电子设备和第二电子设备可以同时对真实世界的环境图像进行采集,第一电子设备将采集到的多帧第一环境图像上传到分布式建图系统,第二电子设备将采集到的多帧第二环境图像上传至分布式建图系统。分布式建图系统可以确定多帧第二环境图像对应的目标转换关系,该目标转换关系为第二环境图像对应的三维坐标系和第一环境图像对应的三维坐标系之间的转换关系。分布式建图系统根据目标转换关系对第二环境图像的位姿信息进行位姿转换,得到第二环境图像的目标位姿信息。分布式建图系统可以根据第一环境图像和位姿转换后的第二环境图像生成三维地图,该三维地图可以用于构建AR场景。通过该方案,分布式建图系统在接收到多个电子设备上传的环境图像后,可以将不同电子设备上传的环境图像的位姿转换为同一三维坐标系下的位姿,分布式建图系统可以根据转换后的环境图像生成三维地图,进而实现多设备构建三维地图。
下面介绍电子设备、分布式建图系统和用于这样的电子设备和分布式建图系统的实施例。本申请实施例的电子设备可以具有摄像装置和显示装置,例如本申请实施例的电子设备可以为平板电脑、手机、车载设备、增强现实(augmented reality,AR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)、可穿戴设备等,本申请实施例对电子设备的具体类型不作任何限制。
图3为本申请实施例提供的一种电子设备100的结构示意图。如图3所示,电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。充电管理模块140用于从充电器接收充电输入。电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
显示屏194用于显示应用的显示界面,例如显示电子设备100上安装的应用的显示页面等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。在本申请实施例中,显示屏194可以用于显示AR场景,显示屏194中显示的AR场景可以包括摄像头193实时拍摄得到的图像以及用户在AR场景中放置的虚拟物品。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。在本申请实施例中,摄像头193可以采集用于构建AR场景的三维地图的图像,摄像头193还可以用于拍摄全景图像,如用户持电子设备100水平旋转360度,摄像头193可以采集到一张电子设备100所处位置对应的全景图。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,以及至少一个应用程序的软件代码等。存储数据区可存储电子设备100使用过程中所产生的数据(例如拍摄的图像、录制的视频等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将图片,视频等文件保存在外部存储卡中。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
其中,传感器模块180可以包括压力传感器180A,加速度传感器180B,触摸传感器180C等。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。
触摸传感器180C,也称“触控面板”。触摸传感器180C可以设置于显示屏194,由触摸传感器180C与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180C用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180C也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
按键190包括开机键,音量键等。按键190可以是机械按键,也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现与电子设备100的接触和分离。
可以理解的是,图3所示的部件并不构成对电子设备100的具体限定,电子设备还可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。此外,图3中的部件之间的组合/连接关系也是可以调整修改的。
图4为本申请实施例提供的一种电子设备的软件结构框图。如图4所示,电子设备的软件结构可以是分层架构,例如可以将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将操作系统分为四层,从上至下分别为应用程序层,应用程序框架层(framework,FWK),运行时(runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包(application package)。如图4所示,应用程序层可以包括相机、设置、皮肤模块、用户界面(user interface,UI)、三方应用程序等。其中,三方应用程序可以包括图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层可以包括一些预先定义的函数。如图4所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供电子设备的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
运行时包括核心库和虚拟机。运行时负责操作系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是操作系统的核心库。应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(media libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
硬件层可以包括各类传感器,例如加速度传感器、陀螺仪传感器、触摸传感器等。
图5为本申请实施例提供的一种分布式建图系统的结构示意图。参考图5,本申请实施例中分布式建图系统可以包括多个计算节点(如图5所示的计算节点1~计算节点N,N为正整数)、至少一个存储节点、任务队列节点、至少一个调度节点以及定位节点。可选地,分布式建图系统可以部署在云端,如分布式建图系统的多个节点可以部署在一个云端服务器中,或者多个节点可以部署在多个云端服务器中。下面对图5所示的分布式建图系统中各个节点的功能进行介绍:
计算节点,用于基于分布式处理方式,根据多个电子设备上传的对真实世界的环境拍摄的多帧图像以及每帧图像对应的定位参数,创建该环境对应的三维地图。其中,不同计算节点可以执行三维地图创建过程中的不同处理任务,N个计算节点共同完成整个三维地图的创建过程。例如,不同计算节点可分别对不同的图像进行相同类型的处理,从而将多帧图像的处理任务分散到多个计算节点中同步进行,进而加快图像处理的速度。
参考图5,N个计算节点可以包括图5中所示的CPU算法组件和GPU算法组件。其中,分布式建图系统中的CPU算法组件可以有多个,GPU算法组件也可以有多个。GPU算法组件可以用于对多个电子设备上传的多帧图像进行图像处理(如特征提取、匹配、检索等),CPU算法组件可以用于将不同电子设备上传的环境图像的位姿转换至同一三维坐标系下,还可以用于根据GPU算法组件的图像处理结果,生成三维地图。GPU算法组件和CPU算法组件可以监听队列形消息中间件中的地图构建指令并进行算法自动处理。
本申请实施例中计算节点中还可以包括白膜处理服务,白膜处理服务用于对电子设备上传的网格进行简化,并根据简化后的网格生成白膜。白膜处理服务还可以根据多个电子设备上传的深度图生成AR场景对应的完整白膜。
当然,计算节点也可以通过其它类型的算法处理组件实现,本申请实施例中不做具体限制。
任务队列节点,用于按队列缓存三维地图创建过程中的处理任务,每个计算节点可以从任务队列节点读取待执行的任务后进行相应处理,从而实现多处理任务的分布式按序执行。
示例性的,任务队列节点可以利用图5中所示的队列形消息中间件实现。该队列形消息中间件可以用于异步缓存来自多个电子设备的三维地图创建指令、三维地图创建过程中的处理任务的指令等,并可以共享或分配给N个计算节点,以使N个计算节点分担执行任务,均衡系统负载。
至少一个存储节点,用于对三维地图创建过程相关的数据进行临时存储或永久性存储。例如,至少一个存储节点可以存储多帧图像、多个计算节点进行相应处理的中间数据和结果数据等。
可选的,参考图5,存储节点可以包括云端数据库、对象存储服务、弹性文件服务、缓存形消息中间件等。其中,云端数据库可以用于存储电子设备侧的用户信息、创建三维地图过程中任务处理情况的指示信息、对三维地图的修改信息等占用较小存储空间的序列化内容。对象存储服务可以用于存储电子设备中涉及的三维模型、高清图片、视频、动画等占用较大存储空间的非序列化内容。弹性文件服务可以用于存储利用三维地图创建算法所生成的三维地图的地图数据、以及占用存储空间较大的算法的中间变量等数据。缓存形消息中间件可以用于异步缓存算法处理过程中的可序列化且占用存储空间较小的中间变量等数据,并可以共享给N个计算节点。
至少一个调度节点,用于对N个计算节点、任务队列节点、至少一个存储节点中的部分或全部节点的调度进行统筹管理。
示例性的,如图5中所示,分布式建图系统中的调度节点可以包括云端调度中心和算法调度中心。其中,云端调度中心可以对算法调度中心、存储节点、任务队列节点等节点进行管理和调度,并可以与电子设备进行信息和数据交互,可以作为高效的消息处理及分发节点,例如,云端调度中心能够向电子设备提供多帧图片的上传地址,进行电子设备侧的请求调度,云端数据库的请求及返回等。算法调度中心用于对N个计算节点进行管理和调度,还可以对其它的一些算法服务进行管理和调度。
定位节点,用于根据电子设备上传的图像对电子设备进行定位,以确定电子设备在三维地图的坐标系中的相对位置。
可选地,定位节点可以包括全局视觉定位系统(global visual positioning system,GVPS)服务和向量检索系统(vector retrieval system,VRS)服务。其中,GVPS服务可以用于进行空间定位,确定电子设备当前所处位置在创建的三维地图中对应位置的6自由度坐标。VRS服务用于进行向量搜索。可选的,GVPS服务和VRS服务可以作为计算节点的子服务。
关于上述系统中各节点、服务或组件的具体功能,下文中会结合具体实施例进行说明,这里暂不详述。
需要说明的是,图5所示的分布式建图系统仅是对本申请实施例提供的分布式建图系统的一种示例性说明,并不对本申请实施例提供的方案适用的分布式建图系统的架构造成限制。本申请实施例提供的方案适用的分布式建图系统与图5所示的结构相比,也可以增加、删除或调整部分节点,本申请实施例中不进行具体限定。
下面结合具体实施例,对本申请实施例提供的方案进行说明。
本申请实施例提供的方案的执行过程至少包括建图初始化、数据采集、建图三个阶段。基于这三个阶段的方法创建三维地图后,还可以进一步包括定位、添加数字资源等阶段。下面分别对各个阶段的方法进行详细说明。
一、建图初始化
本申请实施例中,第一电子设备可以向分布式建图系统发送三维地图的建图初始化指令。第一用户可以操作第一电子设备,触发第一电子设备启动创建三维地图,第一电子设备向分布式建图系统的调度节点发送建图初始化指令。分布式建图系统的调度节点接收到建图初始化指令后,可以为当前建图任务所要创建的三维地图分配地图标识(identifier,ID)并指示给电子设备。其中,分布式建图系统通过分配地图ID并指示给电子设备,可以对不同的三维地图进行统一管理,并可以与电子设备同步三维地图的信息,避免电子设备与分布式建图系统的信息不一致导致后续的地图处理或使用过程中出现问题。
例如,图6为本申请实施例提供的一种用户操作电子设备触发建图初始化指令的示意图。参考图6,电子设备在显示屏中显示初始化控制界面,该界面中显示有用于触发建图流程的控件,还可以显示用于指示触发建图的方式的提示信息,例如“点击按钮开始录制”,则第一用户根据该提示信息,通过点击电子设备显示屏中显示的控件来触发建图初始化指令。
第一电子设备在接收到调度节点发送的地图标识后,向分布式建图系统发送多设备建图请求。多设备建图请求中包括第一电子设备采集到的多帧第一初始图像以及每帧第一初始图像的定位信息。实施中,第一电子设备可以通过摄像装置对当前环境进行拍摄,并将拍摄得到的多帧第一初始图像发送给分布式建图系统,分布式建图系统的计算节点可以根据多帧第一初始图像以及每帧第一初始图像的定位信息生成初始三维地图。该初始三维地图可以用于对第一电子设备和第二电子设备进行定位。
需要说明的是,分布式建图系统根据多帧第一初始图像生成初始三维地图的方式与分布式建图系统根据第一电子设备采集的多帧第一环境图像和第二电子设备采集的多帧第二环境图像生成目标三维地图的方式相同,此处暂不详述,具体可以参见本申请实施例中数据采集阶段以及建图阶段的描述。
分布式建图系统在生成初始三维地图后,可以将初始三维地图发送给第一电子设备。第一电子设备在接收到初始三维地图后,可以向分布式建图系统发送定位请求,该定位请求中包括当前第一电子设备的摄像装置采集到的环境图像,分布式建图系统可以根据定位请求中的环境图像和初始三维地图对第一电子设备进行定位,确定第一电子设备在初始三维地图中的位姿。分布式建图系统向第一电子设备返回第一电子设备在初始三维地图中的位姿以及环境图像对应的点云资源。其中,点云资源为分布式建图系统在生成三维地图过程中确定的多帧第一初始图像中的特征点对应的三维点。第一电子设备在接收到点云资源后,可以在显示屏中将点云资源叠加显示到实时拍摄的环境图像中,以表示第一电子设备当前已完成扫描的区域。
例如,图7为本申请实施例提供的一种第一电子设备显示的扫描界面示意图。第一电子设备接收到分布式建图系统发送的点云资源后,将点云资源叠加显示到当前显示屏中显示的图像中点云资源对应的位置处。
一些实施例中,当第一电子设备显示叠加点云资源的环境图像时,第一电子设备可以获取包括点云资源和环境图像的视频流,并将该视频流发送给第二电子设备。第二电子设备在接收到视频流后可以将该视频流对应的视频播放给第二用户,第二用户为操作第二电子设备的用户。第二用户在查看视频后,可以获知第一电子设备已扫描的区域;或者第二用户可以查看第一用户的第一电子设备,获知第一电子设备已扫描的区域。
第二用户在获知第一电子设备已扫描的区域后,可以持第二电子设备移动到第一电子设备已扫描的区域,第二电子设备通过第二电子设备的摄像装置拍摄第一电子设备已扫描的区域,得到至少一帧第二初始图像。第二电子设备可以向分布式建图系统发送加入建图请求,该加入建图请求中包括第二电子设备采集到的多帧第二初始图像以及每帧第二初始图像的定位信息。其中,至少一帧第二初始图像包含任一帧第一初始图像中的图像内容。也就是说,第二用户可以通过操作第二电子设备拍摄第一电子设备拍摄过的环境的图像,以加入多设备建图任务。分布式建图系统接收到第二电子设备发送的加入建图请求后,计算节点可以对多帧第二初始图像进行图像处理,确定至少一帧第二初始图像与任一帧第一初始图像包含相同的图像内容,则分布式建图系统将地图标识发送给第二电子设备,第二电子设备加入第一电子设备的建图任务。
举例来说,图8为本申请实施例提供的一种多人建图初始化的示意图。第一电子设备采集多帧第一初始图像,如其中一帧第一初始图像为第一用户操作第一电子设备拍摄当前环境中的目标物品得到的。第二用户可以操作第二电子设备拍摄目标物品得到第二初始图像。第一电子设备将多帧第一初始图像发送给分布式建图系统,分布式建图系统根据多帧第一初始图像生成初始三维地图。第二电子设备将多帧第二初始图像发送给分布式建图系统,分布式建图系统对多帧第二初始图像进行图像处理,确定第二初始图像中包含目标物品,则分布式建图系统将地图标识发送给第二电子设备。
由于第一电子设备的相机坐标系与第二电子设备的相机坐标系并不相同,分布式建图系统在接收到多帧第二初始图像后,可以根据初始三维地图和多帧第二初始图像确定多帧第二初始图像相对于初始三维地图的目标位姿信息,并根据多帧第二初始图像的初始位姿信息和目标位姿信息确定第二电子设备对应的三维坐标系与初始三维地图的三维坐标系之间的目标转换关系;其中,第二初始图像的初始位姿信息可以为第二电子设备拍摄第二初始图像时运行同步地图创建与定位(simultaneous localization and mapping,SLAM)算法得到的;第二电子设备对应的三维坐标系可以为第二电子设备在运行SLAM确定图像的位姿信息时创建的三维坐标系;初始三维地图的三维坐标系为第一电子设备对应的三维坐标系,可以为第一电子设备在运行SLAM确定图像的位姿信息时创建的三维坐标系。初始位姿信息所指示的位姿为第二电子设备在拍摄第二初始图像时在第二电子设备对应的三维坐标系中的位姿,目标位姿信息所指示的位姿为第二电子设备在拍摄第二初始图像时相对于初始三维地图的三维坐标系的位姿。分布式建图系统可以根据目标转换关系将第二电子设备上传的图像的位姿转换为在第一电子设备对应的三维坐标系中的位姿,进而可以根据第二电子设备上传的图像和第一电子设备上传的图像构建三维地图。
具体实施中,分布式建图系统的计算节点对每帧第二初始图像进行图像处理,对图像中各个区域的多尺度的灰度特征提取特征向量,得到图像的局部特征,并提取得到图像中的特征点。其中,特征向量可以用于表示图像中局部区域的纹理特征。计算节点确定第二初始图像中的特征点对应的第一初始图像中的特征点,根据初始三维地图确定第一初始图像中的特征点对应的三维点,则可以确定该三维点为第二初始图像中的特征点对应的三维点。计算节点可以根据第二初始图像中的特征点、特征点对应的三维点以及第二电子设备的摄像装置的相机内参求解第二初始图像的目标位姿信息。计算节点可以根据多帧第二初始图像的初始位姿信息和目标位姿信息确定第二初始图像对应的三维坐标系与初始三维地图的坐标系之间的转换关系。如计算节点可以根据每帧第二初始图像的初始位姿信息和目标位姿信息确定转换矩阵,在根据多帧第二初始图像确定出多个转换矩阵后,对多个转换矩阵求平均得到目标转换矩阵,该目标转换矩阵为第二初始图像对应的三维坐标系与初始三维地图的三维坐标系之间的目标转换关系。
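为便于理解上述由初始位姿信息和目标位姿信息求取转换矩阵并取平均的过程,下面给出一个示意性的Python片段(基于numpy与scipy的假设性示例,并非本申请实施例的组成部分;其中以4×4齐次矩阵表示位姿,函数名及平均方式均为假设,对平移部分取算术平均、对旋转部分基于四元数求平均):

    import numpy as np
    from scipy.spatial.transform import Rotation

    def estimate_target_transform(init_poses, target_poses):
        """根据若干帧第二初始图像的初始位姿与目标位姿,估计目标转换矩阵。

        init_poses/target_poses: 形状为 (N, 4, 4) 的齐次位姿矩阵序列,分别为
        同一帧图像在第二电子设备坐标系与初始三维地图坐标系中的位姿。
        """
        # 对每帧图像,单帧转换矩阵满足 T_i = target_i @ inv(init_i)
        transforms = [t @ np.linalg.inv(i) for i, t in zip(init_poses, target_poses)]
        # 平移部分取算术平均
        translation = np.mean([T[:3, 3] for T in transforms], axis=0)
        # 旋转部分利用scipy的旋转平均(基于四元数)求近似平均
        rotation = Rotation.from_matrix(np.stack([T[:3, :3] for T in transforms])).mean()
        target_T = np.eye(4)
        target_T[:3, :3] = rotation.as_matrix()
        target_T[:3, 3] = translation
        return target_T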
一些实施方式中,第二电子设备在接收到分布式建图系统发送的地图标识后,还可以向分布式建图系统发送定位请求,该定位请求中包括当前第二电子设备的摄像装置采集到的环境图像,分布式建图系统可以根据目标转换关系对环境图像进行位姿转换,并根据位姿转换后的环境图像对第二电子设备进行定位,确定第二电子设备在初始三维地图中的位姿。分布式建图系统向第二电子设备返回第二电子设备在初始三维地图中的位姿以及环境图像对应的点云资源,其中,点云资源为分布式建图系统对第二电子设备上传的环境图像进行特征提取后得到的特征点对应的三维点。第二电子设备在接收到点云资源后,可以在显示屏中将点云资源叠加显示到实时拍摄的环境图像中,以表示当前已完成扫描的区域。
第二电子设备加入多人建图任务后,第一电子设备和第二电子设备可以显示如图9所示的多人建图界面,并在该多人建图界面中显示所有参与多人建图任务的用户标识。
通过以上方法,分布式建图系统可以确定第二电子设备采集的图像对应的三维坐标系和第一电子设备采集的图像对应的三维坐标系之间的转换关系,从而可以根据该转换关系将不同电子设备采集的图像的位姿转换为同一三维坐标系下的位姿,以便分布式建图系统根据多个电子设备采集的图像进行三维地图的构建。
二、数据采集
第一电子设备触发上述建图初始化流程后,第一电子设备可以采集所在环境的多帧第一环境图像及每帧第一环境图像对应的定位信息,并将采集到的第一环境图像上传到分布式建图系统。同样的,第二电子设备也可以采集所在环境的多帧第二环境图像及每帧第二环境图像对应的定位信息,并将采集到的第二环境图像上传到分布式建图系统。分布式建图系统可以分别对第一电子设备上传的多帧第一环境图像和第二电子设备上传的多帧第二环境图像进行图像处理,确定每帧图像的特征信息,以进一步创建所述环境对应的三维地图。该过程主要包括以下步骤1~4:
步骤1:第一电子设备和第二电子设备分别扫描拍摄所在的环境的视频。
实施中,第一电子设备和第二电子设备可以分别对所在的环境进行扫描拍摄,并将摄像装置当前拍摄到的环境图像实时显示在显示屏上,并且显示用于提示用户继续扫描的提示信息。用户在操作电子设备进行扫描时,可以按照显示屏上的提示信息,持电子设备移动持续扫描当前所处的环境,电子设备在接收到用户触发的结束扫描指令后,停止扫描。
例如,图10为本申请实施例提供的一种多人建图中多个用户持电子设备扫描拍摄当前所处环境的示意图。参见图10,两个用户可以持不同的电子设备对当前环境进行扫描拍摄。
图11为本申请实施例提供的一种电子设备在扫描拍摄当前所处环境时的显示界面示意图。参考图11,电子设备在显示屏中实时显示摄像装置拍摄到的环境图像,当用户确定停止扫描时,可以点击图11中所示的触发停止扫描的控件,则电子设备可以停止扫描,并根据已扫描的内容生成视频文件。
在多人建图任务中,第一电子设备和第二电子设备可以对相同环境中的不同区域进行扫描拍摄,从而实现多设备协同采集环境图像,增加用户之间互动性的同时提高环境图像的采集效率。
步骤2:第一电子设备从拍摄到的视频中提取满足关键帧要求的多帧第一环境图像,并将多帧第一环境图像上传至分布式建图系统;第二电子设备从拍摄到的视频中提取满足关键帧要求的多帧第二环境图像,并将多帧第二环境图像上传至分布式建图系统。
在本申请实施例中,第一电子设备和第二电子设备在对当前所处环境拍摄得到视频后,从视频中提取满足关键帧要求的环境图像,并将环境图像上传至分布式建图系统。该步骤中第一电子设备与第二电子设备所执行的功能相同,为便于描述,下面以第一电子设备为例对步骤2的具体内容进行介绍:
第一电子设备在扫描环境过程中,可以通过运行SLAM算法,获取每帧第一环境图像的位姿信息。每帧第一环境图像的位姿信息为第一电子设备在拍摄该帧图像时的位姿在目标三维地图的三维坐标系中对应的位姿,其中目标三维地图为分布式建图系统根据第一电子设备和第二电子设备采集的环境图像生成的三维地图。目标三维地图的三维坐标系与建图初始化中分布式建图系统生成的初始三维地图的三维坐标系相同。
第一电子设备在获取视频后,可以采用如下任一种方式从视频中选择满足关键帧要求的第一环境图像:
1)根据视频中各帧图像对应的位姿之间的变化约束关系选择满足关键帧要求的第一环境图像。
在该方式中,针对视频中的每帧图像,第一电子设备获取采集到该帧图像时的位姿信息,将该位姿信息与采集到前一帧满足关键帧要求的图像时的位姿信息进行对比。若确定两个位姿信息所指示的位姿之间的偏移量大于设定的偏移量阈值,则确定该帧图像为满足关键帧要求的第一环境图像,否则,确定该帧图像不满足关键帧要求,继续进行下一帧图像的判断,直至视频中所有图像均已确定是否满足关键帧要求,从而选择出满足关键帧要求的第一环境图像。
2)根据视频中各帧图像的局部特征选择满足关键帧要求的第一环境图像。
在该方式中,第一电子设备可以提取各帧图像的局部特征,并根据提取的局部特征确定各帧图像中的特征点,然后利用光流跟踪法对特征点进行跟踪,根据对特征点的跟踪情况选择满足关键帧要求的图像。其中,利用光流跟踪法可以确定当前帧图像中的特征点是否存在于下一帧图像中,因此基于光流跟踪法可以判断两帧图像中包含的相同特征点的数量。针对视频中的每帧图像,第一电子设备提取该帧图像中的特征点后,确定该帧图像与前一帧满足关键帧要求的图像包含的相同特征点的数量,若该数量小于设定的数量阈值或者该数量与该帧图像中所有特征点的数量之比小于设定的比例阈值,则确定该帧图像为满足关键帧要求的第一环境图像,否则,确定该帧图像不满足关键帧要求,继续进行下一帧图像的判断,直至视频中所有图像均已确定是否满足关键帧要求,从而选择出满足关键帧要求的第一环境图像。
可选的,上述两种方式中,第一电子设备可以将视频中的第一帧图像作为第一个满足关键帧要求的第一环境图像,从而基于该图像,在其余帧图像中继续选择满足关键帧要求的第一环境图像。
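作为对上述方式1)的补充说明,下面给出一个按位姿偏移量选取关键帧的示意性Python片段(假设性示例,并非本申请实施例的组成部分,其中的阈值数值仅为示意):

    import numpy as np

    def select_keyframes_by_pose(poses, trans_thresh=0.3, rot_thresh_deg=15.0):
        """按位姿偏移量选取关键帧:与上一关键帧的平移或旋转偏移超过阈值即入选。

        poses: (N, 4, 4) 齐次位姿矩阵序列;返回满足关键帧要求的帧索引列表,
        首帧默认作为第一个关键帧。
        """
        keyframes = [0]
        for i in range(1, len(poses)):
            delta = np.linalg.inv(poses[keyframes[-1]]) @ poses[i]
            trans_offset = np.linalg.norm(delta[:3, 3])
            # 由旋转矩阵的迹计算旋转角:trace(R) = 1 + 2*cos(theta)
            cos_theta = np.clip((np.trace(delta[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
            rot_offset = np.degrees(np.arccos(cos_theta))
            if trans_offset > trans_thresh or rot_offset > rot_thresh_deg:
                keyframes.append(i)
        return keyframes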
第一电子设备从视频中选择满足关键帧要求的多帧第一环境图像后,将多帧第一环境图像上传到分布式建图系统,分布式建图系统可以对多帧第一环境图像进行存储和管理,并可以对多帧第一环境图像进行图像处理。
例如,基于图5所示的分布式建图系统,第一电子设备在选择满足关键帧要求的第一环境图像后,向云端调度中心发送图像传输请求,以请求上传图像。云端调度中心接收到图像传输请求后,向电子设备返回上传图像的URL,然后电子设备根据URL将第一环境图像上传到对象存储服务进行存储。
可选地,第一电子设备在上传满足关键帧要求的第一环境图像时,可以采用逐帧上传的方式,即电子设备每选择到一帧满足关键帧要求的第一环境图像,就将该第一环境图像上传至分布式建图系统,同时继续进行下一帧满足关键帧要求的图像的选择过程。当然,电子设备也可以选择得到所有满足关键帧要求的第一环境图像后,再将这些第一环境图像一并上传至分布式建图系统。
步骤3:第一电子设备采集每帧第一环境图像对应的定位信息并上传至分布式建图系统;第二电子设备采集每帧第二环境图像对应的定位信息并上传至分布式建图系统。
在本申请实施例中,第一电子设备在向分布式建图系统上传第一环境图像时,还可以获取第一环境图像对应的定位信息,并将第一环境图像对应的定位信息也上传至分布式建图系统。该步骤中第一电子设备与第二电子设备所执行的功能相同,为便于描述,下面以第一电子设备为例对步骤3的具体内容进行介绍:
每帧第一环境图像对应的定位信息包括第一电子设备采集得到该帧第一环境图像时的位姿信息、全球定位系统(global positioning system,GPS)信息和惯性测量单元(inertial measurement unit,IMU)信息。其中,位姿信息为第一电子设备拍摄得到该帧第一环境图像时利用SLAM算法测得的。GPS信息用于指示第一电子设备拍摄得到该帧第一环境图像时通过进行GPS定位确定的位置。IMU信息用于指示第一电子设备拍摄得到该帧第一环境图像时基于IMU传感器测量到的第一电子设备的姿态特征。
示例性的,第一电子设备可以将采集到的定位信息以元(meta)数据的形式上传至分布式建图系统。基于图5所示的分布式建图系统,电子设备可以将元数据发送至云端调度中心,云端调度中心接收到元数据后,将元数据发送至缓存形消息中间件进行缓存,以供CPU算法组件或GPU算法组件使用。同时,云端调度中心可以将元数据存储至弹性文件服务。
步骤4:分布式建图系统分别针对第一电子设备上传的每帧第一环境图像和第二电子设备上传的每帧第二环境图像进行图像处理。
分布式建图系统在接收到第一电子设备上传的多帧第一环境图像和第二电子设备上传的多帧第二环境图像后,需要对第一环境图像和第二环境图像进行图像处理。具体的,每个计算节点可以从多帧第一环境图像或第二环境图像中选择一帧未经处理的图像,并对该帧图像进行图像处理过程,在处理完毕后,继续选择下一帧未经处理的图像进行图像处理过程,直至确定所有第一环境图像和第二环境图像均已被处理完毕。计算节点从多帧第一环境图像或第二环境图像中选择一帧图像时可以采用随机选择的方式,也可以按照多帧图像的顺序(例如图像被上传至分布式建图系统的顺序)进行选择。
一种可选的实施方式中,当计算节点在对第一环境图像进行图像处理时,可以直接对第一环境图像进行特征提取和序列化处理过程。当计算节点在对第二环境图像进行图像处理之前,可以先根据建图初始化过程中确定出的目标转换关系对第二环境图像对应的位姿信息进行转换,以确定第二环境图像对应的目标位姿信息,第二环境图像对应的目标位姿信息为第二电子设备拍摄第二环境图像时相对于目标三维地图的三维坐标系的位姿。通过对第二环境图像对应的位姿信息进行转换,可以使得第二环境图像和第一环境图像对应的位姿信息为相同坐标系下的位姿信息。计算节点对第二环境图像对应的位姿信息进行转换得到第二环境图像对应的目标位姿信息的过程,也可以称为将第二环境图像注册到第一环境图像所属的图像序列中。
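上述注册过程的核心是对位姿做一次坐标系变换,可以用如下示意性的Python片段表示(假设性示例,采用4×4齐次位姿矩阵,左乘目标转换矩阵即得到统一坐标系下的目标位姿):

    import numpy as np

    def register_pose(pose_second, target_transform):
        """将第二环境图像的位姿从第二电子设备坐标系转换到第一电子设备对应的三维坐标系。"""
        # pose_second: 第二电子设备坐标系下的4x4齐次位姿矩阵
        # target_transform: 建图初始化阶段求得的目标转换矩阵
        return target_transform @ pose_second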
下面对计算节点对第一环境图像或第二环境图像进行的图像处理过程进行介绍,实施中,计算节点对第一环境图像和第二环境图像进行的图像处理流程相同,为便于介绍,以计算节点对第一环境图像进行图像处理为例进行介绍,该图像处理过程包括以下步骤A1~A2:
A1:特征提取:计算节点提取第一环境图像的局部特征和全局特征。
该步骤中,计算节点可以对第一环境图像进行局部特征提取和全局特征提取。其中,在进行局部特征提取时,计算节点可以对第一环境图像中各个区域的多尺度的灰度特征提取特征向量,得到第一环境图像的局部特征,并提取得到第一环境图像中的特征点。其中,特征向量可以用于表示第一环境图像中局部区域的纹理特征。在进行全局特征提取时,计算节点可以利用已训练的网络模型对图像中特征不变性较好(例如满足设定要求)的区域的局部特征进行聚类,并计算各个局部特征与聚类中心的加权残差和,得到第一环境图像的全局特征。其中,全局特征可以用于表征第一环境图像的整体结构特征。
A2:序列化处理:计算节点根据第一环境图像的全局特征,从已处理图像中选择与该图像匹配的图像。
该步骤包括特征检索、特征匹配以及特征校验。其中,特征检索指计算节点根据该第一环境图像的全局特征,对已处理图像(即已进行过上述的图像处理过程的图像,包括已进行过上述的图像处理过程的第一环境图像和第二环境图像)的全局特征进行检索,得到与该第一环境图像的全局特征距离最接近的设定数量个全局特征,并将检索得到的全局特征对应的图像作为候选帧图像。可选的,计算节点还可以同时将采集时间早于该第一环境图像的采集时间且与该第一环境图像的采集时间最接近的设定数量帧第一环境图像作为候选帧图像。特征匹配指计算节点将候选帧图像中的局部特征与该第一环境图像的局部特征进行匹配,从中选取满足一定阈值条件的N个匹配对。其中,在进行匹配时,计算节点可以利用最近邻(k-nearest neighbor,KNN)匹配算法从候选帧图像的局部特征点中选择与该第一环境图像中局部特征点匹配的特征点,并与该第一环境图像中局部特征点组成匹配对。计算节点也可以通过训练深度学习模型后利用深度学习模型进行匹配的方式选择匹配对。特征校验指计算节点从特征匹配处理的结果中滤除错误匹配的信息。可选的,计算节点可以采用随机抽样一致性校验等算法进行特征校验处理。
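下面以OpenCV为例,给出特征匹配与特征校验的一个示意性Python片段(假设性示例,并非本申请实施例的组成部分;其中的比率阈值与RANSAC参数仅为示意,特征点以像素坐标数组表示):

    import cv2
    import numpy as np

    def match_and_verify(desc_query, desc_cand, pts_query, pts_cand):
        """对两帧图像的局部特征做最近邻匹配,并用随机抽样一致性(RANSAC)滤除错误匹配。"""
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        # KNN匹配:取2个最近邻并做比率检验,剔除区分度不足的匹配
        raw = matcher.knnMatch(desc_query, desc_cand, k=2)
        good = [m[0] for m in raw
                if len(m) == 2 and m[0].distance < 0.8 * m[1].distance]
        if len(good) < 8:
            return []
        src = np.float32([pts_query[m.queryIdx] for m in good])
        dst = np.float32([pts_cand[m.trainIdx] for m in good])
        # 基于对极几何的RANSAC校验,滤除不满足几何一致性的匹配对
        F, mask = cv2.findFundamentalMat(src, dst, cv2.FM_RANSAC, 3.0, 0.99)
        if mask is None:
            return []
        return [m for m, keep in zip(good, mask.ravel()) if keep]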
基于以上图像处理过程,每个计算节点可以确定自身处理的图像与已处理的其它图像之间的匹配对关系(匹配或不匹配),因此,在所有第一环境图像和第二环境图像的图像处理过程结束后,可以得到所有图像之间的匹配对关系。其中,同一组匹配对的N个特征点对应三维地图中的同一三维点。
示例性的,基于图5中所示的分布式建图系统,第一电子设备根据URL将多帧第一环境图像上传到对象存储服务后,云端调度中心可以向队列形消息中间件发送传图消息以指示每帧图像对应的传图任务,队列形消息中间件对传图任务的信息进行缓存。各GPU算法组件分别从队列形消息中间件中读取传图任务,并从对象存储服务读取传图任务对应的第一环境图像后,对读取的第一环境图像进行上述的图像处理过程,并将处理结果(即图像的匹配信息)保存至弹性文件服务中,同时把处理完成的标识符以及处理过程的中间结果(例如图像的全局特征等)发送到缓存形消息中间件进行缓存。则后续GPU算法组件在进行图像处理过程中可以从缓存形消息中间件读取已处理图像的全局特征,以便进行序列化处理等。
需要说明的是,上述部分步骤的执行顺序并无严格的时序要求,可以根据实际情况进行调整。例如,对于同一计算节点来说,上述步骤3、4的执行依赖于上述步骤2中选取的图像,但是,上述步骤3、4之间可以无序,即该计算节点在执行上述步骤3、4时,可以先执行其中任一步骤,再执行另一步骤,也可以同时执行两个步骤。对于不同计算节点来说,每个计算节点执行上述步骤1~4的过程独立于其它计算节点,任意两个计算节点之间互不干扰。
在本申请一些实施例中,第一电子设备和第二电子设备在扫描拍摄当前所处环境时,还可以在显示屏上显示覆盖环境轮廓的网格,以提示和引导用户完成扫描过程。具体的,电子设备可以采用飞行时间(time of flight,TOF)方法采集满足关键帧要求的图像的深度图,或者,根据选择的满足关键帧要求的图像,采用多视图立体匹配(multi-view stereo,MVS)算法得到对应的深度图。电子设备得到每帧图像的深度图后,可以采用基于截断的带符号距离函数(truncated signed distance function,TSDF)等算法,根据每帧图像提取体素并确定三维体素空间中各个体素的带符号距离值。得到体素后,电子设备可以根据各个体素的带符号距离值,利用移动立方体(marching cubes)算法将体素转换为网格并进行渲染,然后显示在显示屏所示的环境界面中的对应区域。
例如,当电子设备对图11所示的环境进行扫描时,电子设备可以对图11所示的界面对应的图像的深度图进行体素提取和网格转换得到网格。电子设备将该网格覆盖到图11所示的显示界面中的对应位置后,可以显示图12所示的界面,该界面中,网格覆盖区域为已经扫描过的区域,未被网格覆盖的区域为待扫描区域或者无法生成对应网格的区域。基于此方式,电子设备可以在用户操作电子设备扫描环境空间的过程中,将已扫描和未扫描过的区域实时呈现给用户,来引导用户根据网格提示继续操作电子设备针对未扫描的环境区域进行扫描,使得网格尽可能多的覆盖待扫描的真实环境空间中的三维物体,从而简便快速的完成扫描过程,降低采集环境图像的操作难度,提高用户的使用体验。
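上文所述的TSDF体素融合可以用如下示意性的Python片段帮助理解(逐体素遍历的朴素实现,仅用于说明TSDF的更新规则,未做任何性能优化;函数与参数均为假设):

    import numpy as np

    def integrate_depth_tsdf(tsdf, weights, depth, K, cam_pose,
                             voxel_origin, voxel_size, trunc=0.04):
        """将一帧深度图融合进TSDF体素网格的示意实现。

        tsdf/weights: 形状相同的三维数组;K: 3x3相机内参;
        cam_pose: 相机在世界坐标系中的4x4齐次位姿;depth: 单位为米的深度图。
        """
        world_to_cam = np.linalg.inv(cam_pose)
        for idx in np.ndindex(tsdf.shape):
            # 体素中心的世界坐标,再变换到相机坐标系
            p_world = voxel_origin + (np.array(idx) + 0.5) * voxel_size
            p_cam = world_to_cam[:3, :3] @ p_world + world_to_cam[:3, 3]
            if p_cam[2] <= 0:
                continue
            # 针孔模型投影到像素坐标
            u = int(K[0, 0] * p_cam[0] / p_cam[2] + K[0, 2])
            v = int(K[1, 1] * p_cam[1] / p_cam[2] + K[1, 2])
            if not (0 <= v < depth.shape[0] and 0 <= u < depth.shape[1]):
                continue
            d = depth[v, u]
            if d <= 0:
                continue
            diff = d - p_cam[2]
            if diff < -trunc:
                continue  # 体素位于观测表面之后且超出截断范围,不更新
            # 截断的带符号距离,并做加权滑动平均更新
            sdf = min(diff / trunc, 1.0)
            w = weights[idx]
            tsdf[idx] = (tsdf[idx] * w + sdf) / (w + 1.0)
            weights[idx] = w + 1.0

得到TSDF体素后,可利用移动立方体算法(例如skimage.measure.marching_cubes等现成实现)提取零等值面,即得到上文所述用于渲染显示的网格。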
本申请一些实施方式中,第一电子设备和第二电子设备在结束扫描后,可以将扫描过程中得到的深度图上传至分布式建图系统。分布式建图系统在接收到第一电子设备上传的第一深度图和第二电子设备上传的第二深度图后,可以根据建图初始化中确定的目标转换关系对第二深度图对应的坐标系进行转换,并对转换坐标系后的第二深度图和第一深度图进行融合处理,得到当前环境对应的完整深度图。分布式建图系统可以根据当前环境对应的完整深度图生成当前环境对应的白膜,当前环境对应的白膜可以用于表示当前环境中的各物体的表面。实施中,分布式建图系统可以通过平面提取、相交计算、多面体拓扑构建及表面优化等算法生成当前环境对应的白膜。
示例性的,基于图5中所示的分布式建图系统,第一电子设备上传第一深度图时,可以发送深度图上传请求至云端调度中心,云端调度中心接收深度图上传请求后,向第一电子设备返回深度图上传地址的信息。第一电子设备可以根据深度图上传地址将第一深度图上传到对象存储服务中进行存储。第一电子设备上传第一深度图完毕后可以向云端调度中心发送深度图上传完毕通知消息。同样的,第二电子设备也可以将第二深度图上传到对象存储服务中存储,并向云端调度中心发送深度图上传完毕通知消息。云端调度中心在接收到该通知消息后可以发送白膜创建任务至队列形消息中间件。CPU算法组件中的白膜处理服务在监听到队列形消息中间件中缓存的地图标识对应的所有白膜创建任务并领取任务后,根据目标转换关系对第二电子设备上传的第二深度图进行坐标转换,并对转换后的第二深度图和第一深度图进行融合处理,得到当前环境对应的完整深度图,再根据完整深度图生成当前场景对应的白膜。白膜处理服务执行白膜创建任务得到的结果可以发送至弹性文件存储服务进行存储,同时发送对应的白膜创建完成通知消息至队列形消息中间件。云端调度中心监听到该白膜创建完成通知消息后可以从弹性文件服务中获取当前场景对应的白膜,并将白膜发送至对象存储服务进行存储,同时将白膜存储至云端数据库。
本申请另一些实施方式中,分布式建图系统在接收到第一电子设备上传的多帧第一环境图像和第二电子设备上传的多帧第二环境图像后,计算节点还可以基于多视图立体匹配(multi-view stereo,MVS)算法确定每帧第一环境图像的深度信息和每帧第二环境图像的深度信息,每帧图像的深度信息包括该帧图像中每个像素点的深度值。分布式建图系统可以根据每帧第一环境图像的深度信息和每帧第二环境图像的深度信息生成当前环境对应的白膜。
当分布式建图系统完成三维地图的建图任务后,用户操作电子设备在AR场景中游玩时,用户在电子设备的显示屏中显示的AR场景中放置虚拟物品时,分布式建图系统可以基于该AR场景对应的白膜,将虚拟物品放置到电子设备显示的当前环境中的物体的表面上,使得电子设备显示AR场景更真实,交互性也更强。
三、建图
第一电子设备在结束扫描后,可以向分布式建图系统发送建图指令。分布式建图系统在接收到第一电子设备发送的建图指令后,可以获取第二电子设备的扫描进度并发送给第一电子设备。第一电子设备可以在显示屏中显示第二电子设备的扫描进度。用户可以等待第二电子设备结束扫描,或者第一电子设备可以向第二电子设备发送结束扫描指令,第二电子设备接收到结束扫描指令后结束扫描。
例如,图13为本申请实施例提供的一种第一电子设备显示的扫描进度界面的示意图。参考图13,第一电子设备可以在图13所示的扫描进度界面中查看第二电子设备的扫描进度。例如,图13中包括第一电子设备(设备A)和两个第二电子设备(设备B和设备C)的扫描进度和状态,用户可以点击扫描进度界面中的结束扫描控件,强制结束设备B和设备C的扫描。
第二电子设备在结束扫描后,可以向分布式建图系统发送建图指令。分布式建图系统在接收到第一电子设备发送的建图指令和第二电子设备发送的建图指令后,可以根据第一电子设备上传的第一环境图像、第二电子设备上传的第二环境图像和每帧图像对应的定位信息进行三维地图的创建。具体的,在分布式建图系统的计算节点对第一电子设备和第二电子设备上传的所有图像进行上述步骤4中所述的图像处理过程后,可以按如下步骤B1~B4进行三维地图的创建:
B1:计算节点根据多帧第一环境图像和多帧第二环境图像生成场景匹配关系图(scene graph),其中,场景匹配关系图用于表征多帧图像之间的匹配关系。
计算节点可以根据多帧图像之间的匹配关系确定多帧图像的共视关系,再通过对共视关系进行优化后得到场景匹配关系图,其中,多帧图像包括多帧第一环境图像和多帧第二环境图像。
其中,场景匹配关系图可以视为一种由“顶点”和“边”组成的抽象网络,网络中每个顶点可以代表一帧图像,每个边代表图像间的一对特征点匹配对。不同“顶点”可以通过“边”实现连接,表示通过“边”连接的两个顶点具有关联关系,即两个“顶点”代表的两帧图像的匹配关系。
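场景匹配关系图可以用邻接表简单表示,如下示意性的Python片段所示(假设性示例;为便于组织数据,此处将两帧图像间的全部特征点匹配对挂接在同一条边上):

    from collections import defaultdict

    def build_scene_graph(match_results):
        """由特征匹配结果构建场景匹配关系图:顶点为图像,边挂接特征点匹配对。

        match_results: 形如 {(img_i, img_j): [(feat_a, feat_b), ...]} 的匹配结果。
        返回邻接表形式的场景匹配关系图。
        """
        graph = defaultdict(dict)
        for (img_i, img_j), pairs in match_results.items():
            if not pairs:
                continue  # 无匹配对的两帧图像之间不建立边
            graph[img_i][img_j] = pairs
            graph[img_j][img_i] = [(b, a) for a, b in pairs]
        return graph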
B2:计算节点根据场景匹配关系图确定多帧图像中各特征点在三维空间中对应的三维点。
计算节点生成场景匹配关系图后,可以根据场景匹配关系图、每帧第一环境图像的位姿信息、每帧第二环境图像的目标位姿信息以及相机内参确定多帧第一环境图像和多帧第二环境图像中的各特征点在三维空间中对应的三维点。其中,该三维空间的三维坐标系为第一电子设备对应的三维坐标系,与建图初始化过程中生成的初始三维地图的坐标系一致。
针对场景匹配关系图中同一特征点的不同视角,计算节点可以通过诸如直接线性变换(direct linear transformation,DLT)等算法结合环境图像的位姿信息、相机内参求解该特征点在三维空间中对应的位置(即三角化),并将该位置处的点确定为该特征点在三维空间中对应的三维点。计算节点确定场景匹配关系图中所有特征点在三维空间中对应的三维点后,可以得到这些三维点组成的三维地图,该三维地图为三维点云地图。
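上述三角化可以通过直接线性变换的最小二乘解实现,下面给出一个示意性的Python片段(假设性示例,并非本申请实施例的组成部分):

    import numpy as np

    def triangulate_dlt(projections, points_2d):
        """直接线性变换(DLT)三角化:由同一特征点在多个视角下的观测求其三维坐标。

        projections: 每个视角的3x4投影矩阵(相机内参K与位姿[R|t]之积)列表;
        points_2d: 对应视角下该特征点的像素坐标(u, v)列表。
        """
        rows = []
        for P, (u, v) in zip(projections, points_2d):
            # 每个观测贡献两个线性方程:u*(P3·X) - P1·X = 0, v*(P3·X) - P2·X = 0
            rows.append(u * P[2] - P[0])
            rows.append(v * P[2] - P[1])
        A = np.stack(rows)
        # 取A的最小奇异值对应的右奇异向量作为齐次解
        _, _, vt = np.linalg.svd(A)
        X = vt[-1]
        return X[:3] / X[3]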
B3:计算节点对三维点在三维空间中的坐标进行优化。
计算节点可以对上述求解得到的三维点进行光束平差法(bundle adjustment,BA)优化,即根据相机模型将三维空间中的三维点重投影回图像,并利用重投影产生的位置误差,优化环境图像的位姿信息、三维点位置以及电子设备的相机内参矩阵,从而得到精确的环境图像的位姿信息、相机内参以及三维点的坐标,进而得到优化后的三维地图。
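光束平差法优化中的重投影误差可以按如下示意性的Python片段构造,并交由通用最小二乘求解器迭代优化(假设性示例:为简化起见固定相机内参,并省略了稀疏雅可比矩阵等工程细节):

    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial.transform import Rotation

    def reprojection_residuals(params, n_points, K, observations):
        """光束平差法的残差函数示意:待优化参数为各帧位姿(轴角+平移)与各三维点坐标。

        observations: 形如 (帧索引, 三维点索引, u, v) 的观测列表。
        """
        n_frames = (len(params) - 3 * n_points) // 6
        poses = params[:n_frames * 6].reshape(n_frames, 6)
        points = params[n_frames * 6:].reshape(n_points, 3)
        residuals = []
        for f, p, u, v in observations:
            # 将三维点变换到相机坐标系后按针孔模型投影
            R = Rotation.from_rotvec(poses[f, :3]).as_matrix()
            p_cam = R @ points[p] + poses[f, 3:]
            proj = K @ p_cam
            residuals.extend([proj[0] / proj[2] - u, proj[1] / proj[2] - v])
        return np.asarray(residuals)

    # 调用示意:x0为位姿与三维点的初值拼接而成的一维向量
    # result = least_squares(reprojection_residuals, x0,
    #                        args=(n_points, K, observations))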
B4:计算节点根据优化后的三维点生成三维地图。
计算节点可以结合相机位姿以及每一帧第一环境图像和每一帧第二环境图像对应的GPS信息、IMU信息进行平滑及去噪处理,得到每帧图像对应的真实世界的相机位姿,每帧图像对应的相机位姿可以为电子设备拍摄该帧图像时在真实环境中的位置及朝向。计算节点根据每帧图像对应的相机位姿,将三维空间中三维点的坐标与真实世界的坐标进行对齐,从而将三维空间的坐标系调整至与真实环境空间的坐标系一致,进而得到与真实环境等比例的三维地图,该三维地图为真实环境场景对应的点云地图。
例如,图14为本申请实施例提供的一种三维地图的示意图,如图14中所示的三维地图中的三维点分别对应电子设备扫描的真实环境中的三维点,每个三维点在三维空间中的位置用于表征该三维点对应的真实环境中的三维点在真实环境中的位置。
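将三维点坐标与真实世界坐标对齐,可以通过求解两组对应点之间的相似变换实现,例如经典的Umeyama方法。下面给出一个示意性的Python片段(假设性示例,对应点对可取各帧相机中心及其由GPS信息换算得到的真实世界坐标):

    import numpy as np

    def umeyama_alignment(src, dst):
        """求将源点集(SLAM坐标)对齐到目标点集(真实世界坐标)的相似变换(s, R, t)。

        src/dst: 形状为 (N, 3) 的对应点对。
        """
        mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
        src_c, dst_c = src - mu_src, dst - mu_dst
        cov = dst_c.T @ src_c / len(src)
        U, D, Vt = np.linalg.svd(cov)
        S = np.eye(3)
        if np.linalg.det(U) * np.linalg.det(Vt) < 0:
            S[2, 2] = -1  # 处理反射情形,保证R为旋转矩阵
        R = U @ S @ Vt
        s = np.trace(np.diag(D) @ S) / (src_c ** 2).sum() * len(src)
        t = mu_dst - s * R @ mu_src
        return s, R, t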
示例性的,基于图5中所示的分布式建图系统,第一电子设备触发建图时,可以发送建图指令及本次建图的图像扫描张数(即满足关键帧要求的第一环境图像的数量)等信息到云端调度中心。同样的,第二电子设备触发建图时,也可以发送建图指令及本次建图的图像扫描张数(即满足关键帧要求的第二环境图像的数量)等信息到云端调度中心,云端调度中心发送建图任务到队列形消息中间件,同时将本次建图的基本信息及用户属性信息发送到云端数据库进行保存。各CPU算法组件可以监听队列形消息中间件中的建图任务并进行处理,最终生成三维地图文件并存储到弹性文件服务。在建图过程中,各CPU算法组件可以将建图进度、建图成功或失败的信息、地图对齐矩阵(SLAM坐标系与真实世界坐标系之间的转换矩阵)等信息发送至队列形消息中间件,将建图结果(即创建的三维地图)保存到弹性文件服务中。云端调度中心可以监听队列形消息中间件中的建图进度的信息,从而获得当前建图任务的处理进度、状态、地图对齐矩阵等信息并将这些信息存储到云端数据库。
通过以上方法,多个电子设备可以同时采集环境图像,分布式建图系统可以将多个电子设备采集的环境图像转换至同一三维坐标系中,并根据转换后的环境图像生成三维地图,从而实现多设备建图,提升AR场景的三维地图的创建效率,同时增强建图过程中多用户之间的互动性,提升用户体验。
图15为本申请实施例提供的一种多设备构建三维地图方法的流程示意图。该方法可以由图2所示的增强现实系统中的第一电子设备、第二电子设备以及分布式建图系统执行。参考图15,该方法包括以下步骤:
S1501:第一电子设备扫描真实世界场景,采集多帧第一初始图像,以及,运行SLAM算法并提取每帧第一初始图像对应的定位信息。
其中,定位信息包括位姿信息、GPS信息、IMU信息。
S1502:第一电子设备向分布式建图系统发送多人建图请求。
其中,多人建图请求中包括多帧第一初始图像以及每帧第一初始图像的定位信息。
S1503:分布式建图系统中的计算节点根据多帧第一初始图像和每帧第一初始图像的定位信息构建初始三维地图。
S1504:第二电子设备扫描真实世界场景,采集多帧第二初始图像,以及,运行SLAM算法并提取每帧第二初始图像对应的定位信息。
S1505:第二电子设备向分布式建图系统发送加入建图请求。
其中,加入建图请求中包括多帧第二初始图像和每帧第二初始图像的定位信息。
S1506:分布式建图系统中的计算节点对多帧第二初始图像进行图像处理,确定第二初始图像中包含任一帧第一初始图像中的图像内容。
S1507:分布式建图系统中的计算节点根据初始三维地图和多帧第二初始图像确定目标转换关系。
其中,目标转换关系为第二初始图像对应的三维坐标系与初始三维地图的三维坐标系之间的转换关系。
S1508:第一电子设备在移动过程中扫描得到真实世界场景的视频,并从中选择满足关键帧要求的第一环境图像;以及,运行SLAM算法并提取每帧第一环境图像对应的定位信息。
其中,定位信息包括位姿信息、GPS信息、IMU信息。第一电子设备的移动过程由第一用户控制实现。
S1509:第一电子设备将满足关键帧要求的多帧第一环境图像分别上传至分布式建图系统。
S1510:第二电子设备在移动过程中扫描得到真实世界场景的视频,并从中选择满足关键帧要求的第二环境图像;以及,运行SLAM算法并提取每帧第二环境图像对应的定位信息。
其中,定位信息包括位姿信息、GPS信息、IMU信息。第二电子设备的移动过程由第二用户控制实现。
S1511:第二电子设备将满足关键帧要求的多帧第二环境图像分别上传至分布式建图系统。
S1512:分布式建图系统中的计算节点根据目标转换关系对第二环境图像的位姿信息进行转换。
S1513:分布式建图系统中的计算节点根据多帧第一环境图像和位姿转换后的多帧第二环境图像生成目标三维地图。
S1514:分布式建图系统将目标三维地图发送给第一电子设备和第二电子设备。
S1515:第一电子设备获取多帧第一环境图像对应的第一深度图。
S1516:第一电子设备将第一深度图发送给分布式建图系统。
S1517:第二电子设备获取多帧第二环境图像对应的第二深度图。
S1518:第二电子设备将第二深度图发送给分布式建图系统。
S1519:分布式建图系统的计算节点根据目标转换关系对第二深度图对应的坐标系进行转换。
S1520:分布式建图系统的计算节点对第一深度图和转换坐标系后的第二深度图进行融合处理,得到当前环境对应的完整深度图。
S1521:分布式建图系统的计算节点根据完整深度图生成当前环境对应的白膜。
四、定位
本申请实施例中,分布式建图系统在完成三维地图的创建后,第一电子设备和第二电子设备可以在显示屏中显示AR场景,用户可以查看AR场景进行游玩。第一电子设备和第二电子设备在显示屏中显示AR场景之前,分布式建图系统可以根据三维地图对第一电子设备和第二电子设备进行定位。具体实施中,分布式建图系统对第一电子设备进行定位的方法和对第二电子设备进行定位的方法相同,为便于描述,下面以分布式建图系统对第一电子设备进行定位为例对本申请实施例提供的定位方法进行介绍:
当用户持第一电子设备处于该三维地图对应的环境中时,第一电子设备可以响应于用户选择该三维地图的操作,采集当前环境的至少一帧第一用户图像并上传至分布式建图系统,分布式建图系统根据至少一帧第一用户图像及先前创建的该环境对应的三维地图,采用GVPS方法确定第一电子设备在三维地图中的相对位置。分布式建图系统将确定出的相对位置发送给第一电子设备。此时,第一电子设备在三维地图中的位置已确定,用户可以在电子设备当前扫描到的环境图像中添加数字资源。第一电子设备可以基于该相对位置在显示屏中显示AR场景,该AR场景中可以包括第一电子设备的摄像装置当前采集到的环境图像,以及用户在AR场景中添加的数字资源对应的虚拟物品。
示例性的,基于图5所示的分布式建图系统,用户触发定位后,第一电子设备向云端调度中心发送定位请求和当前扫描到的至少一帧第一用户图像,云端调度中心将接收到的定位请求和至少一帧第一用户图像发送至GVPS定位服务。GVPS定位服务读取弹性文件服务中存储的三维地图的地图数据,根据该地图数据和至少一帧第一用户图像确定第一电子设备当前位姿在三维地图中的相对位置,并将该相对位置的信息发送至云端调度中心。云端调度中心从云端数据库查询当前地图相关的兴趣点(point of interest,POI)信息后,将该POI信息及来自GVPS服务的相对位置发送给第一电子设备。第一电子设备可以根据接收到的POI信息从对象存储服务下载三维数字资源模型并进行渲染后添加到第一电子设备显示的AR场景中。
五、添加数字资源
分布式建图系统在对第一电子设备完成定位后,第一电子设备可以在显示屏中查看AR场景,并在AR场景中添加数字资源,此时用户可以在第一电子设备显示的AR场景中观察到摄像装置实时拍摄的环境图像以及用户添加的三维数字资源模型。
例如,电子设备显示的图16a所示的AR场景界面中包含三维数字资源模型的素材,用户可以从中选择素材后添加到图16a所示的数字世界场景中。例如,在用户选择添加某一素材后电子设备可以显示图16b中所示的添加三维数字资源模型后的AR场景界面,该AR场景界面中包括真实环境场景的图像和用户添加的三维数字资源模型,可以实现真实世界场景与虚拟数字资源的融合显示。其中,在添加素材过程中,当用户选中某一素材并移动至某一区域时,第一电子设备可以在该区域显示对应的白膜,以引导用户选择合适的区域放置素材。当用户确定将三维数字资源模型放置在某一区域内时,第一电子设备可以确定放置区域在三维地图中对应的位置,并将数字资源的标识以及位置发送给分布式建图系统。分布式建图系统存储数字资源的标识和位置,以便用户下次持电子设备查看AR场景时,仍然可以显示用户之前添加的数字资源对应的虚拟物品。
示例性的,基于图5中所示的分布式建图系统,定位完成后,第一电子设备可以向云端调度中心请求三维数字资源模型对应的数字资源列表。云端调度中心通过查询云端数据库获取当前用户所对应的数字资源列表,并发送给第一电子设备。在用户选择素材后,第一电子设备通过URL到对象存储服务下载三维数字资源模型,并添加进AR场景。数字资源添加完成后,用户可以通过点击第一电子设备显示的保存控件,触发第一电子设备上传当前数字资源模型的大小及位姿等信息给云端调度中心,云端调度中心发送该信息到云端数据库进行保存。
可以理解的是,第二电子设备也可以在显示屏中显示AR场景以供用户游玩,具体实施可以参见上述实施例,重复之处不再赘述。
本申请一些实施例中,多个用户可以操作不同的电子设备同时在同一个AR场景中进行游玩。以第一用户操作第一电子设备和第二用户操作第二电子设备为例,在分布式建图系统创建三维地图并将三维地图发送给第一电子设备和第二电子设备之后,第一电子设备和第二电子设备可以显示AR场景并进行交互游玩;或者分布式建图系统在创建初始三维地图后,也可以将初始三维地图发送给第一电子设备和第二电子设备,此时第二电子设备可以选择进行扫描环境图像或直接基于初始三维地图显示AR场景进行游玩。如第二电子设备可以直接显示根据初始三维地图构建的AR场景供第二用户进行游玩,同样第一电子设备也可以显示根据初始三维地图构建的AR场景供第一用户进行游玩。具体的,第一电子设备可以在显示屏中显示AR场景,第二电子设备也可以在显示屏中显示AR场景,此时若第一用户持第一电子设备进入第二电子设备的摄像装置可以拍摄到的区域,则第二电子设备上可以在第一用户的图像上显示与第一用户相关的三维数字资源模型,进而实现第二电子设备与第一电子设备之间交互游玩。
例如,图17为本申请实施例提供的一种电子设备显示的多用户在AR场景中交互游玩的示意图。图17为第一电子设备的显示屏中显示的内容,第一电子设备显示AR场景,该AR场景包括第一电子设备实时拍摄的环境图像和三维数字资源模型。其中,第二用户持第二电子设备进入第一电子设备的摄像装置的可拍摄区域中,第一电子设备的显示屏中显示拍摄第二用户得到的图像,并在第二用户上方显示第二用户对应的“生命值”三维数字资源模型;其中,第二电子设备可以对实时采集到的图像运行SLAM算法确定第二电子设备的位姿信息,第二电子设备将位姿信息发送给分布式建图系统,分布式建图系统可以根据AR场景的三维地图或初始三维地图确定第二电子设备的目标位姿信息,目标位姿信息为第二电子设备在三维地图的三维坐标系中的位姿信息。分布式建图系统将第二电子设备的目标位姿信息发送给第一电子设备,第一电子设备根据第二电子设备的目标位姿信息确定第二用户对应的“生命值”三维数字资源模型在AR场景中的位置,并根据该位置在显示屏中显示第二用户对应的“生命值”三维数字资源模型。同样,第三用户持第三电子设备进入第一电子设备的摄像装置的可拍摄区域中,第一电子设备的显示屏中显示拍摄第三用户得到的图像,并在第三用户上方显示第三用户对应的“生命值”三维数字资源模型,第一用户、第二用户和第三用户可以在各自的电子设备上操作进行交互游玩。
基于以上方式,第一电子设备和第二电子设备可以为用户提供使用、编辑创建的三维地图的功能,同时允许用户在创建的三维地图中添加三维数字资源模型,实现了真实环境场景与虚拟数字场景的融合应用。
本申请实施例提供的多设备建图方法除应用于图10所示的多设备扫描环境场景之外,还可以应用于合并多个三维地图、扩展三维地图等场景。下面分别对本申请实施例提供的多设备建图方法应用于合并多个三维地图和扩展三维地图的两个场景时的实施方式进行介绍:
1、合并多个三维地图的场景。
用户可以在电子设备中查看已创建的三维地图。例如,图18为本申请实施例提供的一种电子设备显示的已创建的三维地图列表的示意图。用户可以点击勾选其中的多个三维地图进行合并。
电子设备在接收到用户触发的地图合并指令后,向分布式建图系统发送地图合并请求,该地图合并请求中包括用户选择的多个三维地图的标识。分布式建图系统接收到地图合并请求后,根据地图合并请求中的多个三维地图的标识获取每个三维地图对应的多帧环境图像。以用户选择两个三维地图(第一地图和第二地图)为例进行说明,分布式建图系统根据第一地图对应的多帧第一环境图像的全局特征和第二地图对应的多帧第二环境图像的全局特征确定多帧第一环境图像和多帧第二环境图像中的匹配图像对,其中,两帧图像匹配可以为两帧图像的全局特征相似(如两帧图像的全局特征之间的相似度大于预设阈值)。分布式建图系统根据匹配图像对确定第二环境图像对应的目标位姿信息,该目标位姿信息为第二环境图像相对于第一环境图像对应的三维坐标系的位姿信息。分布式建图系统根据第二环境图像的初始位姿信息和目标位姿信息确定第二环境图像对应的目标转换关系,该目标转换关系为第二环境图像对应的三维坐标系和第一环境图像对应的三维坐标系之间的转换关系。在确定出第二环境图像对应的目标转换关系后,分布式建图系统可以根据目标转换关系对多帧第二环境图像进行位姿转换。分布式建图系统可以根据多帧第一环境图像和位姿转换后的多帧第二环境图像生成目标三维地图,该目标三维地图可以看作对第一地图和第二地图进行合并后的三维地图。其中,分布式建图系统根据多帧第一环境图像和多帧第二环境图像生成目标三维地图的具体实施可以参见本申请实施例中数据采集和建图阶段的内容,重复之处不再赘述。
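其中,根据全局特征确定匹配图像对的一种简单做法,是计算两幅地图的环境图像全局特征之间的两两余弦相似度,如下示意性的Python片段所示(假设性示例,相似度阈值仅为示意):

    import numpy as np

    def find_matching_pairs(global_feats_a, global_feats_b, sim_thresh=0.8):
        """按全局特征的余弦相似度,在两幅地图的环境图像之间确定匹配图像对。"""
        a = global_feats_a / np.linalg.norm(global_feats_a, axis=1, keepdims=True)
        b = global_feats_b / np.linalg.norm(global_feats_b, axis=1, keepdims=True)
        sim = a @ b.T  # 两两余弦相似度矩阵
        pairs = [(i, j) for i in range(sim.shape[0]) for j in range(sim.shape[1])
                 if sim[i, j] > sim_thresh]
        return pairs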
2、扩展三维地图的场景。
用户在电子设备显示的AR场景中游玩时,可以点击AR场景中的扩展控件,触发电子设备对当前AR场景对应的三维地图进行扩展。例如,图19为本申请实施例提供的一种用户触发扩展三维地图的示意图。参考图19,用户可以点击图19所示的显示界面中的扩展控件,以触发电子设备进行三维地图的扩展。
电子设备在接收到用户触发的扩展三维地图的指令后,向分布式建图系统发送扩展三维地图的请求消息,该请求消息中包括三维地图的标识以及第一用户图像。第一用户图像为电子设备在接收到用户触发的扩展三维地图的指令后对当前所处环境采集到的环境图像。分布式建图系统中的计算节点对第一用户图像进行图像处理,获取第一用户图像的全局特征和局部特征。计算节点根据三维地图的标识获取三维地图对应的多帧环境图像,并根据多帧环境图像的全局特征和第一用户图像的全局特征确定与第一用户图像匹配的环境图像。其中,与第一用户图像匹配的环境图像可以为全局特征与第一用户图像的全局特征相似的环境图像。计算节点根据第一用户图像和与第一用户图像匹配的环境图像确定第一用户图像相对于环境图像对应的三维坐标系的目标位姿信息,并根据第一用户图像的初始位姿信息和目标位姿信息确定第一用户图像对应的目标转换关系,该目标转换关系为第一用户图像对应的三维坐标系和环境图像对应的三维坐标系之间的转换关系。
电子设备可以显示引导用户扫描当前所处环境的信息,用户持电子设备对当前所处的环境进行扫描。电子设备将扫描拍摄得到的视频中满足关键帧要求的第一环境图像上传至分布式建图系统。分布式建图系统在接收到电子设备上传的第一环境图像后,根据目标转换关系对第一环境图像的位姿信息进行转换,得到第一环境图像的目标位姿信息,从而将电子设备上传的多帧第一环境图像注册到三维地图对应的多帧环境图像组成的图像序列中。分布式建图系统可以根据原三维地图对应的多帧环境图像和位姿转换后的多帧第一环境图像生成目标三维地图,该目标三维地图可以看作对用户选择的原三维地图进行扩展后的三维地图。其中,分布式建图系统根据原三维地图对应的多帧环境图像和位姿转换后的多帧第一环境图像生成目标三维地图的具体实施可以参见本申请实施例中数据采集和建图阶段的内容,重复之处不再赘述。
基于以上实施例,本申请还提供一种多设备创建三维地图的方法。该方法可以由图2所示的增强现实系统中的第一电子设备、第二电子设备和分布式建图系统执行,第一电子设备和第二电子设备可以具有本申请实施例中图3和/或图4所示的结构,分布式建图系统可以具有本申请实施例中图5所示的结构。图20为本申请实施例提供的一种多设备创建三维地图的方法流程图。参考图20,该方法包括以下步骤:
S2001:第一电子设备向分布式建图系统发送多帧第一环境图像和每帧第一环境图像的位姿信息。
其中,第一环境图像为第一电子设备对所处环境拍摄得到的,每帧第一环境图像的位姿信息用于表示第一电子设备在拍摄第一环境图像时在第一电子设备对应的三维坐标系中的位置和朝向。
S2002:第二电子设备向分布式建图系统发送多帧第二环境图像和每帧第二环境图像的初始位姿信息。
其中,第二环境图像为第二电子设备对所处环境拍摄得到的,每帧第二环境图像的初始位姿信息用于表示第二电子设备在拍摄第二环境图像时在第二电子设备对应的三维坐标系中的位置和朝向。
S2003:分布式建图系统根据目标转换关系对多帧第二环境图像的初始位姿信息进行位姿转换,得到每帧第二环境图像的目标位姿信息。
其中,目标位姿信息用于表示第二电子设备在拍摄第二环境图像时在第一电子设备对应的三维坐标系中的位置和朝向;目标转换关系为第二电子设备对应的三维坐标系与第一电子设备对应的三维坐标系之间的转换关系。
S2004:分布式建图系统根据多帧第一环境图像、每帧第一环境图像的位姿信息、多帧第二环境图像以及每帧第二环境图像的目标位姿信息创建三维地图。
其中,三维地图可以用于构建增强现实场景。
需要说明的是,本申请图20所示的多设备创建三维地图方法的具体实施可以参见以上各实施例,重复之处不再赘述。
基于以上实施例,本申请还提供一种电子设备,所述电子设备包括多个功能模块;所述多个功能模块相互作用,实现本申请实施例所描述的各方法中第一电子设备或第二电子设备所执行的功能。如执行图15所示实施例中第一电子设备执行的S1501-S1502、S1508-S1509、S1515-S1516,或执行图15所示实施例中第二电子设备执行的S1504-S1505、S1510-S1511、S1517-S1518。所述多个功能模块可以基于软件、硬件或软件和硬件的结合实现,且所述多个功能模块可以基于具体实现进行任意组合或分割。
基于以上实施例,本申请还提供一种电子设备,该电子设备包括至少一个处理器和至少一个存储器,所述至少一个存储器中存储计算机程序指令,所述电子设备运行时,所述至少一个处理器执行本申请实施例所描述的各方法中第一电子设备或第二电子设备所执行的功能。如执行图15所示实施例中第一电子设备执行的S1501-S1502、S1508-S1509、S1515-S1516,或执行图15所示实施例中第二电子设备执行的S1504-S1505、S1510-S1511、S1517-S1518。
基于以上实施例,本申请还提供一种分布式建图系统,所述分布式建图系统包括多个计算节点,所述多个计算节点并行/串行处理数据。每个计算节点用于执行本申请实施例所描述的各方法中分布式建图系统所执行的功能。如执行图15所示实施例中的S1503、S1506-S1507、S1512-S1513、S1519-S1521。
进一步地,多个计算节点可以包括多个CPU算法组件和多个GPU算法组件,其中,GPU算法组件可以执行图像处理等过程,CPU算法组件可以执行数据处理等过程。例如,GPU算法组件可以执行图15所示实施例中的S1506;CPU算法组件可以执行图15所示实施例中的S1507、S1512等步骤。
基于以上实施例,本申请还提供一种计算机程序,当所述计算机程序在计算机上运行时,使得所述计算机执行本申请实施例所描述的各方法。
基于以上实施例,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当所述计算机程序被计算机执行时,使得所述计算机执行本申请实施例所描述的各方法。
基于以上实施例,本申请还提供了一种芯片,所述芯片用于读取存储器中存储的计算机程序,实现本申请实施例所描述的各方法。
基于以上实施例,本申请提供了一种芯片系统,该芯片系统包括处理器,用于支持计算机装置实现本申请实施例所描述的各方法。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器用于保存该计算机装置必要的程序和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其他分立器件。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的保护范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (25)

  1. 一种增强现实系统,其特征在于,所述增强现实系统包括第一电子设备、第二电子设备和分布式建图系统;
    所述第一电子设备,用于向所述分布式建图系统发送多帧第一环境图像和每帧第一环境图像的位姿信息;所述第一环境图像为所述第一电子设备对所处环境拍摄得到的,每帧第一环境图像的位姿信息用于表示所述第一电子设备在拍摄第一环境图像时在所述第一电子设备对应的三维坐标系中的位置和朝向;
    所述第二电子设备,用于向所述分布式建图系统发送多帧第二环境图像和每帧第二环境图像的初始位姿信息,所述第二环境图像为所述第二电子设备对所处环境拍摄得到的,每帧第二环境图像的初始位姿信息用于表示所述第二电子设备在拍摄第二环境图像时在所述第二电子设备对应的三维坐标系中的位置和朝向;
    所述分布式建图系统,用于接收所述第一电子设备发送的所述多帧第一环境图像和每帧第一环境图像的位姿信息;接收所述第二电子设备发送的所述多帧第二环境图像和每帧第二环境图像的初始位姿信息;根据目标转换关系对所述多帧第二环境图像的初始位姿信息进行位姿转换,得到每帧第二环境图像的目标位姿信息,所述目标位姿信息用于表示所述第二电子设备在拍摄第二环境图像时在所述第一电子设备对应的三维坐标系中的位置和朝向;所述目标转换关系为所述第二电子设备对应的三维坐标系与所述第一电子设备对应的三维坐标系之间的转换关系;根据所述多帧第一环境图像、每帧第一环境图像的位姿信息、所述多帧第二环境图像以及每帧第二环境图像的目标位姿信息创建三维地图,所述三维地图用于构建增强现实场景。
  2. 如权利要求1所述的增强现实系统,其特征在于,
    所述第一电子设备还用于:
    在向所述分布式建图系统发送所述多帧第一环境图像和每帧第一环境图像的位姿信息之前,向所述分布式建图系统发送多设备建图请求,所述多设备建图请求中包括多帧第一初始图像以及每帧第一初始图像的定位信息;所述多帧第一初始图像为所述第一电子设备对所处环境拍摄得到的;
    所述第二电子设备还用于:
    在向所述分布式建图系统发送所述多帧第二环境图像和每帧第二环境图像的初始位姿信息之前,向所述分布式建图系统发送加入建图请求,所述加入建图请求中包括多帧第二初始图像;所述多帧第二初始图像为所述第二电子设备对所处环境拍摄得到的;
    所述分布式建图系统还用于:
    接收所述第一电子设备发送的所述多设备建图请求,根据所述多设备建图请求中的多帧第一初始图像以及每帧第一初始图像的定位信息生成初始三维地图;接收所述第二电子设备发送的加入建图请求,根据所述多帧第二初始图像和所述初始三维地图确定所述目标转换关系。
  3. 如权利要求2所述的增强现实系统,其特征在于,所述分布式建图系统还用于:
    在接收到所述多帧第二初始图像之后,根据所述多帧第二初始图像和所述初始三维地图确定目标转换关系之前,对所述多帧第二初始图像进行图像处理,确定至少一帧第二初始图像与任一帧第一初始图像包含相同的图像内容。
  4. 如权利要求3所述的增强现实系统,其特征在于,所述分布式建图系统具体用于:
    提取目标初始图像的全局特征和特征点,所述目标初始图像为任一帧第二初始图像;
    根据所述目标初始图像的全局特征确定与所述目标初始图像匹配的至少一帧第一初始图像,并确定与所述目标初始图像匹配的至少一帧第一初始图像中的特征点在三维地图中对应的三维点,将确定出的三维点作为所述目标初始图像的特征点对应的三维点;
    根据所述目标初始图像的特征点、所述目标初始图像的特征点对应的三维点以及第二电子设备的相机内参确定所述目标初始图像的目标位姿信息;
    根据所述目标初始图像的初始位姿信息和所述目标初始图像的目标位姿信息确定所述目标转换关系。
  5. 如权利要求2-4任一项所述的增强现实系统,其特征在于,
    所述分布式建图系统还用于:
    根据所述初始三维地图中的三维点生成点云资源;
    接收所述第一电子设备发送的定位请求,所述定位请求中包括所述第一电子设备采集到的环境图像;根据所述定位请求中的环境图像和所述初始三维地图对所述第一电子设备进行定位,确定所述第一电子设备在所述初始三维地图的三维坐标系中的位姿;
    向所述第一电子设备发送所述第一电子设备在所述初始三维地图的三维坐标系中的位姿和所述点云资源;
    所述第一电子设备还用于:
    向所述分布式建图系统发送定位请求;
    接收所述分布式建图系统发送的所述第一电子设备在所述初始三维地图的三维坐标系中的位姿和所述点云资源;
    根据所述第一电子设备在所述初始三维地图的三维坐标系中的位姿显示所述第一电子设备实时采集的环境图像和所述点云资源,以表示所述点云资源覆盖的区域已完成扫描。
  6. 如权利要求1所述的增强现实系统,其特征在于,所述分布式建图系统具体用于:
    从所述多帧第一环境图像和所述多帧第二环境图像中选择待处理的一帧图像作为目标图像,并对所述目标图像进行目标处理过程,至所述多帧第一环境图像和所述多帧第二环境图像均已进行所述目标处理过程;
    所述目标处理过程包括以下步骤:提取所述目标图像的第一特征点;获取已进行所述目标处理过程的至少一帧图像的特征点;在所述至少一帧图像的特征点中选择至少一个第二特征点与所述第一特征点组成特征匹配对;其中,所述第一特征点和所述至少一个第二特征点对应所述环境中的同一点;所述已进行所述目标处理过程的至少一帧图像包括已进行所述目标处理过程的至少一帧第一环境图像和/或至少一帧第二环境图像;
    获取对所述多帧第一环境图像和所述多帧第二环境图像进行目标处理过程后得到的多个特征匹配对,并根据所述多个特征匹配对创建所述三维地图。
  7. 如权利要求6所述的增强现实系统,其特征在于,所述分布式建图系统具体用于:
    根据所述多帧第一环境图像的位姿信息和所述多帧第二环境图像的目标位姿信息,确定所述多个特征匹配对在所述第一电子设备对应的三维坐标系中对应的多个三维点,得到所述三维地图。
  8. 如权利要求7所述的增强现实系统,其特征在于,
    所述第一电子设备还用于:
    向所述分布式建图系统发送每帧第一环境图像对应的定位信息;
    所述第二电子设备还用于:
    向所述分布式建图系统发送每帧第二环境图像对应的定位信息;
    所述分布式建图系统还用于:
    接收所述第一电子设备发送的每帧第一环境图像对应的定位信息;接收所述第二电子设备发送的每帧第二环境图像对应的定位信息;根据每帧第一环境图像对应的定位信息和每帧第二环境图像对应的定位信息对所述三维点的坐标进行调整,得到与真实环境等比例的三维地图。
  9. 如权利要求1-8任一项所述的增强现实系统,其特征在于,
    所述第一电子设备还用于:
    采集所述多帧第一环境图像对应的第一深度图,将所述第一深度图发送给所述分布式建图系统;
    所述第二电子设备还用于:
    采集所述多帧第二环境图像对应的第二深度图,将所述第二深度图发送给所述分布式建图系统;
    所述分布式建图系统还用于:
    接收所述第一电子设备发送的所述第一深度图;接收所述第二电子设备发送的所述第二深度图;根据所述目标转换关系对所述第二深度图进行坐标转换,对所述第一深度图和坐标转换后的第二深度图进行融合处理,得到完整深度图;根据所述完整深度图生成真实环境对应的白膜;其中,所述白膜用于表示所述真实环境中的各物体的表面。
  10. 如权利要求1-8任一项所述的增强现实系统,其特征在于,所述分布式建图系统还用于:
    基于多视图立体匹配算法确定每帧第一环境图像的深度信息和每帧第二环境图像的深度信息,根据每帧第一环境图像的深度信息和每帧第二环境图像的深度信息生成真实环境对应的白膜;其中,所述白膜用于表示所述真实环境中的各物体的表面。
  11. 如权利要求1所述的增强现实系统,其特征在于,
    所述分布式建图系统还用于:
    将所述三维地图发送给所述第一电子设备和所述第二电子设备;
    接收所述第一电子设备发送的定位请求,确定所述第一电子设备在所述三维地图的三维坐标系中的第一位姿,将所述第一位姿发送给所述第一电子设备和所述第二电子设备;
    接收所述第二电子设备发送的定位请求,确定所述第二电子设备在所述三维地图的三维坐标系中的第二位姿,将所述第二位姿发送给所述第一电子设备和所述第二电子设备;
    所述第一电子设备还用于:
    接收所述分布式建图系统发送的所述三维地图;
    向所述分布式建图系统发送定位请求,接收所述分布式建图系统发送的所述第一位姿和所述第二位姿,根据所述第一位姿和所述三维地图显示增强现实场景,并根据所述第二位姿显示拍摄到的使用所述第二电子设备的用户的图像以及所述第二电子设备对应的三维数字资源模型。
  12. 一种多设备创建三维地图的方法,其特征在于,应用于分布式建图系统,所述方法包括:
    接收第一电子设备发送的多帧第一环境图像和每帧第一环境图像的位姿信息;所述第一环境图像为所述第一电子设备对所处环境拍摄得到的,每帧第一环境图像的位姿信息用于表示所述第一电子设备在拍摄第一环境图像时在所述第一电子设备对应的三维坐标系中的位置和朝向;
    接收第二电子设备发送的多帧第二环境图像和每帧第二环境图像的初始位姿信息,所述第二环境图像为所述第二电子设备对所处环境拍摄得到的,每帧第二环境图像的初始位姿信息用于表示所述第二电子设备在拍摄第二环境图像时在所述第二电子设备对应的三维坐标系中的位置和朝向;
    根据目标转换关系对所述多帧第二环境图像的初始位姿信息进行位姿转换,得到每帧第二环境图像的目标位姿信息,所述目标位姿信息用于表示所述第二电子设备在拍摄第二环境图像时在所述第一电子设备对应的三维坐标系中的位置和朝向;所述目标转换关系为所述第二电子设备对应的三维坐标系与所述第一电子设备对应的三维坐标系之间的转换关系;
    根据所述多帧第一环境图像、每帧第一环境图像的位姿信息、所述多帧第二环境图像以及每帧第二环境图像的目标位姿信息创建三维地图,所述三维地图用于构建增强现实场景。
  13. 如权利要求12所述的方法,其特征在于,所述方法还包括:
    在接收所述第一电子设备发送的所述多帧第一环境图像和每帧第一环境图像的位姿信息之前,接收所述第一电子设备发送的多设备建图请求,所述多设备建图请求中包括多帧第一初始图像以及每帧第一初始图像的定位信息,所述多帧第一初始图像为所述第一电子设备对所处环境拍摄得到的;
    在接收所述第二电子设备发送的所述多帧第二环境图像和每帧第二环境图像的初始位姿信息之前,接收所述第二电子设备发送的加入建图请求,所述加入建图请求中包括多帧第二初始图像,所述多帧第二初始图像为所述第二电子设备对所处环境拍摄得到的;
    根据所述多设备建图请求中的多帧第一初始图像以及每帧第一初始图像的定位信息生成初始三维地图;
    根据所述多帧第二初始图像和所述初始三维地图确定所述目标转换关系。
  14. 如权利要求13所述的方法,其特征在于,在接收到所述多帧第二初始图像之后,根据所述多帧第二初始图像和所述初始三维地图确定目标转换关系之前,所述方法还包括:
    对所述多帧第二初始图像进行图像处理,确定至少一帧第二初始图像与任一帧第一初始图像包含相同的图像内容。
  15. 如权利要求14所述的方法,其特征在于,所述根据所述多帧第二初始图像和所述初始三维地图确定所述目标转换关系,包括:
    提取目标初始图像的全局特征和特征点,所述目标初始图像为任一帧第二初始图像;
    根据所述目标初始图像的全局特征确定与所述目标初始图像匹配的至少一帧第一初始图像,并确定与所述目标初始图像匹配的至少一帧第一初始图像中的特征点在三维地图中对应的三维点,将确定出的三维点作为所述目标初始图像的特征点对应的三维点;
    根据所述目标初始图像的特征点、所述目标初始图像的特征点对应的三维点以及第二电子设备的相机内参确定所述目标初始图像的目标位姿信息;
    根据所述目标初始图像的初始位姿信息和所述目标初始图像的目标位姿信息确定所述目标转换关系。
  16. 如权利要求13-15任一项所述的方法,其特征在于,所述方法还包括:
    根据所述初始三维地图中的三维点生成点云资源;
    接收所述第一电子设备发送的定位请求,所述定位请求中包括所述第一电子设备采集到的环境图像;根据所述定位请求中的环境图像和所述初始三维地图对所述第一电子设备进行定位,确定所述第一电子设备在所述初始三维地图的三维坐标系中的位姿;
    向所述第一电子设备发送所述第一电子设备在所述初始三维地图的三维坐标系中的位姿和所述点云资源,以使第一电子设备根据所述第一电子设备在所述初始三维地图的三维坐标系中的位姿显示所述第一电子设备实时采集的环境图像和所述点云资源,以表示所述点云资源覆盖的区域已完成扫描。
  17. 如权利要求12所述的方法,其特征在于,所述根据所述多帧第一环境图像、每帧第一环境图像的位姿信息、所述多帧第二环境图像以及每帧第二环境图像的目标位姿信息创建三维地图,包括:
    从所述多帧第一环境图像和所述多帧第二环境图像中选择待处理的一帧图像作为目标图像,并对所述目标图像进行目标处理过程,至所述多帧第一环境图像和所述多帧第二环境图像均已进行所述目标处理过程;
    所述目标处理过程包括以下步骤:提取所述目标图像的第一特征点;获取已进行所述目标处理过程的至少一帧图像的特征点;在所述至少一帧图像的特征点中选择至少一个第二特征点与所述第一特征点组成特征匹配对;其中,所述第一特征点和所述至少一个第二特征点对应所述环境中的同一点;所述已进行所述目标处理过程的至少一帧图像包括至少一帧第一环境图像和/或至少一帧第二环境图像;
    获取对所述多帧第一环境图像和所述多帧第二环境图像进行目标处理过程后得到的多个特征匹配对,并根据所述多个特征匹配对创建所述三维地图。
  18. 如权利要求17所述的方法,其特征在于,所述根据所述多个特征匹配对创建所述三维地图,包括:
    根据所述多帧第一环境图像的位姿信息和所述多帧第二环境图像的目标位姿信息,确定所述多个特征匹配对在所述第一电子设备对应的三维坐标系中对应的多个三维点,得到所述三维地图。
  19. 如权利要求18所述的方法,其特征在于,所述方法还包括:
    接收所述第一电子设备发送的每帧第一环境图像对应的定位信息;
    接收所述第二电子设备发送的每帧第二环境图像对应的定位信息;
    根据每帧第一环境图像对应的定位信息和每帧第二环境图像对应的定位信息对所述三维点的坐标进行调整,得到与真实环境等比例的三维地图。
  20. 如权利要求12-19任一项所述的方法,其特征在于,所述方法还包括:
    接收所述第一电子设备发送的所述多帧第一环境图像对应的第一深度图;
    接收所述第二电子设备发送的所述多帧第二环境图像对应的第二深度图;
    根据所述目标转换关系对所述第二深度图进行坐标转换,对所述第一深度图和坐标转换后的第二深度图进行融合处理,得到完整深度图;
    根据所述完整深度图生成真实环境对应的白膜;其中,所述白膜用于表示所述真实环境中的各物体的表面。
  21. 如权利要求12-19任一项所述的方法,其特征在于,所述方法还包括:
    基于多视图立体匹配算法确定每帧第一环境图像的深度信息和每帧第二环境图像的深度信息,根据每帧第一环境图像的深度信息和每帧第二环境图像的深度信息生成真实环境对应的白膜;其中,所述白膜用于表示所述真实环境中的各物体的表面。
  22. 如权利要求12所述的方法,其特征在于,所述方法还包括:
    将所述三维地图发送给所述第一电子设备和所述第二电子设备;
    接收所述第一电子设备发送的定位请求,确定所述第一电子设备在所述三维地图的三维坐标系中的第一位姿,将所述第一位姿发送给所述第一电子设备和所述第二电子设备;
    接收所述第二电子设备发送的定位请求,确定所述第二电子设备在所述三维地图的三维坐标系中的第二位姿,将所述第二位姿发送给所述第一电子设备和所述第二电子设备,以使第一电子设备根据所述第一位姿和所述三维地图显示增强现实场景,并根据所述第二位姿显示拍摄到的使用所述第二电子设备的用户的图像以及所述第二电子设备对应的三维数字资源模型。
  23. 一种分布式建图系统,其特征在于,所述分布式建图系统包括多个计算节点,所述多个计算节点并行/串行处理数据,每个计算节点用于执行如权利要求12-22任一项所述的方法。
  24. 一种电子设备,其特征在于,包括至少一个处理器,所述至少一个处理器与至少一个存储器耦合,所述至少一个处理器用于读取所述至少一个存储器所存储的计算机程序,以执行如权利要求1-11中任一所述的增强现实系统中第一电子设备或第二电子设备的功能。
  25. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行如权利要求1-11中任一所述的增强现实系统中第一电子设备或第二电子设备的功能,或执行如权利要求12-22中任一所述的分布式建图系统所执行的方法。
PCT/CN2022/144274 2022-01-06 2022-12-30 一种增强现实系统、多设备构建三维地图的方法及设备 WO2023131090A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210010556.3 2022-01-06
CN202210010556.3A CN116452778A (zh) 2022-01-06 2022-01-06 一种增强现实系统、多设备构建三维地图的方法及设备

Publications (1)

Publication Number Publication Date
WO2023131090A1 true WO2023131090A1 (zh) 2023-07-13

Family

ID=87073175

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/144274 WO2023131090A1 (zh) 2022-01-06 2022-12-30 一种增强现实系统、多设备构建三维地图的方法及设备

Country Status (2)

Country Link
CN (1) CN116452778A (zh)
WO (1) WO2023131090A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117369633A (zh) * 2023-10-07 2024-01-09 上海铱奇科技有限公司 一种基于ar的信息交互方法及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107665506A (zh) * 2016-07-29 2018-02-06 成都理想境界科技有限公司 实现增强现实的方法及系统
US20180211399A1 (en) * 2017-01-26 2018-07-26 Samsung Electronics Co., Ltd. Modeling method and apparatus using three-dimensional (3d) point cloud
CN110163963A (zh) * 2019-04-12 2019-08-23 南京华捷艾米软件科技有限公司 一种基于slam的建图装置和建图方法
CN111174799A (zh) * 2019-12-24 2020-05-19 Oppo广东移动通信有限公司 地图构建方法及装置、计算机可读介质、终端设备

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738552A (zh) * 2023-08-11 2023-09-12 和欣汇达(山东)科技有限公司 一种基于物联网的环境检测设备管理方法及系统
CN116738552B (zh) * 2023-08-11 2023-10-27 和欣汇达(山东)科技有限公司 一种基于物联网的环境检测设备管理方法及系统

Also Published As

Publication number Publication date
CN116452778A (zh) 2023-07-18

Similar Documents

Publication Publication Date Title
CN111324327B (zh) 投屏方法及终端设备
CN116320782B (zh) 一种控制方法、电子设备、计算机可读存储介质、芯片
WO2023131090A1 (zh) 一种增强现实系统、多设备构建三维地图的方法及设备
US10055890B2 (en) Augmented reality for wireless mobile devices
WO2013023705A1 (en) Methods and systems for enabling creation of augmented reality content
CN114168235B (zh) 一种功能切换入口的确定方法与电子设备
EP4060603A1 (en) Image processing method and related apparatus
WO2023284715A1 (zh) 一种物体重建方法以及相关设备
CN114845035B (zh) 一种分布式拍摄方法,电子设备及介质
WO2022161386A1 (zh) 一种位姿确定方法以及相关设备
WO2023124948A1 (zh) 一种三维地图的创建方法及电子设备
WO2021088497A1 (zh) 虚拟物体显示方法、全局地图更新方法以及设备
WO2023231697A1 (zh) 一种拍摄方法及相关设备
WO2023051383A1 (zh) 一种设备定位方法、设备及系统
JP2016194784A (ja) 画像管理システム、通信端末、通信システム、画像管理方法、及びプログラム
CN114970589A (zh) 一种扫码方法及终端
JP2016194783A (ja) 画像管理システム、通信端末、通信システム、画像管理方法、及びプログラム
WO2023131089A1 (zh) 一种增强现实系统、增强现实场景定位方法及设备
CN115147492A (zh) 一种图像处理方法以及相关设备
WO2024046162A1 (zh) 一种图片推荐方法及电子设备
WO2022267781A1 (zh) 建模方法及相关电子设备及存储介质
CN117635811A (zh) 一种模型驱动方法、系统及设备
WO2023125832A1 (zh) 图片分享方法和电子设备
CN117152338A (zh) 一种建模方法与电子设备
CN117671203A (zh) 一种虚拟数字内容显示系统、方法与电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22918531

Country of ref document: EP

Kind code of ref document: A1