WO2023164845A1 - Three-dimensional reconstruction method, apparatus, system and storage medium - Google Patents

Three-dimensional reconstruction method, apparatus, system and storage medium

Info

Publication number
WO2023164845A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
image
features
model
neural network
Prior art date
Application number
PCT/CN2022/078878
Other languages
English (en)
French (fr)
Inventor
尹晓川
李鑫超
李思晋
Original Assignee
深圳市大疆创新科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2022/078878
Publication of WO2023164845A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • the present application relates to the technical field of computer vision, in particular, to a three-dimensional reconstruction method, device, system and storage medium.
  • Three-dimensional reconstruction refers to establishing a mathematical model of a three-dimensional object that is suitable for computer representation and processing. It is the basis for processing and operating on the object and analyzing its properties in a computer environment, and it is also a key technology for building, in a computer, a virtual reality that expresses the objective world.
  • Related 3D reconstruction methods include: collecting multiple images from the environment, and then processing the multiple images using structure from motion (SfM) or simultaneous localization and mapping (SLAM) to model the environment. However, an image reflects only limited information from two-dimensional space, and a three-dimensional model reconstructed from images alone has a poor reconstruction effect.
  • one of the objectives of the present application is to provide a three-dimensional reconstruction method, device, system and storage medium.
  • the embodiment of the present application provides a three-dimensional reconstruction method, including:
  • after a movable platform moves to a target scene, an image of the target scene is collected by a camera arranged on the movable platform, and a first point cloud of the target scene is collected by a point cloud acquisition device arranged on the movable platform; and
  • a three-dimensional model of the target scene is generated in real time according to the image and the first point cloud.
  • the embodiment of the present application provides a three-dimensional reconstruction device, including:
  • one or more processors; and
  • a memory for storing instructions executable by the processors;
  • the one or more processors individually or jointly execute the executable instructions to perform the method described in the first aspect.
  • the embodiment of the present application provides a three-dimensional reconstruction system, including a movable platform and the three-dimensional reconstruction device described in the second aspect;
  • the movable platform is equipped with an imaging device and a point cloud acquisition device;
  • the movable platform is configured to, after moving to a target scene, collect an image using the imaging device and collect a first point cloud using the point cloud acquisition device, and transmit the image and the first point cloud to the three-dimensional reconstruction device.
  • an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores executable instructions, and when the executable instructions are executed by a processor, the method described in the first aspect is implemented.
  • in the three-dimensional reconstruction method, device, system and storage medium provided by the embodiments of the present application, after the movable platform moves to the target scene, an image of the target scene can be collected by the camera arranged on the movable platform, and a first point cloud of the target scene can be collected by the point cloud acquisition device arranged on the movable platform; the two types of data, the image and the first point cloud, are then combined to generate a three-dimensional model of the target scene in real time.
  • the image and the first point cloud complement each other, which helps improve the accuracy of 3D reconstruction; moreover, a 3D model of the target scene can be generated in real time once the image and the first point cloud are collected, which can meet the real-time requirements of certain scenarios.
  • FIG. 1 is a schematic diagram of a three-dimensional reconstruction system provided by an embodiment of the present application
  • FIG. 2 is a schematic flow chart of a three-dimensional reconstruction method provided by an embodiment of the present application
  • Fig. 3 is a schematic structural diagram of a second neural network model provided by an embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of a first neural network model provided by an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of a third neural network model provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a single-view three-dimensional model generated by using a single-frame image and a single-frame first point cloud provided by an embodiment of the present application;
  • Fig. 7 is a schematic structural diagram of a three-dimensional reconstruction device provided by an embodiment of the present application.
  • the embodiment of the present application provides a three-dimensional reconstruction method.
  • after the movable platform moves to the target scene, an image of the target scene can be collected by the camera arranged on the movable platform, and a first point cloud of the target scene can be collected by the point cloud acquisition device arranged on the movable platform; the image and the first point cloud are then combined to generate a three-dimensional model of the target scene in real time.
  • this embodiment makes the image and the first point cloud complement each other, which helps improve the accuracy of 3D reconstruction; moreover, a 3D model of the target scene can be generated in real time once the image and the first point cloud are collected, which can meet the real-time requirements of certain scenarios.
  • the 3D reconstruction method can be performed by a 3D reconstruction device, and the 3D reconstruction device can be an electronic device with data processing capabilities, such as a computer, server, cloud server or terminal, a mobile platform, etc.; or,
  • the three-dimensional reconstruction device can also be a computer chip or an integrated circuit with data processing capabilities, such as a central processing unit (Central Processing Unit, CPU), a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC) or a field-programmable gate array (Field-Programmable Gate Array, FPGA); or, the three-dimensional reconstruction device can also be a program product integrated in an electronic device.
  • exemplarily, when the three-dimensional reconstruction device is a computer chip or an integrated circuit with data processing capability, it may be installed in a remote control device.
  • the remote control device communicates with the movable platform and is used to control the movable platform, such as controlling the movement of the movable platform or controlling the imaging device on the movable platform to take pictures; the movable platform can send its collected images or other data to the remote control device for display or further processing by the remote control device.
  • the mobile platform includes, but is not limited to, unmanned aerial vehicles, vehicles, unmanned ships, or mobile robots (such as sweeping robots) and the like.
  • in an exemplary application scenario, referring to FIG. 1, the 3D reconstruction device is, by way of example, a remote control device of the movable platform.
  • in FIG. 1, the movable platform is illustrated as an unmanned aerial vehicle.
  • the movable platform 11 is connected in communication with the remote control device 12, wherein the movable platform 11 is equipped with an imaging device and a point cloud collection device.
  • the movable platform 11 can be placed in the target scene, or the movable platform 11 can be controlled by the remote control device 12 to move to the target scene.
  • the movable platform 11 located in the target scene can use its carried imaging device to collect images and utilize its carried point cloud collection device to collect the first point cloud, and then transmit the collected image and the first point cloud to the remote control device 12,
  • the remote control device 12 may execute the 3D reconstruction method provided in the embodiment of the present application, and use the image and the first point cloud to perform 3D reconstruction, so as to obtain a 3D model of the target scene.
  • the field of view of the imaging device and the detection range of the point cloud collection device partially or completely overlap.
  • the imaging device may be, for example, a camera or a video camera or other equipment for capturing images, and the imaging device may perform shooting under the control of a remote control device or a movable platform.
  • the imaging device at least includes a photosensitive element, such as a complementary metal oxide semiconductor (Complementary Metal Oxide Semiconductor, CMOS) sensor or a charge-coupled device (Charge-coupled Device, CCD) sensor.
  • the imaging device includes, but is not limited to, a visible light camera, a grayscale camera, and an infrared camera, and the like, and the image may be a color image, a grayscale image, or an infrared image, and the like.
  • the point cloud acquisition device includes but is not limited to lidar, millimeter wave radar, binocular vision sensor or structured light depth camera, etc.
  • LiDAR is used to transmit a laser pulse sequence to the target scene, then receive the laser pulse sequence reflected from the target, and generate a 3D point cloud based on the reflected laser pulse sequence.
  • the lidar can determine the receiving time of the reflected laser pulse sequence, for example by detecting the rising-edge time and/or falling-edge time of the electrical signal pulse.
  • in this way, the lidar can calculate the time of flight (TOF) from the emission time and the receiving time of the laser pulse sequence, thereby determining the distance from the detected object to the lidar (see the sketch below).
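  • As an illustration of the TOF relationship just described, the following minimal Python sketch (not taken from the patent; the function name and the assumption of a direct two-way path are illustrative) converts a pulse's emission and reception timestamps into a range:

```python
# Minimal sketch: lidar range from pulse time of flight (illustrative only).
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def pulse_range(t_emit_s: float, t_receive_s: float) -> float:
    """Distance to the reflecting object, assuming a direct out-and-back path."""
    tof = t_receive_s - t_emit_s          # time of flight (s)
    return SPEED_OF_LIGHT * tof / 2.0     # divide by 2: the pulse travels there and back

# Example: a pulse received 2 microseconds after emission is roughly 300 m away.
print(pulse_range(0.0, 2e-6))
```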
  • the lidar is an autonomous light-emitting sensor that does not depend on light source illumination and is less disturbed by ambient light. It can work normally even in a closed environment without light, so as to generate high-precision 3D models later, and has wide applicability.
  • the point cloud acquisition principle of millimeter-wave radar is similar to that of lidar, so it will not be repeated here.
  • the binocular vision sensor obtains two images of the target scene from different positions based on the principle of parallax, and obtains three-dimensional geometric information by calculating the position deviation between the corresponding points of the two images, thereby generating a three-dimensional point cloud.
  • the binocular vision sensor has low hardware requirements and correspondingly low cost: ordinary CMOS cameras are sufficient. As long as the lighting is suitable, it can be used in both indoor and outdoor environments, so it also has a certain applicability. A small illustrative sketch of the underlying disparity-to-depth relation follows.
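  • For readers unfamiliar with the parallax principle mentioned above, this hedged sketch shows the standard rectified-stereo relation depth = focal length × baseline / disparity; the function and the example numbers are assumptions for illustration, not values from the patent:

```python
# Illustrative only: pinhole-stereo depth from the disparity between corresponding points.
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified binocular pair (calibration assumed)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a valid match")
    return focal_px * baseline_m / disparity_px

# e.g. f = 700 px, baseline = 0.12 m, disparity = 14 px  ->  Z = 6.0 m
print(stereo_depth(700.0, 0.12, 14.0))
```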
  • a structured light depth camera projects light with certain structural characteristics into the target scene and then captures it; such structured light yields different image phase information for regions of the subject at different depths, which is then converted into depth information to obtain a 3D point cloud.
  • the structured light depth camera is also a self-illuminating sensor that does not depend on a light source and is little disturbed by ambient light; it can work normally even in a closed environment without light, which facilitates the subsequent generation of high-precision 3D models, so it has wide applicability.
  • the three-dimensional model of the target scene generated by the embodiment of the present application can be applied in various fields, and specific settings can be made according to the selected target scene.
  • the three-dimensional model can be applied to fields such as virtual reality (Virtual Reality), augmented reality (Augmented Reality), automatic driving, high-precision maps, surveying and mapping, geological surveying, architectural design or landscape representation.
  • FIG. 2 is a schematic flowchart of a 3D reconstruction method provided by the embodiment of the present application, and the method can be executed by a 3D reconstruction device.
  • the three-dimensional reconstruction device is communicatively connected with the movable platform.
  • the three-dimensional reconstruction device is a remote control device of the movable platform.
  • the method includes:
  • in step S101, after the movable platform moves to the target scene, an image of the target scene is collected by the camera arranged on the movable platform, and the first point cloud of the target scene is collected by the point cloud acquisition device arranged on the movable platform.
  • step S102 a 3D model of the target scene is generated in real time according to the image and the first point cloud.
  • the target scene may be an indoor scene or an outdoor scene. For example, a 3D model generated from images and a first point cloud collected in an indoor scene can be provided to the user for interaction such as virtual reality and augmented reality; as another example, images and a first point cloud of a large-scale outdoor scene can be obtained, and a high-precision 3D model can then be generated based on the 3D reconstruction method provided by the embodiments of the present application, solving the problem of 3D reconstruction of large-scale outdoor environments.
  • the target scene is a vehicle driving scene
  • the three-dimensional reconstruction device is installed on the vehicle
  • the image is collected by the imaging device carried by the vehicle
  • the first point cloud is collected by the point cloud acquisition device (such as a lidar) carried by the vehicle.
  • the 3D reconstruction device in the vehicle uses the image collected from the vehicle driving scene and the first point cloud to generate a 3D model, and the 3D model can be used to assist the decision-making of automatic driving of the vehicle.
  • after the movable platform located in the target scene captures the image with its on-board imaging device and captures the first point cloud with its on-board point cloud acquisition device,
  • the movable platform can transmit the captured image and first point cloud to the three-dimensional reconstruction device, and the three-dimensional reconstruction device performs three-dimensional reconstruction of the target scene according to the image and the first point cloud.
  • the three-dimensional reconstruction device is a remote control device of the movable platform
  • the movable platform is equipped with a camera and a laser radar
  • the movable platform can be controlled by the remote control device to move in the scene to be modeled; after the movable platform is controlled to move to the target scene, the camera collects an RGB image of the target scene, and the lidar collects the first point cloud of the target scene.
  • the first point cloud includes several three-dimensional points, and these three-dimensional points can be used to represent the outer surface shape of an object.
  • the 3D point may also include information such as a depth value and a segmentation result of the 3D point.
  • the image can provide the color, texture and other characteristics of the object.
  • in some embodiments, considering that the first point cloud obtained by the point cloud acquisition device usually contains relatively few 3D points and the spacing between 3D points is relatively large, the first point cloud is a sparse point cloud.
  • after obtaining the image and the first point cloud collected from the target scene, the 3D reconstruction device can generate a dense second point cloud according to the image and the sparse first point cloud; that is, the second point cloud contains more 3D points and the spacing between 3D points is smaller. In other words, the density of the second point cloud is higher than the density of the first point cloud. The three-dimensional reconstruction device then outputs a 3D model of the target scene in real time according to the dense second point cloud.
  • in other embodiments, after the 3D reconstruction device generates the dense second point cloud from the image and the sparse first point cloud, it can further optimize the second point cloud: three-dimensional reconstruction is performed according to the first point cloud, the second point cloud and the image, and a 3D model of the target scene is generated in real time.
  • since the image and the first point cloud are real data collected from the target scene, introducing the image and the first point cloud into the reconstruction process provides error compensation for the second point cloud, which helps improve the accuracy of the generated three-dimensional model.
  • in a possible implementation, in the process of acquiring the second point cloud, the first point cloud can first be mapped to the two-dimensional space where the image is located to obtain a first depth map. Since the first point cloud is sparse, in the resulting first depth map some pixels may have a depth value (i.e. a non-zero pixel value) while other pixels have no depth value (i.e. a pixel value of 0).
  • the relative pose between the imaging device and the point cloud acquisition device can be determined in advance, the conversion relationship between the image space and the point cloud space is determined according to the relative pose and the intrinsic parameters of the imaging device, and the conversion relationship can then be used to map the first point cloud to the two-dimensional space where the image is located to obtain the sparse first depth map, as illustrated by the sketch below.
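  • The mapping described above can be illustrated with a short, assumption-laden sketch: given an assumed lidar-to-camera pose T_cam_lidar and camera intrinsics K (names chosen here for illustration, not taken from the patent), each 3D point is transformed into the camera frame and projected with the pinhole model, producing a sparse depth map whose unfilled pixels stay at 0:

```python
import numpy as np

def point_cloud_to_depth_map(points_lidar, T_cam_lidar, K, height, width):
    """points_lidar: (N, 3) points in the lidar frame.
    T_cam_lidar: (4, 4) relative pose mapping lidar coordinates into the camera frame.
    K: (3, 3) camera intrinsic matrix."""
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]           # transform to the camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]                 # keep points in front of the camera
    uvw = (K @ pts_cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)      # pinhole projection to pixels
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts_cam[:, 2]
    depth = np.zeros((height, width), dtype=np.float32)  # 0 means "no depth value"
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth[v[valid], u[valid]] = z[valid]                 # sparse first depth map
    return depth
```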
  • after the first depth map is obtained, considering that the depth distribution of the target scene is strongly correlated with the feature distribution of the image (pixels on the same target object in the image tend to have similar depth values), the 3D reconstruction device can obtain a second depth map according to the first depth map and the image, for example by fusing the first depth map and the image to obtain a second depth map whose pixels correspond to the pixels of the image. The second depth map is then mapped to the three-dimensional space where the first point cloud is located to obtain the second point cloud.
  • this embodiment uses image features to perform depth completion on the sparse first depth map and obtain a dense second depth map: the number of pixels with non-zero pixel values (i.e. with depth values) in the second depth map is greater than the number of such pixels in the first depth map, and a dense second point cloud can then be obtained through the mapping from two-dimensional space to three-dimensional space.
  • in a possible implementation, the 3D reconstruction device may extract depth features from the first depth map and extract image features from the image, and then fuse the depth features and the image features;
  • for example, the depth features and the image features may be fused in the two-dimensional space where the image is located to obtain the second depth map.
  • the image features include but are not limited to texture features, color features, shape features or edge features and the like.
  • in this embodiment, the image features of the image are used to perform depth completion on the sparse first depth map, which helps improve the accuracy of the subsequently generated three-dimensional model.
  • the pre-trained second neural network model 200 may be used to automatically complete the first depth map by means of deep learning.
  • the second neural network model 200 includes a depth map extraction network 10, an image extraction network 20 and a second fusion network 30.
  • the first depth map and the image are input into the second neural network model 200; the depth map extraction network 10 can extract features from the first depth map to obtain depth features, and the image extraction network 20 can extract features from the image to obtain image features; the second fusion network 30 can then fuse the depth features and the image features, for example by concatenating them along the channel dimension in the two-dimensional space where the image is located to obtain fused features, and further process the fused features to obtain the second depth map.
  • the depth map extraction network includes at least one or more convolutional layers to extract features from the first depth map; the image extraction network includes at least one or more convolutional layers to extract features from the image. A simplified sketch follows.
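  • A much simplified PyTorch sketch of such a two-branch structure is shown below; the layer counts and channel widths are assumptions, and only the described pattern (separate depth and image branches, channel-wise concatenation in the fusion network, dense depth output) is taken from the text:

```python
import torch
import torch.nn as nn

class DepthCompletionNet(nn.Module):
    """Sketch of the second neural network model (sizes are illustrative assumptions)."""
    def __init__(self, feat_ch: int = 32):
        super().__init__()
        self.depth_branch = nn.Sequential(            # "depth map extraction network"
            nn.Conv2d(1, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.image_branch = nn.Sequential(            # "image extraction network"
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.fusion = nn.Sequential(                  # "second fusion network"
            nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, 1, 3, padding=1))      # dense depth, one value per pixel

    def forward(self, sparse_depth, image):
        d = self.depth_branch(sparse_depth)           # depth features
        i = self.image_branch(image)                  # image features
        fused = torch.cat([d, i], dim=1)              # concatenate along the channel dimension
        return self.fusion(fused)                     # predicted second depth map
```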
  • the second neural network model can be trained based on supervised learning, for example, the following training data can be obtained: image samples, first depth image samples mapped from the first point cloud samples, and second depth image labels.
  • during training, the image sample and the first depth map sample can be input into the second neural network model, which uses the image sample to perform depth completion on the first depth map sample to obtain a predicted second depth map; the parameters of the second neural network model can then be adjusted according to the difference between the predicted second depth map and the second depth map label. For example, the loss function of the second neural network model can be computed from the difference between the predicted second depth map and the second depth map label, and the parameters of the second neural network model can be adjusted according to the computed loss value to obtain the trained second neural network model.
  • the embodiment of the present application does not impose any limitation on the specific type of the loss function, which can be specifically set according to the actual application scenario.
  • the loss function of the second neural network model includes a mean square error function
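  • Continuing the DepthCompletionNet sketch above, a hedged illustration of the supervised training step with a mean-squared-error loss (the optimizer choice and learning rate are assumptions, not specified in the text) might look like this:

```python
import torch
import torch.nn as nn

model = DepthCompletionNet()                          # sketch defined earlier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
mse = nn.MSELoss()

def train_step(sparse_depth_sample, image_sample, depth_label):
    optimizer.zero_grad()
    pred = model(sparse_depth_sample, image_sample)   # predicted second depth map
    loss = mse(pred, depth_label)                     # difference to the second depth map label
    loss.backward()
    optimizer.step()                                  # adjust the model parameters
    return loss.item()
```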
  • after depth completion is performed to obtain the second depth map corresponding to the pixels of the image, the 3D reconstruction device may map the second depth map to the 3D space where the first point cloud is located to obtain the second point cloud.
  • the relative pose between the imaging device and the lidar can be determined in advance, the conversion relationship between the image space and the point cloud space can be determined according to the relative pose and the intrinsic parameters of the imaging device, and the conversion relationship can then be used to map the second depth map to the three-dimensional space where the first point cloud is located to obtain the dense second point cloud, as sketched below.
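  • The inverse mapping can be sketched under the same assumed conventions as the projection example earlier: every pixel of the dense second depth map that carries a depth value is back-projected with the pinhole model and transformed into the 3D space of the first point cloud (T_lidar_cam names the assumed camera-to-lidar pose):

```python
import numpy as np

def depth_map_to_point_cloud(depth, K, T_lidar_cam):
    """Lift a dense depth map into the lidar/point-cloud frame (illustrative sketch)."""
    v, u = np.nonzero(depth)                    # pixels that carry a depth value
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]             # back-project with the pinhole model
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    pts_lidar = (T_lidar_cam @ pts_cam.T).T[:, :3]
    return pts_lidar                            # dense second point cloud
```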
  • after the second point cloud is generated according to the image and the first point cloud, considering that the second point cloud is obtained by processing the image and the first point cloud, it may deviate somewhat from the real target scene; that is, the obtained second point cloud is a rough dense point cloud that needs further refinement to improve the accuracy of the 3D model.
  • since the image and the first point cloud are real data collected from the target scene, the 3D reconstruction device can introduce the image and the first point cloud to optimize the second point cloud:
  • three-dimensional reconstruction is performed according to the first point cloud, the second point cloud and the image to generate a three-dimensional model of the target scene in real time, and the image and the first point cloud provide error compensation for the second point cloud, which helps improve the accuracy of the three-dimensional model.
  • the 3D reconstruction device may fuse the first point cloud, the second point cloud, and the image to generate a 3D model of the target scene in real time.
  • for example, point cloud features can be extracted from the first point cloud and the second point cloud, and image features can be extracted from the image, wherein the image features include at least one of the following: texture features, color features, shape features or edge features, and the point cloud features include at least one of the following: envelope information, distance information between three-dimensional points, or positional relationship information; three-dimensional reconstruction is then performed according to the point cloud features and the image features, for example by fusing the point cloud features and the image features and using the fused features for three-dimensional reconstruction.
  • in this embodiment, effective features are extracted from the first point cloud, the second point cloud and the image for the reconstruction processing, without fusing all of the data;
  • on the basis of providing rich features to improve the accuracy of the 3D model, this also reduces the amount of data in the reconstruction process, which helps improve reconstruction efficiency.
  • the point cloud features and the image features may be fused in the three-dimensional space where the second point cloud is located.
  • the fused features are also 3D features, and the fused features are used for 3D reconstruction without any other dimension conversion process, which is beneficial to improve the efficiency of 3D reconstruction.
  • the pre-trained first neural network model 100 can be used to perform 3D reconstruction by means of deep learning.
  • the first neural network model 100 includes a point cloud extraction network 40, an image extraction network 20 and a first fusion network 50; the first point cloud, the second point cloud and the image are input into the first neural network model 100.
  • the point cloud extraction network 40 can perform feature extraction on the first point cloud and the second point cloud to obtain point cloud features, wherein the point cloud features include but are not limited to envelope information, distance information between three-dimensional points, or positional relationship information.
  • the image extraction network 20 can perform feature extraction on the image to obtain image features, wherein the image features include but are not limited to texture features, color features, shape features or edge features.
  • the first fusion network 50 can fuse the point cloud features and the image features in the 3D space where the second point cloud is located, and perform 3D reconstruction according to the fused features.
  • the point cloud extraction network includes at least one or more convolutional layers to extract features from the first point cloud and the second point cloud;
  • the image extraction network includes at least one or more convolutional layers to extract features from the image. A reduced sketch of the feature fusion follows.
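  • One simple way to realize the fusion described above (a reduced sketch; the per-point sampling strategy and all shapes are assumptions, not the patent's definition of the first fusion network) is to sample the shared image feature map at each 3D point's projected pixel and concatenate it with that point's point cloud feature:

```python
import torch

def fuse_point_and_image_features(point_feats, points_uv, image_feats):
    """point_feats: (N, Cp) per-point features from the point cloud extraction network.
    points_uv: (N, 2) integer pixel coordinates of each 3D point projected into the image.
    image_feats: (Ci, H, W) feature map from the shared image extraction network."""
    u, v = points_uv[:, 0], points_uv[:, 1]
    sampled = image_feats[:, v, u].T                 # (N, Ci) image feature per 3D point
    return torch.cat([point_feats, sampled], dim=1)  # (N, Cp + Ci) fused features in 3D space
```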
  • the first neural network model can be trained based on supervised learning; for example, the following training data can be obtained: image samples, first point cloud samples, second point cloud samples obtained from the image samples and the first point cloud samples, and 3D model labels.
  • during training, the image sample, the first point cloud sample and the second point cloud sample can be input into the first neural network model, which uses the input data to perform three-dimensional reconstruction to obtain a predicted 3D model; the parameters of the first neural network model may then be adjusted according to the difference between the predicted 3D model and the 3D model label.
  • for example, the loss function of the first neural network model can be set to include a first loss function and a second loss function, wherein the first loss function is used to describe the difference between the 3D model predicted from the image sample, the first point cloud sample and the second point cloud sample and the 3D model label, and the second loss function is used to describe the distance difference between the 3D model predicted from the image sample, the first point cloud sample and the second point cloud sample and the 3D model label; the parameters of the first neural network model can then be adjusted according to the loss value of the first loss function and the loss value of the second loss function to obtain the trained first neural network model.
  • the embodiment of the present application does not impose any limitation on the specific type of the loss function, which can be set according to the actual application scenario.
  • for example, the first loss function includes a mean square error function,
  • and the second loss function includes a chamfer distance function and/or an EMD (earth mover's distance) distance function; an illustrative chamfer distance is sketched below.
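  • The patent only names the chamfer distance; one common formulation, given here purely as an illustrative assumption, is the symmetric mean of nearest-neighbor distances between the predicted and labeled point sets:

```python
import torch

def chamfer_distance(pred_pts, gt_pts):
    """pred_pts: (N, 3) predicted model points; gt_pts: (M, 3) label points."""
    d = torch.cdist(pred_pts, gt_pts)                # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```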
  • in some embodiments, considering the relationship between the data used by the first neural network model and the second neural network model, in order to improve training efficiency and training accuracy, the first neural network model and the second neural network model can be jointly trained based on multi-task learning.
  • during training, the loss functions of the first neural network model and the second neural network model are used to jointly adjust the parameters of the first neural network model and the second neural network model.
  • in some embodiments, considering that both the first neural network model and the second neural network model contain an image extraction network with the same function, and that in practice, for the same three-dimensional model, the image extraction network of the first neural network model and the image extraction network of the second neural network model process the same image, the first neural network model and the second neural network model can share a common image extraction network for extracting image features, in order to improve data processing efficiency and simplify the network structure.
  • that is to say, the image features extracted by the image extraction network in the process of generating the second point cloud can be reused in the 3D reconstruction of the 3D model, without repeating the step of extracting image features, which helps improve the efficiency of three-dimensional reconstruction.
  • exemplarily, based on the first neural network model 100 and the second neural network model 200 sharing the image extraction network 20, as shown in FIG. 5, a new neural network model combining the first neural network model 100 and the second neural network model 200 (hereinafter referred to as the third neural network model 300) is proposed.
  • the third neural network model 300 includes a first conversion layer 60 , a second conversion layer 70 , a depth map extraction network 10 , an image extraction network 20 , a point cloud extraction network 40 , a first fusion network 50 and a second fusion network 30 .
  • the first conversion layer 60 is used to map the first point cloud to the two-dimensional space where the image is located to obtain the first depth map; the depth map extraction network 10 is used to extract depth features from the first depth map; the image extraction network 20 is used to extract image features from the image; the second fusion network 30 is used to fuse the depth features and the image features to obtain the second depth map; the second conversion layer 70 is used to map the second depth map to the three-dimensional space where the first point cloud is located to obtain the second point cloud; the point cloud extraction network 40 is used to extract point cloud features from the first point cloud and the second point cloud; and the first fusion network 50 is used to perform 3D reconstruction processing on the features obtained by fusing the image features and the point cloud features to obtain the 3D model.
  • in some embodiments, referring to FIG. 6, a single-frame image and a single-frame first point cloud collected from the target scene can be obtained, and a single-view 3D model of the target scene can be generated in real time using the single-frame image and the single-frame first point cloud.
  • exemplarily, a single-frame second point cloud is generated in real time using the single-frame image and the single-frame first point cloud, the density of the second point cloud being higher than that of the first point cloud, and a single-view three-dimensional model of the target scene is output in real time based on the single-frame second point cloud.
  • exemplarily, in order to improve the accuracy of the 3D model, after the single-frame second point cloud is obtained, it can be further optimized using the single-frame image and the single-frame first point cloud:
  • 3D reconstruction is performed according to the single-frame first point cloud, the single-frame second point cloud and the single-frame image to generate the single-view 3D model.
  • the single-view three-dimensional structure of the target scene is restored based on the single-frame image and the single-frame first point cloud.
  • in other embodiments, multiple frames of images and multiple frames of first point clouds collected from the target scene may be obtained, and a multi-view 3D model of the target scene may be generated in real time using the multiple frames of images and first point clouds.
  • exemplarily, multiple frames of second point clouds are generated in real time using the multiple frames of images and first point clouds,
  • the density of the second point cloud being higher than the density of the first point cloud, and, based on the multiple frames of second point clouds,
  • a multi-view 3D model of the target scene is output in real time.
  • exemplarily, in order to improve the accuracy of the 3D model, after the multiple frames of second point clouds are obtained, they can be further optimized using the multiple frames of images and first point clouds:
  • 3D reconstruction is performed according to the multiple frames of first point clouds, second point clouds and images to generate the multi-view 3D model.
  • the multi-view three-dimensional structure of the target scene is restored based on the multi-frame images and the multi-frame first point cloud.
  • in some embodiments, in a scenario where the three-dimensional model needs to be displayed, in order to improve the display effect, at least one of a pseudo-color transformation and a texture transformation may further be performed on the three-dimensional model according to the depths of the three-dimensional points in the model, and the transformed three-dimensional model is displayed, wherein different depths of the three-dimensional points correspond to different colors and/or textures.
  • exemplarily, taking grayscale values from 0 to 255 as an example, the depth of a three-dimensional point can be set to be negatively correlated with the grayscale value; that is, the smaller the depth of the three-dimensional point, the larger the corresponding grayscale value (closer to white), and conversely, the greater the depth, the smaller the grayscale value (closer to black), as in the sketch below.
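  • A minimal sketch of that negative correlation (the linear mapping and normalization are assumptions; the text only fixes the sign of the relationship):

```python
import numpy as np

def depth_to_gray(depths):
    """Map depths to grey values in [0, 255]: near points bright, far points dark."""
    d = np.asarray(depths, dtype=np.float32)
    d_norm = (d - d.min()) / max(d.max() - d.min(), 1e-6)   # 0 (nearest) .. 1 (farthest)
    return np.round(255.0 * (1.0 - d_norm)).astype(np.uint8)

print(depth_to_gray([1.0, 5.0, 10.0]))   # nearest point -> 255, farthest point -> 0
```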
  • exemplarily, the texture can be determined according to the RGB information in the image, for example by mapping the RGB information in the image onto the surface of the three-dimensional model to form texture information; or the texture can be preset texture information.
  • the embodiment of the present application also provides a three-dimensional reconstruction device 400, including:
  • one or more processors 41;
  • a memory 42 for storing instructions executable by the processor;
  • the one or more processors 41 individually or jointly execute the executable instructions, so as to execute the method described in any one of the above.
  • the processor 41 executes the executable instructions included in the memory 42. The processor 41 can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 42 stores the executable instructions of the three-dimensional reconstruction method. The memory 42 may include at least one type of storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and the like. Moreover, the device may cooperate with a network storage device that performs the storage function of the memory through a network connection.
  • the memory 42 may be an internal storage unit of the 3D reconstruction device 400 , such as a hard disk or memory of the 3D reconstruction device 400 .
  • the memory 42 can also be an external storage device of the three-dimensional reconstruction device 400, such as a plug-in hard disk equipped on the three-dimensional reconstruction device 400, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash memory card (Flash Card) etc. Further, the memory 42 may also include both an internal storage unit of the three-dimensional reconstruction apparatus 400 and an external storage device. Memory 42 is used to store computer programs (or executable instructions) and data. The memory 42 can also be used to temporarily store data that has been output or will be output.
  • the processor 41 is used for:
  • after the movable platform moves to the target scene, collecting an image of the target scene through the camera arranged on the movable platform, and collecting a first point cloud of the target scene through the point cloud acquisition device arranged on the movable platform; and generating a three-dimensional model of the target scene in real time according to the image and the first point cloud.
  • the processor 41 is further configured to: generate a second point cloud according to the image and the first point cloud, wherein the density of the second point cloud is higher than the density of the first point cloud; and perform three-dimensional reconstruction according to the first point cloud, the second point cloud and the image to generate a three-dimensional model of the target scene in real time.
  • the processor 41 is further configured to: perform pseudo-color transformation and/or texture transformation on the three-dimensional model according to the depth of three-dimensional points in the three-dimensional model, and display the transformed three-dimensional model; wherein, Different depths of the three-dimensional points correspond to different colors and/or textures.
  • the processor 41 is further configured to: extract point cloud features from the first point cloud and the second point cloud, and extract image features from the image; according to the point cloud features and The image features are subjected to three-dimensional reconstruction.
  • the point cloud features are extracted by the point cloud extraction network in the pre-trained first neural network model; the image features are extracted by the image extraction network in the first neural network model.
  • the processor 41 is further configured to: fuse the point cloud features and the image features in the 3D space where the second point cloud is located, and perform 3D reconstruction according to the fused features.
  • the 3D model is obtained by performing 3D reconstruction processing on the features obtained by fusing the image features and the point cloud features by the first fusion network in the pre-trained first neural network model.
  • the image features include at least one of the following: texture features, color features, shape features or edge features;
  • the point cloud features include at least one of the following: envelope information, distance information between three-dimensional points, or positional relationship information.
  • the processor 41 is further configured to: map the first point cloud to the two-dimensional space where the image is located to obtain a first depth map; obtain a second depth map according to the first depth map and the image, wherein the number of pixels with non-zero pixel values in the second depth map is greater than the number of pixels with non-zero pixel values in the first depth map; and map the second depth map to the three-dimensional space where the first point cloud is located to obtain the second point cloud.
  • the processor 41 is further configured to: extract depth features from the first depth map, and extract image features from the image; and perform fusion processing on the depth features and the image features to obtain the second depth map.
  • the depth features are extracted by a depth map extraction network in a pre-trained second neural network model; the image features are extracted by an image extraction network in the second neural network model; and the second depth map is obtained by a second fusion network in the second neural network model processing the features obtained by fusing the depth features and the image features.
  • the depth feature and the image feature are fused in the two-dimensional space where the image is located.
  • the three-dimensional model is obtained by a pre-trained first neural network model performing three-dimensional reconstruction processing on the first point cloud, the second point cloud and the image;
  • the second depth map corresponding to the second point cloud is obtained by a pre-trained second neural network model processing the image and the first depth map corresponding to the first point cloud; wherein the first neural network model and the second neural network model are jointly trained based on multi-task learning; and/or the first neural network model and the second neural network model share an image extraction network for extracting image features.
  • the training data of the first neural network model includes: an image sample, a first point cloud sample, a second point cloud sample obtained from the image sample and the point cloud sample, and a three-dimensional model label;
  • the training data of the second neural network model includes: the image sample, a first depth map sample obtained by mapping the first point cloud sample, and a second depth map label.
  • the loss function of the first neural network model includes a first loss function and a second loss function; wherein the first loss function is used to describe the difference between the 3D model predicted from the image sample, the first point cloud sample and the second point cloud sample and the 3D model label, and the second loss function is used to describe the distance difference between the 3D model predicted from the image sample, the first point cloud sample and the second point cloud sample and the 3D model label.
  • the loss function of the second neural network model is used to describe the difference between the second depth map predicted from the image sample and the first depth map sample and the second depth map label.
  • the first loss function includes a mean square error function
  • the second loss function includes a chamfer distance function and/or an EMD distance function
  • the loss function of the second neural network model includes a mean square error function.
  • the image includes a single-frame image; the first point cloud includes a single-frame first point cloud.
  • the processor 41 is further configured to: generate a single-view three-dimensional model of the target scene in real time according to the single-frame image and the single-frame first point cloud.
  • Various implementations described herein can be implemented using a computer readable medium such as computer software, hardware, or any combination thereof.
  • the embodiments described herein can be implemented by using Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays ( FPGA), processors, controllers, microcontrollers, microprocessors, electronic units designed to perform the functions described herein.
  • for software implementation, an embodiment such as a procedure or a function may be implemented with a separate software module that allows at least one function or operation to be performed.
  • software code can be implemented by a software application (or program) written in any suitable programming language, and the software code can be stored in a memory and executed by a controller.
  • those skilled in the art can understand that FIG. 7 is only an example of the three-dimensional reconstruction device 400 and does not constitute a limitation on it; the device may include more or fewer components than shown in the figure, or combine certain components, or use different components; for example, the device may also include input and output devices, network access devices, buses, and so on.
  • the embodiment of the present application also provides a 3D reconstruction system, including a movable platform and the above-mentioned 3D reconstruction device; the movable platform is equipped with an imaging device and a point cloud acquisition device.
  • the movable platform is configured to, after moving to a target scene, collect images using the imaging device and collect a first point cloud using the point cloud acquisition device, and transmit the image and the first point cloud to the three-dimensional reconstruction device.
  • the movable platform includes any one or more of the following: unmanned aerial vehicles, self-driving vehicles, unmanned ships or mobile robots;
  • the point cloud acquisition device includes any one or more of the following: a lidar, a millimeter wave radar or a binocular vision sensor.
  • in an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory including instructions, which are executable by a processor of an apparatus to perform the above method.
  • the non-transitory computer readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • a non-transitory computer-readable storage medium enabling the terminal to execute the above method when instructions in the storage medium are executed by a processor of the terminal.

Abstract

A three-dimensional reconstruction method, comprising: (S101) after a movable platform (11) moves to a target scene, collecting an image of the target scene through a camera arranged on the movable platform (11), and collecting a first point cloud of the target scene through a point cloud acquisition device arranged on the movable platform (11); and (S102) generating a three-dimensional model of the target scene in real time according to the image and the first point cloud. In the three-dimensional reconstruction method, the image and the first point cloud complement each other, which helps improve the accuracy of the three-dimensional reconstruction, and the three-dimensional model of the target scene is generated in real time, which can meet the real-time requirements of certain scenarios. A three-dimensional reconstruction device (400), system and storage medium are also disclosed.

Description

Three-dimensional reconstruction method, apparatus, system and storage medium
Technical Field
The present application relates to the technical field of computer vision, and in particular to a three-dimensional reconstruction method, device, system and storage medium.
Background
Three-dimensional reconstruction refers to establishing a mathematical model of a three-dimensional object that is suitable for computer representation and processing. It is the basis for processing and operating on the object and analyzing its properties in a computer environment, and it is also a key technology for building, in a computer, a virtual reality that expresses the objective world.
Related three-dimensional reconstruction methods include: collecting multiple images from the environment, and then processing the multiple images with methods such as structure from motion (SfM) or simultaneous localization and mapping (SLAM) to model the environment. However, an image reflects only limited information from two-dimensional space, and a three-dimensional model reconstructed from images alone has a poor reconstruction effect.
Summary
In view of this, one of the objectives of the present application is to provide a three-dimensional reconstruction method, device, system and storage medium.
In a first aspect, an embodiment of the present application provides a three-dimensional reconstruction method, including:
after a movable platform moves to a target scene, collecting an image of the target scene through a camera arranged on the movable platform, and collecting a first point cloud of the target scene through a point cloud acquisition device arranged on the movable platform; and
generating a three-dimensional model of the target scene in real time according to the image and the first point cloud.
In a second aspect, an embodiment of the present application provides a three-dimensional reconstruction device, including:
one or more processors; and
a memory for storing instructions executable by the processors;
wherein the one or more processors, individually or jointly, execute the executable instructions to perform the method described in the first aspect.
In a third aspect, an embodiment of the present application provides a three-dimensional reconstruction system, including a movable platform and the three-dimensional reconstruction device described in the second aspect;
the movable platform carries an imaging device and a point cloud acquisition device;
the movable platform is configured to, after moving to a target scene, collect an image using the imaging device and collect a first point cloud using the point cloud acquisition device, and transmit the image and the first point cloud to the three-dimensional reconstruction device.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the method described in the first aspect.
In the three-dimensional reconstruction method, device, system and storage medium provided by the embodiments of the present application, after the movable platform moves to the target scene, an image of the target scene can be collected through the camera arranged on the movable platform, and a first point cloud of the target scene can be collected through the point cloud acquisition device arranged on the movable platform; the two types of data, the image and the first point cloud, are then combined to generate a three-dimensional model of the target scene in real time. In this embodiment, the image and the first point cloud complement each other, which helps improve the accuracy of the three-dimensional reconstruction, and a three-dimensional model of the target scene can be generated in real time once the image and the first point cloud are collected, which can meet the real-time requirements of certain scenarios.
Brief Description of the Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of a three-dimensional reconstruction system provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a three-dimensional reconstruction method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a second neural network model provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a first neural network model provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a third neural network model provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of generating a single-view three-dimensional model using a single-frame image and a single-frame first point cloud, provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a three-dimensional reconstruction device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
In view of the problems in the related art, an embodiment of the present application provides a three-dimensional reconstruction method. After the movable platform moves to the target scene, an image of the target scene can be collected through the camera arranged on the movable platform, and a first point cloud of the target scene can be collected through the point cloud acquisition device arranged on the movable platform; the two types of data, the image and the first point cloud, are then combined to generate a three-dimensional model of the target scene in real time. This embodiment makes the image and the first point cloud complement each other, which helps improve the accuracy of the three-dimensional reconstruction, and a three-dimensional model of the target scene can be generated in real time once the image and the first point cloud are collected, which can meet the real-time requirements of certain scenarios.
In some embodiments, the three-dimensional reconstruction method may be performed by a three-dimensional reconstruction device. The three-dimensional reconstruction device may be an electronic device with data processing capability, such as a computer, a server, a cloud server, a terminal or a movable platform; alternatively, the three-dimensional reconstruction device may be a computer chip or an integrated circuit with data processing capability, such as a central processing unit (Central Processing Unit, CPU), a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC) or a field-programmable gate array (Field-Programmable Gate Array, FPGA); alternatively, the three-dimensional reconstruction device may be a program product integrated in an electronic device.
Exemplarily, when the three-dimensional reconstruction device is a computer chip or an integrated circuit with data processing capability, it may be installed in a remote control device. The remote control device is communicatively connected with the movable platform and is used to control the movable platform, for example to control the movement of the movable platform or to control the imaging device on the movable platform to take pictures; the movable platform can send its collected images or other data to the remote control device for display or further processing by the remote control device. The movable platform includes, but is not limited to, an unmanned aerial vehicle, a vehicle, an unmanned ship, or a mobile robot (such as a sweeping robot).
In an exemplary application scenario, referring to FIG. 1, the three-dimensional reconstruction device is, by way of example, a remote control device of the movable platform, and in FIG. 1 the movable platform is illustrated as an unmanned aerial vehicle. The movable platform 11 is communicatively connected with the remote control device 12, and the movable platform 11 carries an imaging device and a point cloud acquisition device. When a target scene needs to be modeled in three dimensions, the movable platform 11 can be placed in the target scene, or the movable platform 11 can be controlled by the remote control device 12 to move to the target scene. The movable platform 11 located in the target scene can then collect images using its on-board imaging device and collect a first point cloud using its on-board point cloud acquisition device, and transmit the collected image and first point cloud to the remote control device 12. The remote control device 12 can execute the three-dimensional reconstruction method provided by the embodiments of the present application and perform three-dimensional reconstruction using the image and the first point cloud, thereby obtaining a three-dimensional model of the target scene. The field of view of the imaging device and the detection range of the point cloud acquisition device partially or completely overlap.
The imaging device may be, for example, a camera, a video camera or other equipment for capturing images, and may perform shooting under the control of the remote control device or the movable platform. The imaging device includes at least a photosensitive element, such as a complementary metal oxide semiconductor (Complementary Metal Oxide Semiconductor, CMOS) sensor or a charge-coupled device (Charge-coupled Device, CCD) sensor. Exemplarily, the imaging device includes, but is not limited to, a visible light camera, a grayscale camera or an infrared camera, and the image may be a color image, a grayscale image or an infrared image.
The point cloud acquisition device includes, but is not limited to, a lidar, a millimeter wave radar, a binocular vision sensor or a structured light depth camera.
A lidar emits a sequence of laser pulses toward the target scene, receives the laser pulse sequence reflected back from the target, and generates a three-dimensional point cloud from the reflected laser pulse sequence. In one example, the lidar can determine the receiving time of the reflected laser pulse sequence, for example by detecting the rising-edge time and/or falling-edge time of the electrical signal pulse. In this way, the lidar can calculate the time of flight (TOF) from the emission time and the receiving time of the laser pulse sequence, thereby determining the distance from the detected object to the lidar. The lidar is a self-illuminating sensor that does not depend on a light source and is little disturbed by ambient light; it can work normally even in a closed environment without light, which facilitates the subsequent generation of high-precision three-dimensional models, so it has wide applicability. The point cloud acquisition principle of a millimeter wave radar is similar to that of a lidar and will not be repeated here.
A binocular vision sensor obtains two images of the target scene from different positions based on the parallax principle and obtains three-dimensional geometric information by calculating the positional deviation between corresponding points of the two images, thereby generating a three-dimensional point cloud. A binocular vision sensor has low hardware requirements and correspondingly low cost: ordinary CMOS (Complementary Metal Oxide Semiconductor) cameras are sufficient. As long as the lighting is suitable, it can be used in both indoor and outdoor environments, so it also has a certain applicability.
A structured light depth camera projects light with certain structural characteristics into the target scene and then captures it; such structured light yields different image phase information for regions of the subject at different depths, which is then converted into depth information to obtain a three-dimensional point cloud. A structured light depth camera is also a self-illuminating sensor that does not depend on a light source and is little disturbed by ambient light; it can work normally even in a closed environment without light, which facilitates the subsequent generation of high-precision three-dimensional models, so it has wide applicability.
Exemplarily, the three-dimensional model of the target scene generated by the embodiments of the present application can be applied in various fields, and specific settings can be made according to the selected target scene. For example, the three-dimensional model can be applied in fields such as virtual reality, augmented reality, autonomous driving, high-precision maps, surveying and mapping, geological surveying, architectural design or landscape representation.
Next, the three-dimensional reconstruction process provided by the embodiments of the present application is described. Referring to FIG. 2, FIG. 2 is a schematic flowchart of a three-dimensional reconstruction method provided by an embodiment of the present application. The method may be executed by a three-dimensional reconstruction device. The three-dimensional reconstruction device is communicatively connected with the movable platform; exemplarily, the three-dimensional reconstruction device is a remote control device of the movable platform. The method includes:
In step S101, after the movable platform moves to the target scene, an image of the target scene is collected through the camera arranged on the movable platform, and a first point cloud of the target scene is collected through the point cloud acquisition device arranged on the movable platform.
In step S102, a three-dimensional model of the target scene is generated in real time according to the image and the first point cloud.
It can be understood that the present application does not impose any limitation on the target scene, which can be set according to the actual application scenario. Exemplarily, the target scene may be an indoor scene or an outdoor scene. For example, a three-dimensional model generated from images and a first point cloud collected in an indoor scene can be provided to a user for interaction such as virtual reality and augmented reality; as another example, images and a first point cloud of a large-scale outdoor scene can be obtained, and a high-precision three-dimensional model can then be generated based on the three-dimensional reconstruction method provided by the embodiments of the present application, solving the problem of three-dimensional reconstruction of large-scale outdoor environments. Exemplarily, the target scene is a vehicle driving scene, the three-dimensional reconstruction device is installed on a vehicle, the image is collected by the imaging device carried by the vehicle, and the first point cloud is collected by the point cloud acquisition device (such as a lidar) carried by the vehicle; the three-dimensional reconstruction device in the vehicle then uses the image and the first point cloud collected from the vehicle driving scene to generate a three-dimensional model, which can be used to assist the automatic driving decisions of the vehicle.
Exemplarily, after the movable platform located in the target scene collects the image using its on-board imaging device and collects the first point cloud using its on-board point cloud acquisition device, the movable platform can transmit the collected image and first point cloud to the three-dimensional reconstruction device, and the three-dimensional reconstruction device performs three-dimensional reconstruction of the target scene according to the image and the first point cloud. Exemplarily, the three-dimensional reconstruction device is a remote control device of the movable platform, the movable platform is equipped with a camera and a lidar, and the movable platform can be controlled by the remote control device to move in the scene to be modeled; after the movable platform is controlled to move to the target scene, the camera collects an RGB image of the target scene, and the lidar collects the first point cloud of the target scene.
The first point cloud includes a number of three-dimensional points, which can be used to represent the outer surface shape of an object. In addition, a three-dimensional point may also include information such as its depth value and segmentation result. The image, in turn, can provide features of the object such as color and texture.
In some embodiments, considering that the first point cloud obtained by the point cloud acquisition device usually contains relatively few three-dimensional points and the spacing between three-dimensional points is relatively large, the first point cloud is a sparse point cloud. After obtaining the image and the first point cloud collected from the target scene, the three-dimensional reconstruction device can generate a dense second point cloud according to the image and the sparse first point cloud; that is, the second point cloud contains more three-dimensional points and the spacing between three-dimensional points is smaller. In other words, the density of the second point cloud is higher than the density of the first point cloud. The three-dimensional reconstruction device then outputs the three-dimensional model of the target scene in real time according to the dense second point cloud.
In other embodiments, considering that the second point cloud is obtained by processing the image and the first point cloud and may deviate somewhat from the real target scene, after generating the dense second point cloud from the image and the sparse first point cloud, the three-dimensional reconstruction device can further optimize the second point cloud: three-dimensional reconstruction is performed according to the first point cloud, the second point cloud and the image, and the three-dimensional model of the target scene is generated in real time. In this embodiment, considering that the image and the first point cloud are real data collected from the target scene, introducing the image and the first point cloud into the reconstruction process provides error compensation for the second point cloud, which helps improve the accuracy of the generated three-dimensional model.
In a possible implementation, in the process of acquiring the second point cloud, the first point cloud can first be mapped to the two-dimensional space where the image is located to obtain a first depth map. Since the first point cloud is a sparse point cloud, in the first depth map obtained by the mapping some pixels may have a depth value (i.e. a non-zero pixel value) while other pixels have no depth value (i.e. a pixel value of 0). Exemplarily, the relative pose between the imaging device and the point cloud acquisition device (such as a lidar) can be determined in advance, the conversion relationship between the image space and the point cloud space is determined according to the relative pose and the intrinsic parameters of the imaging device, and the conversion relationship can then be used to map the first point cloud to the two-dimensional space where the image is located to obtain the sparse first depth map.
After the first depth map is obtained, considering that the depth distribution of the target scene is strongly correlated with the feature distribution of the image and that pixels on the same target object in the image tend to have similar or close depth values, the three-dimensional reconstruction device can, based on this property, obtain a second depth map according to the first depth map and the image, for example by fusing the first depth map and the image to obtain a second depth map whose pixels correspond to the pixels of the image. The second depth map is then mapped to the three-dimensional space where the first point cloud is located to obtain the second point cloud. This embodiment uses the features of the image to perform depth completion on the sparse first depth map to obtain a dense second depth map: the number of pixels with non-zero pixel values (i.e. with depth values) in the second depth map is greater than the number of such pixels in the first depth map, and a dense second point cloud can then be obtained by mapping from two-dimensional space to three-dimensional space.
In a possible implementation, the three-dimensional reconstruction device can extract depth features from the first depth map and extract image features from the image, and then fuse the depth features and the image features; for example, the depth features and the image features can be fused in the two-dimensional space where the image is located to obtain the second depth map. The image features include, but are not limited to, texture features, color features, shape features or edge features. In this embodiment, the image features of the image are used to perform depth completion on the sparse first depth map, which helps improve the accuracy of the subsequently generated three-dimensional model.
Exemplarily, by means of deep learning, a pre-trained second neural network model 200 can be used to automatically complete the first depth map. Referring to FIG. 3, the second neural network model 200 includes a depth map extraction network 10, an image extraction network 20 and a second fusion network 30. The first depth map and the image are input into the second neural network model 200; the depth map extraction network 10 can perform feature extraction on the first depth map to obtain depth features, and the image extraction network 20 can perform feature extraction on the image to obtain image features; the second fusion network 30 can then fuse the depth features and the image features, for example by concatenating them along the channel dimension in the two-dimensional space where the image is located to obtain fused features, and further process the fused features to obtain the second depth map.
In one example, the depth map extraction network includes at least one or more convolutional layers to extract features from the first depth map; the image extraction network includes at least one or more convolutional layers to extract features from the image.
In one example, the second neural network model can be trained based on supervised learning. For example, the following training data can be obtained: image samples, first depth map samples obtained by mapping first point cloud samples, and second depth map labels. During training, the image sample and the first depth map sample can be input into the second neural network model, which uses the image sample to perform depth completion on the first depth map sample to obtain a predicted second depth map; the parameters of the second neural network model can then be adjusted according to the difference between the predicted second depth map and the second depth map label. For example, the loss function of the second neural network model can be computed from the difference between the predicted second depth map and the second depth map label, and the parameters of the second neural network model can be adjusted according to the computed loss value to obtain the trained second neural network model. It can be understood that the embodiments of the present application do not impose any limitation on the specific type of the loss function, which can be set according to the actual application scenario; for example, the loss function of the second neural network model includes a mean square error function.
After depth completion is performed to obtain the second depth map corresponding to the pixels of the image, the three-dimensional reconstruction device can map the second depth map to the three-dimensional space where the first point cloud is located to obtain the second point cloud. Exemplarily, the relative pose between the imaging device and the lidar can be determined in advance, the conversion relationship between the image space and the point cloud space is determined according to the relative pose and the intrinsic parameters of the imaging device, and the conversion relationship can then be used to map the second depth map to the three-dimensional space where the first point cloud is located to obtain the dense second point cloud.
In some embodiments, after the second point cloud is generated according to the image and the first point cloud, considering that the second point cloud is obtained by processing the image and the first point cloud, it may deviate somewhat from the real target scene; that is, the obtained second point cloud is a rough dense point cloud that needs further refinement to improve the accuracy of the three-dimensional model. Considering that the image and the first point cloud are real data collected from the target scene, the three-dimensional reconstruction device can introduce the image and the first point cloud to optimize the second point cloud: three-dimensional reconstruction is performed according to the first point cloud, the second point cloud and the image to generate the three-dimensional model of the target scene in real time, and the image and the first point cloud provide error compensation for the second point cloud, which helps improve the accuracy of the three-dimensional model.
In a possible implementation, the three-dimensional reconstruction device can fuse the first point cloud, the second point cloud and the image to generate the three-dimensional model of the target scene in real time. For example, point cloud features can be extracted from the first point cloud and the second point cloud, and image features can be extracted from the image, where the image features include at least one of the following: texture features, color features, shape features or edge features, and the point cloud features include at least one of the following: envelope information, distance information between three-dimensional points, or positional relationship information; three-dimensional reconstruction is then performed according to the point cloud features and the image features, for example by fusing the point cloud features and the image features and using the fused features for three-dimensional reconstruction. In this embodiment, effective features are extracted from the first point cloud, the second point cloud and the image for the reconstruction processing, without fusing all of the data; on the basis of providing rich features to improve the accuracy of the three-dimensional model, this also reduces the amount of data in the reconstruction process, which helps improve reconstruction efficiency.
Exemplarily, when fusing the point cloud features and the image features, in order to improve three-dimensional reconstruction efficiency, the point cloud features and the image features can be fused in the three-dimensional space where the second point cloud is located; the fused features are then also three-dimensional features, and using the fused features for three-dimensional reconstruction requires no further dimension conversion, which helps improve the efficiency of three-dimensional reconstruction.
Exemplarily, by means of deep learning, a pre-trained first neural network model 100 can be used for three-dimensional reconstruction. Referring to FIG. 4, the first neural network model 100 includes a point cloud extraction network 40, an image extraction network 20 and a first fusion network 50. The first point cloud, the second point cloud and the image are input into the first neural network model 100; the point cloud extraction network 40 can perform feature extraction on the first point cloud and the second point cloud to obtain point cloud features, where the point cloud features include but are not limited to envelope information, distance information between three-dimensional points, or positional relationship information; the image extraction network 20 can perform feature extraction on the image to obtain image features, where the image features include but are not limited to texture features, color features, shape features or edge features; the first fusion network 50 can then fuse the point cloud features and the image features in the three-dimensional space where the second point cloud is located and perform three-dimensional reconstruction according to the fused features.
在一个例子中,所述点云提取网络至少包括一个或多个卷积层,以实现对第一点云和第二点云进行特征提取;所述图像提取网络至少包括一个或多个卷积层,以实现对图像进行特征提取。
在一个例子中,可以基于有监督学习方式训练第一神经网络模型,比如可以获得如下训练数据:图像样本、第一点云样本、由所述图像样本和所述点云样本得到的第二点云样本以及三维模型标签。
在训练过程中,可以将图像样本、第一点云样本和第二点云样本输入第一神经网络模型中,由第一神经网络模型利用输入的数据进行三维重建,获得预测的三维模型;进而可以根据预测的三维模型和所述三维模型标签之间的差异调整第一神经网络模型的参数。示例性的,可以设置所述第一神经网络模型的损失函数包括第一损失函数和第二损失函数;其中,所述第一损失函数用于描述从所述图像样本、所述第一点云样本和所述第二样本中预测得到的三维模型与所述三维模型标签之间的差异;第二损失函数用于描述从所述图像样本、所述第一点云样本和所述第二样本中预测得到的三维模型与所述三维模型标签之间的距离差异;进而可以根据第一损失函数的损失值和第二损失函数的损失值调整第一神经网络模型的参数,获得训练好的第一神经网络模型。可以理解的是,本申请实施例对于所述损失函数的具体类型不做任何限制,可依据实际应用场景进行具体设置,比如所述第一损失函数包括均方误差函数;所述第二损失函数包括倒角距离函数和/或EMD距离函数。
在一些实施例中,考虑到第一神经网络模型和第二神经网络模型所应用的数据之间的联系,为了提高训练效率和训练精度,可以基于多任务学习联合训练所述第一神经网络模型和第二神经网络模型。在训练过程中,利用所述第一神经网络模型和所述第二神经网络模型的损失函数共同调整所述第一神经网络模型和第二神经网络模型的参数。
在一些实施例中,考虑到第一神经网络模型和第二神经网络模型中都具有图像提取网络,该图像提取网络的功能相同,且在实际应用过程中,针对于同一三维模型,第一神经网络模型的图像提取网络和第二神经网络模型的图像提取网络处理的图像相同,则为了提高数据处理效率,精简网络结构,所述第一神经网络模型和所述第二神经网络模型可以共用用于提取图像特征的图像提取网络,即是说,在生成第二点云的过程中利用图像提取网络提取的图像特征可以用于三维模型的三维重建过程,实现对图像特征的复用,无需重复提取图像特征的步骤,有利于提高三维重建效率。
示例性的,基于第一神经网络模型100和所述第二神经网络模型200共用图像提 取网络20,如图5所示,提出了一种结合第一神经网络模型100和所述第二神经网络模型200的新神经网络模型(以下称为第三神经网络模型300)。第三神经网络模型300包括第一转换层60、第二转换层70、深度图提取网络10、图像提取网络20、点云提取网络40、第一融合网络50和第二融合网络30。所述第一转换层60用于将第一点云映射到所述图像所在二维空间,获得第一深度图;所述第一深度图提取网络10用于从所述第一深度图中提取深度特征;所述图像提取网络20用于从图像中提取图像特征;所述第二融合网络30用于对所述深度特征和所述图像特征进行融合处理,获得第二深度图;所述第二转换层70用于将第二深度图映射到第一点云所在三维空间,获取第二点云;所述点云提取网络40用于对从所述第一点云和所述第二点云中提取点云特征;所述第一融合网络50用于对融合所述图像特征和所述点云特征得到的特征进行三维重建处理,得到三维模型。
在一些实施例中,请参阅图6,可以获取从目标场景采集的单帧图像和单帧第一点云,利用单帧图像和单帧第一点云实时生成所述目标场景的单视角的三维模型。示例性的,利用单帧图像和单帧第一点云实时生成单帧第二点云,所述第二点云的密度高于所述第一点云的密度,基于单帧第二点云实时输出所述目标场景的单视角的三维模型。示例性的,为了提高三维模型精度,在获取单帧第二点云后,可以利用单帧图像和单帧第一点云进一步优化单帧第二点云,根据单帧第一点云、单帧第二点云和单帧图像进行三维重建,生成单视角的三维模型。本实施例实现基于单帧图像和单帧第一点云恢复出目标场景的单视角的三维结构。
在另一些实施例中,可以获取从目标场景采集的多帧图像和多帧第一点云,利用多帧图像和多帧第一点云实时生成所述目标场景的多视角的三维模型。示例性的,利用多帧图像和多帧第一点云实时生成多帧第二点云,所述第二点云的密度高于所述第一点云的密度,基于多帧第二点云实时输出所述目标场景的多视角的三维模型。示例性的,为了提高三维模型精度,在获取多帧第二点云后,可以利用多帧图像和多帧第一点云进一步优化多帧第二点云,根据多帧第一点云、多帧第二点云和多帧图像进行三维重建,生成多视角的三维模型。本实施例实现基于多帧图像和多帧第一点云恢复出目标场景的多视角的三维结构。
在一些实施例中,在需要显示三维模型的场景中,为了提高显示效果,还可以根据所述三维模型中的三维点的深度,对所述三维模型进行伪彩变换和纹理变换中的至少一种,并显示变换后的三维模型;其中,所述三维点的不同深度对应不同的颜色和/或纹理。示例性的,以灰度值0~255为例,比如可以设置三维点的深度与灰度值成负 相关关系,即三维点的深度越小,对应的灰度值越大,即越接近白色,反之,三维点的深度越大,对应的灰度值越小,即越接近黑色。示例性的,所述纹理可以根据所述图像中的RGB信息来确定,比如可以将所述图像中的RGB信息映射到所述三维模型的表面,形成纹理信息;或者所述纹理也可以是预设的纹理信息。
The various technical features of the above embodiments may be combined arbitrarily, as long as there is no conflict or contradiction between the combined features; any such combination of the technical features of the above embodiments therefore also falls within the scope disclosed in this specification.
Accordingly, referring to FIG. 7, an embodiment of the present application further provides a three-dimensional reconstruction device 400, including:

one or more processors 41;

a memory 42 for storing instructions executable by the processors;

wherein the one or more processors 41, individually or jointly, execute the executable instructions to perform any of the methods described above.

The processor 41 executes the executable instructions included in the memory 42. The processor 41 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory 42 stores the executable instructions of the three-dimensional reconstruction method. The memory 42 may include at least one type of storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and so on. Moreover, the device may cooperate with a network storage apparatus that performs the storage function of the memory over a network connection. The memory 42 may be an internal storage unit of the three-dimensional reconstruction device 400, such as its hard disk or internal memory. The memory 42 may also be an external storage device of the three-dimensional reconstruction device 400, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the three-dimensional reconstruction device 400. Further, the memory 42 may include both an internal storage unit and an external storage device of the three-dimensional reconstruction device 400. The memory 42 is used to store computer programs (or executable instructions) and data, and may also be used to temporarily store data that has been output or is to be output.
In one embodiment, the processor 41 is configured to:

after the movable platform moves to a target scene, collect an image of the target scene via a camera arranged in the movable platform, and collect a first point cloud of the target scene via a point cloud collection device arranged in the movable platform;

generate a three-dimensional model of the target scene in real time according to the image and the first point cloud.

Optionally, the processor 41 is further configured to: generate a second point cloud according to the image and the first point cloud, where the density of the second point cloud is higher than the density of the first point cloud; and perform three-dimensional reconstruction according to the first point cloud, the second point cloud and the image to generate the three-dimensional model of the target scene in real time.

Optionally, the processor 41 is further configured to: apply a pseudo-color transformation and/or a texture transformation to the three-dimensional model according to the depths of three-dimensional points in the three-dimensional model, and display the transformed three-dimensional model, where different depths of the three-dimensional points correspond to different colors and/or textures.

Optionally, the processor 41 is further configured to: extract point cloud features from the first point cloud and the second point cloud, extract image features from the image, and perform three-dimensional reconstruction according to the point cloud features and the image features.

Optionally, the point cloud features are extracted by a point cloud extraction network in a pre-trained first neural network model, and the image features are extracted by an image extraction network in the first neural network model.

Optionally, the processor 41 is further configured to: fuse the point cloud features and the image features in the three-dimensional space of the second point cloud, and perform three-dimensional reconstruction according to the fused features.

Optionally, the three-dimensional model is obtained by a first fusion network in the pre-trained first neural network model performing three-dimensional reconstruction on the features obtained by fusing the image features and the point cloud features.

Optionally, the image features include at least one of: texture features, color features, shape features or edge features; the point cloud features include at least one of: envelope information, distance information between three-dimensional points or positional relationship information.

Optionally, the processor 41 is further configured to: map the first point cloud into the two-dimensional space of the image to obtain a first depth map; obtain a second depth map according to the first depth map and the image, where the number of pixels with non-zero pixel values in the second depth map is greater than the number of pixels with non-zero pixel values in the first depth map; and map the second depth map into the three-dimensional space of the first point cloud to obtain the second point cloud.

Optionally, the processor 41 is further configured to: extract depth features from the first depth map and image features from the image, and fuse the depth features and the image features to obtain the second depth map.

Optionally, the depth features are extracted by a depth map extraction network in a pre-trained second neural network model; the image features are extracted by an image extraction network in the second neural network model; and the second depth map is obtained by a second fusion network in the second neural network model processing the features obtained by fusing the depth features and the image features.

Optionally, the depth features and the image features are fused in the two-dimensional space of the image.

Optionally, the three-dimensional model is obtained by a pre-trained first neural network model performing three-dimensional reconstruction on the first point cloud, the second point cloud and the image; the second depth map corresponding to the second point cloud is obtained by a pre-trained second neural network model processing the image and the first depth map corresponding to the first point cloud; the first neural network model and the second neural network model are jointly trained based on multi-task learning; and/or, the first neural network model and the second neural network model share an image extraction network for extracting image features.

Optionally, the training data of the first neural network model includes: image samples, first point cloud samples, second point cloud samples derived from the image samples and the first point cloud samples, and three-dimensional model labels; the training data of the second neural network model includes: the image samples, first depth map samples obtained by mapping the first point cloud samples, and second depth map labels.

Optionally, the loss function of the first neural network model includes a first loss function and a second loss function, where the first loss function describes the difference between the three-dimensional model predicted from the image samples, the first point cloud samples and the second point cloud samples and the three-dimensional model labels, and the second loss function describes the distance difference between that predicted three-dimensional model and the three-dimensional model labels. The loss function of the second neural network model describes the difference between the second depth map predicted from the image samples and the first depth map samples and the second depth map labels. Exemplarily, the first loss function includes a mean squared error function; the second loss function includes a chamfer distance function and/or an EMD function; and the loss function of the second neural network model includes a mean squared error function.

Optionally, the image includes a single-frame image and the first point cloud includes a single-frame first point cloud. The processor 41 is further configured to generate a single-view three-dimensional model of the target scene in real time according to the single-frame image and the single-frame first point cloud.
The various embodiments described herein may be implemented using a computer-readable medium such as computer software, hardware, or any combination thereof. For hardware implementation, the embodiments described herein may be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein. For software implementation, an embodiment such as a procedure or a function may be implemented with a separate software module that allows at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in a memory and executed by a controller.

Those skilled in the art will appreciate that FIG. 7 is merely an example of the three-dimensional reconstruction device 400 and does not constitute a limitation thereon; the device may include more or fewer components than shown, combine certain components, or use different components, and may for example further include input/output devices, network access devices, buses, and so on.

For the implementation of the functions and roles of the individual units of the above device, reference is made to the implementation of the corresponding steps of the above method, which is not repeated here.

In some embodiments, referring to FIG. 1, an embodiment of the present application further provides a three-dimensional reconstruction system, including a movable platform and the above three-dimensional reconstruction device; the movable platform is equipped with an imaging device and a point cloud collection device.

The movable platform is configured to, after moving to a target scene, collect images using the imaging device and collect a first point cloud using the point cloud collection device, and transmit the images and the first point cloud to the three-dimensional reconstruction device.

Optionally, the movable platform includes any one or more of: an unmanned aerial vehicle, an autonomous vehicle, an unmanned vessel or a mobile robot; the point cloud collection device includes any one or more of: a lidar, a millimeter-wave radar or a binocular vision sensor.

In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as a memory including instructions, which are executable by a processor of an apparatus to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to perform the above method.

It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. The terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.

The method and device provided by the embodiments of the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, for a person of ordinary skill in the art, changes may be made to the specific implementations and the scope of application in accordance with the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (21)

  1. A three-dimensional reconstruction method, comprising:
    after a movable platform moves to a target scene, collecting an image of the target scene via a camera arranged in the movable platform; and collecting a first point cloud of the target scene via a point cloud collection device arranged in the movable platform;
    generating a three-dimensional model of the target scene in real time according to the image and the first point cloud.
  2. The method according to claim 1, wherein the generating a three-dimensional model of the target scene in real time according to the image and the first point cloud comprises:
    generating a second point cloud according to the image and the first point cloud, wherein the density of the second point cloud is higher than the density of the first point cloud;
    performing three-dimensional reconstruction according to the first point cloud, the second point cloud and the image to generate the three-dimensional model of the target scene in real time.
  3. The method according to claim 1, further comprising:
    applying a pseudo-color transformation and/or a texture transformation to the three-dimensional model according to the depths of three-dimensional points in the three-dimensional model, and displaying the transformed three-dimensional model, wherein different depths of the three-dimensional points correspond to different colors and/or textures.
  4. The method according to claim 2, wherein the performing three-dimensional reconstruction according to the first point cloud, the second point cloud and the image comprises:
    extracting point cloud features from the first point cloud and the second point cloud, and extracting image features from the image;
    performing three-dimensional reconstruction according to the point cloud features and the image features.
  5. The method according to claim 4, wherein the point cloud features are extracted by a point cloud extraction network in a pre-trained first neural network model;
    the image features are extracted by an image extraction network in the first neural network model.
  6. The method according to claim 4 or 5, wherein the performing three-dimensional reconstruction according to the point cloud features and the image features comprises:
    fusing the point cloud features and the image features in the three-dimensional space of the second point cloud, and performing three-dimensional reconstruction according to the fused features.
  7. The method according to claim 6, wherein the three-dimensional model is obtained by a first fusion network in the pre-trained first neural network model performing three-dimensional reconstruction on the features obtained by fusing the image features and the point cloud features.
  8. The method according to any one of claims 4 to 7, wherein the image features comprise at least one of: texture features, color features, shape features or edge features;
    the point cloud features comprise at least one of: envelope information, distance information between three-dimensional points or positional relationship information.
  9. The method according to any one of claims 2 to 7, wherein the generating a second point cloud according to the image and the first point cloud comprises:
    mapping the first point cloud into the two-dimensional space of the image to obtain a first depth map;
    obtaining a second depth map according to the first depth map and the image, wherein the number of pixels with non-zero pixel values in the second depth map is greater than the number of pixels with non-zero pixel values in the first depth map;
    mapping the second depth map into the three-dimensional space of the first point cloud to obtain the second point cloud.
  10. The method according to claim 9, wherein the obtaining a second depth map according to the first depth map and the image comprises:
    extracting depth features from the first depth map, and extracting image features from the image;
    fusing the depth features and the image features to obtain the second depth map.
  11. The method according to claim 10, wherein the depth features are extracted by a depth map extraction network in a pre-trained second neural network model;
    the image features are extracted by an image extraction network in the second neural network model;
    the second depth map is obtained by a second fusion network in the second neural network model processing the features obtained by fusing the depth features and the image features.
  12. The method according to claim 10 or 11, wherein the depth features and the image features are fused in the two-dimensional space of the image.
  13. The method according to any one of claims 2 to 12, wherein the three-dimensional model is obtained by a pre-trained first neural network model performing three-dimensional reconstruction on the first point cloud, the second point cloud and the image;
    the second depth map corresponding to the second point cloud is obtained by a pre-trained second neural network model processing the image and the first depth map corresponding to the first point cloud;
    wherein the first neural network model and the second neural network model are jointly trained based on multi-task learning; and/or, the first neural network model and the second neural network model share an image extraction network for extracting image features.
  14. The method according to claim 13, wherein the training data of the first neural network model comprises: image samples, first point cloud samples, second point cloud samples derived from the image samples and the first point cloud samples, and three-dimensional model labels;
    the training data of the second neural network model comprises: the image samples, first depth map samples obtained by mapping the first point cloud samples, and second depth map labels.
  15. The method according to claim 14, wherein the loss function of the first neural network model comprises a first loss function and a second loss function;
    wherein the first loss function describes the difference between the three-dimensional model predicted from the image samples, the first point cloud samples and the second point cloud samples and the three-dimensional model labels;
    the second loss function describes the distance difference between the three-dimensional model predicted from the image samples, the first point cloud samples and the second point cloud samples and the three-dimensional model labels;
    the loss function of the second neural network model describes the difference between the second depth map predicted from the image samples and the first depth map samples and the second depth map labels.
  16. The method according to claim 15, wherein the first loss function comprises a mean squared error function; the second loss function comprises a chamfer distance function and/or an EMD function;
    the loss function of the second neural network model comprises a mean squared error function.
  17. The method according to any one of claims 1 to 16, wherein the image comprises a single-frame image and the first point cloud comprises a single-frame first point cloud;
    the generating a three-dimensional model of the target scene in real time according to the image and the first point cloud comprises:
    generating a single-view three-dimensional model of the target scene in real time according to the single-frame image and the single-frame first point cloud.
  18. A three-dimensional reconstruction device, comprising:
    one or more processors;
    a memory for storing instructions executable by the processors;
    wherein the one or more processors, individually or jointly, execute the executable instructions to perform the method according to any one of claims 1 to 17.
  19. A three-dimensional reconstruction system, comprising a movable platform and the three-dimensional reconstruction device according to claim 18;
    the movable platform is equipped with an imaging device and a point cloud collection device;
    the movable platform is configured to, after moving to a target scene, collect images using the imaging device and collect a first point cloud using the point cloud collection device, and transmit the images and the first point cloud to the three-dimensional reconstruction device.
  20. The system according to claim 19, wherein the movable platform comprises any one or more of: an unmanned aerial vehicle, an autonomous vehicle, an unmanned vessel or a mobile robot;
    the point cloud collection device comprises any one or more of: a lidar, a millimeter-wave radar or a binocular vision sensor.
  21. A computer-readable storage medium, wherein the computer-readable storage medium stores executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 17.
PCT/CN2022/078878 2022-03-02 2022-03-02 Three-dimensional reconstruction method, device, system and storage medium WO2023164845A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/078878 WO2023164845A1 (zh) 2022-03-02 2022-03-02 Three-dimensional reconstruction method, device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/078878 WO2023164845A1 (zh) 2022-03-02 2022-03-02 Three-dimensional reconstruction method, device, system and storage medium

Publications (1)

Publication Number Publication Date
WO2023164845A1 true WO2023164845A1 (zh) 2023-09-07

Family

ID=87882799

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078878 WO2023164845A1 (zh) Three-dimensional reconstruction method, device, system and storage medium

Country Status (1)

Country Link
WO (1) WO2023164845A1 (zh)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107576960A (zh) * 2017-09-04 2018-01-12 苏州驾驶宝智能科技有限公司 Target detection method and system based on spatio-temporal fusion of vision and radar information
CN110160502A (zh) * 2018-10-12 2019-08-23 腾讯科技(深圳)有限公司 Map element extraction method, device and server
US11099275B1 (en) * 2020-04-29 2021-08-24 Tsinghua University LiDAR point cloud reflection intensity complementation method and system
CN111563923A (zh) * 2020-07-15 2020-08-21 浙江大华技术股份有限公司 Method for obtaining a dense depth map and related device
CN113906481A (zh) * 2020-10-13 2022-01-07 深圳市大疆创新科技有限公司 Imaging display method, remote control terminal, device, system and storage medium
CN113095154A (zh) * 2021-03-19 2021-07-09 西安交通大学 Three-dimensional object detection system and method based on millimeter-wave radar and monocular camera
CN113160327A (zh) * 2021-04-09 2021-07-23 上海智蕙林医疗科技有限公司 Method and system for implementing point cloud completion
CN113192182A (zh) * 2021-04-29 2021-07-30 山东产研信息与人工智能融合研究院有限公司 Multi-sensor-based real-scene reconstruction method and system
CN113724379A (zh) * 2021-07-08 2021-11-30 中国科学院空天信息创新研究院 Three-dimensional reconstruction method, apparatus, device and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253013A (zh) * 2023-11-07 2023-12-19 中国科学院空天信息创新研究院 Distributed three-dimensional reconstruction method based on collaborative perception
CN117253013B (zh) * 2023-11-07 2024-02-23 中国科学院空天信息创新研究院 Distributed three-dimensional reconstruction method based on collaborative perception
CN117456130A (zh) * 2023-12-22 2024-01-26 山东街景智能制造科技股份有限公司 Scene model construction method
CN117456130B (zh) * 2023-12-22 2024-03-01 山东街景智能制造科技股份有限公司 Scene model construction method


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22929296

Country of ref document: EP

Kind code of ref document: A1