WO2022151661A1 - Three-dimensional reconstruction method, apparatus, device, and storage medium - Google Patents

Three-dimensional reconstruction method, apparatus, device, and storage medium

Info

Publication number
WO2022151661A1
Authority
WO
WIPO (PCT)
Prior art keywords
reconstruction, space, feature map, voxel, feature
Prior art date
Application number
PCT/CN2021/102117
Other languages
English (en)
French (fr)
Inventor
鲍虎军 (Hujun Bao)
周晓巍 (Xiaowei Zhou)
孙佳明 (Jiaming Sun)
谢一鸣 (Yiming Xie)
Original Assignee
浙江商汤科技开发有限公司 (Zhejiang SenseTime Technology Development Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 浙江商汤科技开发有限公司 (Zhejiang SenseTime Technology Development Co., Ltd.)
Priority to JP2022546566A (JP7352748B2)
Priority to KR1020227026271A (KR20220120674A)
Publication of WO2022151661A1
Priority to US18/318,724 (US20230290099A1)

Classifications

    • G06T19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/55: Depth or shape recovery from multiple images
    • G06F18/253: Fusion techniques of extracted features
    • G06N3/04: Neural networks; architecture, e.g. interconnection topology
    • G06T3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T5/70: Denoising; smoothing
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06V10/40: Extraction of image or video features
    • G06T2200/08: Indexing scheme involving all processing steps from image acquisition to 3D model generation
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2210/36: Level of detail
    • G06T2219/2016: Rotation, translation, scaling (editing of 3D models)

Definitions

  • the present application relates to the field of computer vision technology, and in particular, to a three-dimensional reconstruction method, apparatus, device, and storage medium.
  • Embodiments of the present application provide a three-dimensional reconstruction method, apparatus, device, and storage medium.
  • An embodiment of the present application provides a three-dimensional reconstruction method, including: acquiring at least two frames of first key images for this reconstruction, and determining a first space enclosing the view frustums of the at least two frames of first key images, where the first key images are obtained by photographing the target to be reconstructed; obtaining a first feature map of the first space based on the image information in the at least two frames of first key images, where the first feature map includes first feature information of the voxels in the first space; obtaining the first reconstruction result of this reconstruction based on the first feature map; and updating, based on the first reconstruction result of this reconstruction, the second reconstruction result obtained by the previous reconstructions.
  • In this way, at least two frames of first key images for this reconstruction are acquired, a first space enclosing the view frustums of these key images is determined, and the first key images are obtained by photographing the target to be reconstructed. On this basis, the first feature map of the first space is obtained from the image information in the at least two frames of first key images, the first feature map includes the first feature information of the voxels in the first space, the first reconstruction result of this reconstruction is obtained from the first feature map, and the second reconstruction result obtained by the previous reconstructions is then updated based on the first reconstruction result.
  • Because three-dimensional reconstruction is performed on the first space enclosing the view frustums of the key images as a whole, the computational load can be greatly reduced and the probability of delamination or dispersion in the reconstruction results can be lowered, thereby improving the real-time performance of the three-dimensional reconstruction process and the smoothness of the three-dimensional reconstruction results.
  • In some embodiments, the method further includes: performing feature extraction on each frame of the first key image to obtain a second feature map of that first key image. Obtaining the first feature map of the first space based on the image information in the at least two frames of first key images then includes: obtaining the first feature map of the first space based on the second feature information corresponding to each voxel of the first space in the second feature maps.
  • In this way, the second feature maps of the individual first key images can be fused into the first feature map of the first space, which helps improve the accuracy of the first feature map and, in turn, the accuracy of the three-dimensional reconstruction.
  • In some embodiments, obtaining the first feature map of the first space based on the second feature information corresponding to each voxel of the first space in the second feature maps includes: extracting, from the second feature map of each frame of the first key image, the second feature information corresponding to the voxel; fusing the second feature information that the voxel respectively corresponds to in the at least two frames of first key images to obtain the first feature information of the voxel; and obtaining the first feature map of the first space based on the first feature information of each voxel in the first space.
  • In this way, for each voxel in the first space, the second feature information it corresponds to in each frame of the first key image is fused, which helps further improve the accuracy of the first feature map of the first space.
  • In some embodiments, fusing the second feature information that the voxel respectively corresponds to in the at least two frames of first key images to obtain the first feature information of the voxel includes at least one of the following: taking the average value of the second feature information as the first feature information of the voxel; and, in the case where no second feature information corresponding to the voxel is extracted from the second feature map of any frame of the first key image, taking preset feature information as the first feature information of the voxel.
  • Taking the average value reduces the complexity of obtaining the first feature information, which helps improve the speed and thus the real-time performance of the three-dimensional reconstruction process; using preset feature information for voxels without extracted second feature information further reduces the complexity of acquiring the first feature information.
  • In some embodiments, the second feature maps of each frame of the first key image include a preset number of second feature maps corresponding to different resolutions; the first space includes a preset number of first spaces corresponding to the different resolutions, where the higher the resolution, the smaller the size of the voxels in the corresponding first space; and the first feature map includes a preset number of first feature maps corresponding to the different resolutions, each first feature map being obtained based on the second feature information of the second feature map of the same resolution.
  • In this way, three-dimensional reconstruction can be performed with a preset number of second feature maps of different resolutions, thereby further improving the fineness of the three-dimensional reconstruction.
  • In some embodiments, obtaining the first reconstruction result of this reconstruction includes: selecting one resolution as the current resolution in order from low to high; upsampling the first reconstruction result corresponding to the previously selected resolution, and fusing the upsampled first reconstruction result with the first feature map corresponding to the current resolution to obtain a fusion feature map corresponding to the current resolution; obtaining the first reconstruction result corresponding to the current resolution based on the fusion feature map; if the current resolution is not the highest resolution, re-executing the step of selecting one resolution as the current resolution and the subsequent steps; and, if the current resolution is the highest resolution, taking the first reconstruction result corresponding to the current resolution as the final first reconstruction result of this reconstruction.
  • In this way, three-dimensional reconstruction proceeds gradually from the first feature map of low resolution to the first feature map of high resolution, realizing coarse-to-fine three-dimensional reconstruction and further improving its fineness.
  • In some embodiments, obtaining the first reconstruction result of this reconstruction based on the first feature map includes: performing prediction based on the first feature map to obtain, for each voxel in the first space, a first reconstruction value and the probability value that the first reconstruction value lies within a preset value range, where the first reconstruction value represents the distance between the voxel and the associated object surface in the target to be reconstructed; selecting the voxels in the first space whose probability values satisfy a preset condition; and obtaining the first reconstruction result of this reconstruction based on the first reconstruction values of the selected voxels.
  • In this way, the interference on the three-dimensional reconstruction from voxels whose probability values do not satisfy the preset condition can be filtered out, which helps further improve the accuracy of the three-dimensional reconstruction.
  • In some embodiments, the first reconstruction result includes the first reconstruction values of the voxels in the first space, and the second reconstruction result includes the second reconstruction values of the voxels in a second space, where the second space is the total space enclosing the view frustums of the previously reconstructed second key images, and the first and second reconstruction values represent the distance between a voxel and the associated object surface in the target to be reconstructed. Updating the second reconstruction result then includes: updating the second reconstruction values of the corresponding voxels in the second space based on the first reconstruction values of the voxels in the first space.
  • In this way, the second reconstruction result obtained by the previous reconstructions is updated based on the first reconstruction values obtained in this reconstruction, which helps continuously improve the second reconstruction result over the course of reconstruction and improve the accuracy of the three-dimensional reconstruction.
  • In some embodiments, the associated object surface is the object surface closest to the voxel in the target to be reconstructed.
  • Defining the associated object surface in this way helps further improve the accuracy of the three-dimensional reconstruction.
  • In some embodiments, the first reconstruction result is obtained by using a three-dimensional reconstruction model. Obtaining the first reconstruction result of this reconstruction based on the first feature map includes: acquiring the first historical hidden layer state obtained by the fusion network of the three-dimensional reconstruction model in the previous reconstruction, where the first historical hidden layer state includes the state values corresponding to the voxels in the second space, and the second space is the total space enclosing the view frustums of the previously reconstructed second key images; extracting, from the first historical hidden layer state, the state values corresponding to the voxels in the first space as the second historical hidden layer state; updating, by the fusion network, the state values in the second historical hidden layer state based on the first feature map to obtain this reconstruction's hidden layer state; and predicting on this hidden layer state with the three-dimensional reconstruction model to obtain the first reconstruction result.
  • In this way, each reconstruction can refer to the first historical hidden layer state obtained by the previous reconstruction, which helps improve the consistency between this reconstruction and previous reconstructions, reduces the probability of delamination or dispersion between the current and previous reconstruction results, and thereby further improves the smoothness of the three-dimensional reconstruction results.
  • In some embodiments, in the case where this reconstruction is the first reconstruction, the state values in the first historical hidden layer state are preset state values.
  • In some embodiments, the fusion network includes a gated recurrent unit, and the three-dimensional reconstruction model further includes a prediction network; predicting on this hidden layer state with the three-dimensional reconstruction model to obtain the first reconstruction result includes: performing prediction on this hidden layer state based on the prediction network to obtain the first reconstruction result.
  • Setting the fusion network to include a gated recurrent unit introduces a selective attention mechanism, so the three-dimensional reconstruction process can selectively refer to the first historical hidden layer state obtained by the previous reconstruction; setting the three-dimensional reconstruction model to include a prediction network allows this hidden layer state to be predicted on directly, which helps improve the efficiency of the three-dimensional reconstruction.
  • In some embodiments, before updating the state values in the second historical hidden layer state based on the first feature map, the method further includes: extracting geometric information from the first feature map to obtain a geometric feature map, where the geometric feature map includes the geometric information of the voxels. Updating the state values in the second historical hidden layer state to obtain this hidden layer state then includes: updating the state values in the second historical hidden layer state based on the geometric feature map.
  • In this way, this hidden layer state is obtained after first extracting the geometric information of the voxels, and the second historical hidden layer state of the first space being reconstructed is updated on that basis, which helps improve the accuracy of the three-dimensional reconstruction.
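  • The following is a minimal sketch of such a geometric-information extraction step, assuming a dense first feature map and a small 3D convolutional network as a stand-in (the embodiment does not specify the network; the 32-channel size is an assumption):

```python
import torch
import torch.nn as nn

# Hypothetical geometric-information extractor: a small 3D convolutional
# network over a dense first feature map of shape (batch, channels, D, H, W)
# that outputs a geometric feature map with one feature vector per voxel.
geo_net = nn.Sequential(
    nn.Conv3d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(32, 32, kernel_size=3, padding=1),
)
first_feature_map = torch.randn(1, 32, 24, 24, 24)  # assumed 32-dim features
geo_feature_map = geo_net(first_feature_map)        # same spatial layout
```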
  • In some embodiments, after obtaining this reconstruction's hidden layer state, the method further includes: updating, based on the state values in this hidden layer state, the state values corresponding to the corresponding voxels in the first historical hidden layer state.
  • In some embodiments, the first key images correspond to camera pose parameters, the camera pose parameters include a translation distance and a rotation angle, and the first key images satisfy at least one of the following: the difference in translation distance between adjacent first key images is greater than a preset distance threshold; the difference in rotation angle between adjacent first key images is greater than a preset angle threshold.
  • Selecting the first key images in this way helps expand the visual range of the first space as much as possible while referring to as few key images as possible in each reconstruction, thereby improving the efficiency of the three-dimensional reconstruction.
  • An embodiment of the present application provides a three-dimensional reconstruction apparatus, including: a key image acquisition module configured to acquire at least two frames of first key images for this reconstruction; a first space determination module configured to determine a first space enclosing the view frustums of the at least two frames of first key images, where the first key images are obtained by photographing the target to be reconstructed; a first feature acquisition module configured to obtain a first feature map of the first space based on the image information in the at least two frames of first key images, where the first feature map includes first feature information of the voxels in the first space; a reconstruction result acquisition module configured to obtain the first reconstruction result of this reconstruction based on the first feature map; and a reconstruction result update module configured to update, based on the first reconstruction result of this reconstruction, the second reconstruction result obtained by the previous reconstructions.
  • An embodiment of the present application provides an electronic device, including a mutually coupled memory and a processor, where the processor is configured to execute program instructions stored in the memory, so as to implement the above three-dimensional reconstruction method.
  • An embodiment of the present application provides a computer-readable storage medium, on which program instructions are stored, and when the program instructions are executed by a processor, the foregoing three-dimensional reconstruction method is implemented.
  • In the above solutions, the whole first space enclosing the view frustums of the key images is reconstructed in three dimensions as a whole, which can not only greatly reduce the computational load but also reduce the probability of delamination or dispersion in the reconstruction results, thereby improving the real-time performance of the three-dimensional reconstruction process and the smoothness of the three-dimensional reconstruction results.
  • FIG. 1A is a schematic flowchart of an embodiment of a three-dimensional reconstruction method according to an embodiment of the present application
  • FIG. 1B shows a schematic diagram of a system architecture of a three-dimensional reconstruction method according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of an embodiment of the first space
  • FIG. 3 is a schematic process diagram of an embodiment of a three-dimensional reconstruction method according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram comparing the effects of the three-dimensional reconstruction method according to an embodiment of the present application with those of other three-dimensional reconstruction methods;
  • FIG. 5 is a schematic flowchart of an embodiment of step S12 in FIG. 1A;
  • FIG. 6 is a schematic diagram of a state of obtaining a first feature map according to an embodiment
  • FIG. 7 is a schematic flowchart of an embodiment of step S13 in FIG. 1A;
  • FIG. 8 is a schematic state diagram of an embodiment of acquiring this reconstruction's hidden layer state.
  • FIG. 9 is a schematic process diagram of another embodiment of a three-dimensional reconstruction method according to an embodiment of the present application.
  • FIG. 10 is a schematic frame diagram of an embodiment of a three-dimensional reconstruction apparatus according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a framework of an embodiment of an electronic device according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a framework of an embodiment of a computer-readable storage medium according to an embodiment of the present application.
  • system and “network” are often used interchangeably herein.
  • The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean that A exists alone, both A and B exist, or B exists alone.
  • The character "/" herein generally indicates that the associated objects are in an "or" relationship.
  • “multiple” herein means two or more than two.
  • FIG. 1A is a schematic flowchart of an embodiment of a three-dimensional reconstruction method according to an embodiment of the present application, which can include the following steps:
  • Step S11 Acquire at least two frames of first key images for this reconstruction, and determine a first space surrounding the viewing cone of the at least two frames of the first key images.
  • the first key image is obtained by photographing the target to be reconstructed.
  • The target to be reconstructed can be set according to the actual application. In some embodiments, the target to be reconstructed may be an object, which may include but is not limited to a table, a chair, a sofa, and the like, which is not limited here. In other embodiments, the target to be reconstructed may be a scene, and the scene can contain several objects. Taking the living room as an example, it can include but is not limited to objects such as tables, chairs, and sofas; taking a building as an example, it can include but is not limited to objects such as stairs, corridors, and gates. Other situations can be deduced by analogy and are not listed one by one here.
  • In some embodiments, the first key images may be acquired during the process of photographing the target to be reconstructed; that is, at least two frames of first key images for this reconstruction may be acquired while the target is being photographed, so that the three-dimensional reconstruction is performed incrementally.
  • In some embodiments, each first key image corresponds to camera pose parameters, which may include, for example, a translation distance and a rotation angle, and the first key images satisfy at least one of the following: the difference in translation distance between adjacent first key images is greater than a preset distance threshold; the difference in rotation angle between adjacent first key images is greater than a preset angle threshold.
  • In some embodiments, the camera pose parameters can be obtained based on methods such as SLAM (Simultaneous Localization And Mapping), which is not limited here.
  • SLAM usually includes parts such as feature extraction, data association, state estimation, state update, and feature update, and the details are not repeated here.
  • In some embodiments, the image sequence captured of the target to be reconstructed can be denoted as {I_t}, and the corresponding camera pose parameters can be denoted as {ξ_t}, where each camera pose parameter ξ_t can include the translation distance t and the rotation angle R.
  • In this way, the first key images selected from the above image sequence are neither too close to nor too far from each other in three-dimensional space, so that each reconstruction can be based on as few first key images as possible while the visual range of the first space is expanded as much as possible.
  • In some embodiments, the number of first key images acquired for each three-dimensional reconstruction may be less than a preset number threshold, which can be determined according to the practical application. In the case where the electronic device performing the three-dimensional reconstruction has abundant computing resources, the preset number threshold can be set slightly larger, for example, 5, 10, or 15; in the case where the electronic device has relatively poor computing resources, the preset number threshold can be set slightly smaller, for example, 2, 3, or 4, which is not limited here.
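  • A minimal sketch of this keyframe criterion, assuming 4x4 camera-to-world pose matrices (for example from SLAM) and hypothetical threshold values:

```python
import numpy as np

def is_new_keyframe(pose, last_kf_pose, t_thresh=0.1, r_thresh=15.0):
    """Keep a frame as a new first key image if the camera moved or rotated
    enough relative to the last keyframe; the thresholds (0.1 m, 15 degrees)
    are assumptions. pose, last_kf_pose: 4x4 camera-to-world matrices."""
    # Difference in translation distance between the two camera centers.
    t_diff = np.linalg.norm(pose[:3, 3] - last_kf_pose[:3, 3])
    # Difference in rotation angle, from the trace of the relative rotation.
    r_rel = last_kf_pose[:3, :3].T @ pose[:3, :3]
    cos_a = np.clip((np.trace(r_rel) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_a))
    # The text keeps a frame when either difference exceeds its threshold.
    return t_diff > t_thresh or angle > r_thresh
```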
  • The view frustum can be understood as a solid in the shape of a quadrangular pyramid, namely the shape of the region the camera can see when rendering. It is understandable that any point in the image captured by the camera ultimately corresponds to a line in the real world, of which only one point is displayed, with everything on that line behind the displayed point occluded; the outer boundary of the image is defined by the diverging lines corresponding to the four image corners, and these four lines finally converge at the position of the camera.
  • FIG. 1B is a schematic diagram of a system architecture to which a three-dimensional reconstruction method according to an embodiment of the present application can be applied; as shown in FIG. 1B , the system architecture includes: an image acquisition device 2001 , a network 2002 , and an image acquisition terminal 2003 .
  • The image acquisition device 2001 and the image acquisition terminal 2003 can establish a communication connection through the network 2002; the image acquisition device 2001 transmits captured images to the image acquisition terminal 2003 through the network 2002, and the image acquisition terminal 2003 receives and processes the images to obtain the reconstruction result.
  • The image acquisition device 2001 may include a camera or another device with an image capture function.
  • the image acquisition terminal 2003 may include a computer device with certain computing capability and image processing capability, for example, the computer device includes a terminal device or a server or other processing device.
  • The network 2002 can be wired or wireless. When the image acquisition terminal 2003 is a server, the image acquisition device can communicate with it through a wired connection, for example, data communication through a bus; when the image acquisition terminal 2003 is a terminal device, the image acquisition device can connect to it wirelessly and conduct data communication accordingly.
  • the image acquisition terminal 2003 may be a vision processing device with a video acquisition module, or a host with a camera.
  • In some embodiments, the three-dimensional reconstruction method of the embodiment of the present application may be executed by the image acquisition terminal 2003 itself, in which case the above system architecture may not include the network 2002 and the image acquisition device 2001.
  • FIG. 2 is a schematic diagram of an embodiment of the first space.
  • the first key image is captured by camera 1, camera 2, and camera 3 indicated by black dots, respectively.
  • In some embodiments, the maximum depth of the above view frustum may be predefined as D_max; that is, the height of the quadrangular pyramid is the maximum depth D_max. Note that, for ease of description, the view frustums shown in FIG. 2 are depicted from a top-down (two-dimensional) perspective, and the same applies to the first space shown in FIG. 2.
  • In some embodiments, the first space may be, for example, a cuboid, a cube, or another hexahedron whose adjacent surfaces are perpendicular to each other; other cases can be deduced with reference to the above description and are not exemplified here.
  • In some embodiments, the first space may include several voxels; each voxel may also be a cuboid or a cube, and a plurality of voxels are stacked to form the first space. The size of the voxels can be set according to the actual application: in the case of high requirements on the accuracy of the three-dimensional reconstruction, the voxel size can be set slightly smaller, and in the case of relatively loose accuracy requirements, it can be set slightly larger, which is not limited here.
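  • A minimal sketch of determining such a first space as the axis-aligned cuboid enclosing the view frustums of the first key images, assuming pinhole intrinsics, camera-to-world poses, and an assumed maximum depth D_max:

```python
import numpy as np

def first_space_bounds(poses, K, width, height, d_max=3.0):
    """Axis-aligned cuboid (first space) enclosing the view frustums of all
    first key images. poses: list of 4x4 camera-to-world matrices; K: 3x3
    pinhole intrinsics; d_max: maximum frustum depth D_max (assumed, meters)."""
    pix = np.array([[0, 0, 1], [width, 0, 1],
                    [0, height, 1], [width, height, 1]], dtype=float)
    rays = (np.linalg.inv(K) @ pix.T).T           # corner ray directions, z = 1
    pts = []
    for pose in poses:
        corners = [np.zeros(3)]                   # frustum apex: camera center
        corners += [ray * d_max for ray in rays]  # far-plane corners at D_max
        for c in corners:
            pts.append((pose @ np.append(c, 1.0))[:3])  # camera to world
    pts = np.stack(pts)
    return pts.min(axis=0), pts.max(axis=0)       # opposite cuboid corners
```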
  • Step S12 Obtain a first feature map of the first space based on the image information in the at least two frames of the first key image.
  • the first feature map includes first feature information of voxels in the first space.
  • In some embodiments, feature extraction may be performed on each frame of the first key image to obtain a second feature map of that first key image, and on this basis the first feature map of the first space is obtained from the second feature information that each voxel in the first space corresponds to in the second feature maps.
  • In this way, the second feature maps of each frame of the first key image can be fused to obtain the first feature map of the first space, which helps improve the accuracy of the first feature map and, further, the accuracy of the three-dimensional reconstruction.
  • In some embodiments, in order to improve the efficiency of feature extraction, a three-dimensional reconstruction model can be pre-trained, and the model includes a feature extraction network, so that feature extraction can be performed on each frame of the first key image based on the feature extraction network. The feature extraction network may include, but is not limited to, a Convolutional Neural Network (CNN), which is not limited here.
  • the second feature map of the first key image may be a feature map with a preset resolution
  • the preset resolution may be set according to the actual application.
  • In the case of high requirements on the accuracy of the three-dimensional reconstruction, the preset resolution can be set slightly larger, and in the case of relatively loose accuracy requirements, it can be set slightly smaller, which is not limited here.
  • For each voxel in the first space, the second feature information corresponding to that voxel in the second feature maps can be fused to obtain the first feature information of the voxel, and the first feature map of the first space is finally obtained based on the first feature information of all voxels in the first space. In the case where no second feature information corresponding to a voxel is extracted, preset feature information may be used as the first feature information of that voxel; the preset feature information may be set according to the actual application, for example, set to 0 to further reduce the computational complexity of the three-dimensional reconstruction, which is not limited here.
  • In some embodiments, the second feature maps of each frame of the first key image may include a preset number of second feature maps corresponding to different resolutions, and the first space includes a preset number of first spaces corresponding to the different resolutions, where the higher the resolution, the smaller the size of the voxels in the corresponding first space. Accordingly, the first feature map may also include a preset number of first feature maps corresponding to the different resolutions, each obtained based on the second feature information of the second feature map of the same resolution.
  • The preset number may be set according to the actual application, for example, two, three, or four different resolutions, which is not limited here; the resolutions themselves can also be set according to the actual application. For example, two resolutions of 640*480 and 480*360 can be set, or two resolutions of 1280*960 and 640*480; alternatively, three resolutions of 640*480, 480*360, and 360*240 can be set, or three resolutions of 1280*960, 640*480, and 480*360, which is not limited here.
  • In some embodiments, in order to improve the efficiency of the three-dimensional reconstruction, a three-dimensional reconstruction model can be pre-trained whose feature extraction network performs feature extraction on each first key image to obtain second feature maps of the different resolutions. The feature extraction network may include but is not limited to an FPN (Feature Pyramid Network), which is not limited here.
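  • A toy stand-in for such a feature extraction network, assuming three output resolutions and illustrative channel sizes (the embodiment suggests an FPN; a plain strided CNN keeps the sketch short):

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Toy stand-in for the feature extraction network: produces a preset
    number (here 3) of second feature maps at 1/4, 1/8 and 1/16 of the
    input resolution. All sizes are illustrative assumptions."""
    def __init__(self, out_ch=32):
        super().__init__()
        def block(ci, co):
            return nn.Sequential(nn.Conv2d(ci, co, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(co), nn.ReLU(inplace=True))
        self.s1 = nn.Sequential(block(3, 16), block(16, out_ch))  # 1/4
        self.s2 = block(out_ch, out_ch)                           # 1/8
        self.s3 = block(out_ch, out_ch)                           # 1/16
    def forward(self, img):
        f_high = self.s1(img)           # highest-resolution second feature map
        f_mid = self.s2(f_high)
        f_low = self.s3(f_mid)
        return [f_low, f_mid, f_high]   # sorted from low to high resolution

feats = TinyBackbone()(torch.randn(1, 3, 480, 640))  # three feature maps
```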
  • In the case where the second feature maps of the first key image include N second feature maps corresponding to N different resolutions, the first space likewise includes N first spaces corresponding to the N resolutions, and the higher the resolution, the smaller the size of the voxels in the corresponding first space. For example, if the second feature maps of the first key image include second feature maps at the two resolutions 1280*960 and 640*480, the first space also includes a first space corresponding to 1280*960 and a first space corresponding to 640*480, and the voxels of the former are smaller than those of the latter.
  • In some embodiments, the first feature information of the voxels in the first space corresponding to the ith resolution may be obtained based on the second feature information of the second feature maps of the ith resolution in the at least two frames of first key images; the detailed process can refer to the following disclosed embodiments and is not described here.
  • In some embodiments, the width of the voxels in the first space corresponding to the ith resolution can be calculated by a formula under which the voxel width halves at each finer resolution, for example: w_i = s / 2^(i-1) (1), where w_i represents the width of the voxels in the first space corresponding to the ith resolution, s represents the preset reference voxel width (here anchored at the coarsest level), which can be adjusted according to the actual application, and i is the index of the resolution after the different resolutions are sorted in ascending order.
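  • For example, with three resolutions and an assumed reference width s = 0.16 m at the coarsest level, the halving relation gives:

```python
def voxel_width(i, s=0.16):
    """Width w_i of voxels in the first space at the i-th resolution (i = 1 is
    the lowest resolution); each finer resolution halves the width. The
    reference width s = 0.16 m is an assumed value."""
    return s / (2 ** (i - 1))

print([voxel_width(i) for i in (1, 2, 3)])  # [0.16, 0.08, 0.04]
```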
  • Step S13 Based on the first feature map, obtain the first reconstruction result of this reconstruction.
  • In some embodiments, prediction may be performed based on the first feature map to obtain the first reconstruction value of each voxel in the first space and the probability value that the first reconstruction value lies within a preset value range, where the first reconstruction value represents the distance between the voxel and the associated object surface in the target to be reconstructed. On this basis, the prediction results can be sparsified: the voxels in the first space whose probability values satisfy a preset condition are selected, and the first reconstruction result of this reconstruction is obtained based on the first reconstruction values of the selected voxels. This filters out the interference on the three-dimensional reconstruction from voxels whose probability values do not satisfy the preset condition, which helps further improve the accuracy of the three-dimensional reconstruction.
  • In some embodiments, in order to improve the efficiency of the three-dimensional reconstruction, a three-dimensional reconstruction model can be pre-trained whose prediction network takes the first feature map as input and outputs, for each voxel in the first space, the first reconstruction value and the probability value that the first reconstruction value lies within the preset value range. The prediction network may include, but is not limited to, an MLP (Multi-Layer Perceptron), which is not limited here.
  • In some embodiments, the first reconstruction value may be represented by a TSDF (Truncated Signed Distance Function), and the preset value range may be between -1 and 1. When the first reconstruction value of the jth voxel is greater than 0 and less than 1, the jth voxel is located within the truncation distance in front of the associated object surface; when it is less than 0 and greater than -1, the jth voxel is located within the truncation distance behind the associated object surface.
  • The probability value that the first reconstruction value lies within the preset value range can be regarded as the likelihood that the first reconstruction value lies within that range: the higher the probability value, the more likely the first reconstruction value lies within the preset value range.
  • In some embodiments, the preset condition may be set to include that the probability value is greater than a preset probability threshold, which can be set according to the actual application: in the case of high requirements on the accuracy of the three-dimensional reconstruction, the preset probability threshold can be set slightly larger, for example, 0.9 or 0.95; in the case of relatively loose accuracy requirements, it can be set slightly smaller, for example, 0.8 or 0.85, which is not limited here. The selected voxels and their first reconstruction values may then be taken as the first reconstruction result of this reconstruction.
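  • A minimal sketch of this prediction-and-sparsification step, assuming per-voxel feature vectors and an MLP head (layer sizes and the 0.5 threshold are assumptions):

```python
import torch
import torch.nn as nn

class VoxelHead(nn.Module):
    """Illustrative prediction head: maps each voxel's first feature vector
    to a TSDF-style first reconstruction value in (-1, 1) and the probability
    that it lies within that preset value range."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2))
    def forward(self, voxel_feats):          # (num_voxels, feat_dim)
        out = self.mlp(voxel_feats)
        tsdf = torch.tanh(out[:, 0])         # first reconstruction value
        prob = torch.sigmoid(out[:, 1])      # probability of being in range
        return tsdf, prob

def sparsify(tsdf, prob, thresh=0.5):
    """Keep only the voxels whose probability satisfies the preset condition."""
    keep = prob > thresh
    return keep.nonzero(as_tuple=True)[0], tsdf[keep]  # indices and values

tsdf, prob = VoxelHead()(torch.randn(100, 32))
kept_idx, kept_tsdf = sparsify(tsdf, prob)
```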
  • As mentioned above, the associated object surface may be the object surface closest to the voxel in the target to be reconstructed. For example, for the voxel closest to the floor in a living room, the associated object surface can be the floor, and for the voxel closest to the sofa in the living room, the associated object surface can be the sofa; other cases can be deduced in the same way and are not exemplified here. Defining the associated object surface in this way can help further improve the accuracy of the three-dimensional reconstruction.
  • As described above, the second feature maps of each frame of the first key image may include a preset number of second feature maps corresponding to different resolutions. In this case, one resolution is selected in turn as the current resolution, in order from low to high; the first reconstruction result corresponding to the previously selected resolution is upsampled, the upsampled first reconstruction result is fused with the first feature map corresponding to the current resolution to obtain a fusion feature map corresponding to the current resolution, and the first reconstruction result corresponding to the current resolution is obtained based on the fusion feature map; this repeats while the current resolution is not the highest. In this way, the three-dimensional reconstruction proceeds gradually from the first feature map of low resolution to the first feature map of high resolution, realizing coarse-to-fine three-dimensional reconstruction and further improving its fineness.
  • In some embodiments, an upsampling manner such as nearest-neighbor interpolation may be used to upsample the first reconstruction result. Since the voxel width is calculated by the above formula (1), the width of the voxels after upsampling is half the original width, so that the voxel width in the upsampled first reconstruction result matches the voxel width in the first space corresponding to the current resolution.
  • In some embodiments, the first reconstruction value of the jth voxel in the upsampled first reconstruction result may be concatenated with the first feature information of the jth voxel in the first space corresponding to the current resolution, thereby fusing the upsampled first reconstruction result with the first feature map corresponding to the current resolution. Since the first feature information of each voxel can be represented as a vector of dimension d and the first reconstruction value of each upsampled voxel as a vector of dimension 1, each voxel in the resulting fusion feature map can be represented as a vector of dimension d+1.
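  • A minimal sketch of this upsample-and-concatenate fusion on dense volumes (the embodiment stores sparse voxels; dense tensors and the sizes below are assumptions for brevity):

```python
import torch
import torch.nn.functional as F

def fuse_levels(prev_tsdf, cur_feat):
    """prev_tsdf: (1, 1, D, H, W) first reconstruction result of the coarser
    level; cur_feat: (1, d, 2D, 2H, 2W) first feature map of the current
    resolution. Returns a (1, d+1, 2D, 2H, 2W) fusion feature map."""
    # Nearest-neighbour upsampling doubles resolution, halving voxel width.
    up = F.interpolate(prev_tsdf, scale_factor=2, mode="nearest")
    # Concatenate the 1-dim reconstruction values with the d-dim features.
    return torch.cat([cur_feat, up], dim=1)

fused = fuse_levels(torch.randn(1, 1, 12, 12, 12),
                    torch.randn(1, 32, 24, 24, 24))  # (1, 33, 24, 24, 24)
```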
  • FIG. 3 is a schematic process diagram of an embodiment of a three-dimensional reconstruction method according to an embodiment of the present application.
  • As shown in FIG. 3, a feature extraction network such as the aforementioned FPN extracts second feature maps of three different resolutions, which, after sorting from low to high, can be denoted resolution 1, resolution 2, and resolution 3; the first spaces corresponding to resolutions 1, 2, and 3 can be denoted first space 1, first space 2, and first space 3, respectively. For each resolution, the second feature information that each voxel of the corresponding first space corresponds to in the second feature map of that resolution is fused to obtain the first feature map of the corresponding first space. In this reconstruction (i.e., the t-th time step), the first feature maps of first space 1, first space 2, and first space 3 can be denoted F_t^1, F_t^2, and F_t^3, respectively.
  • For resolution 1 as the current resolution, prediction can be performed directly on the first feature map F_t^1 based on a prediction network such as an MLP to obtain the first reconstruction value of each voxel in first space 1 and the probability value that the first reconstruction value lies within the preset value range, and these predictions are then sparsified (S in FIG. 3) to obtain the first reconstruction result corresponding to resolution 1. Since resolution 1 is not the highest resolution, resolution 2 is then taken as the current resolution: the first reconstruction result corresponding to the previously selected resolution 1 is upsampled (U in FIG. 3), the upsampled first reconstruction result is concatenated with the first feature map F_t^2 corresponding to the current resolution (C in FIG. 3) to obtain the fusion feature map corresponding to resolution 2, prediction is performed on the fusion feature map by the prediction network to obtain the first reconstruction value and probability value of each voxel in first space 2, and sparsification (S in FIG. 3) yields the first reconstruction result corresponding to resolution 2.
  • Since resolution 2 is still not the highest resolution, resolution 3 is then taken as the current resolution: the first reconstruction result corresponding to resolution 2 is upsampled (U in FIG. 3), concatenated with the first feature map F_t^3 (C in FIG. 3) to obtain the fusion feature map corresponding to resolution 3, prediction is performed on it by the prediction network, and sparsification (S in FIG. 3) yields the first reconstruction result corresponding to resolution 3. Since the current resolution is now the highest resolution, this first reconstruction result is taken as the final first reconstruction result of this reconstruction. Other situations can be deduced by analogy and are not exemplified here.
  • Step S14 Based on the first reconstruction result of the current reconstruction, update the second reconstruction result obtained from the previous reconstruction.
  • In some embodiments, the first reconstruction result includes the first reconstruction values of the voxels in the first space, and the second reconstruction result includes the second reconstruction values of the voxels in the second space, where the second space is the total space enclosing the view frustums of the previously reconstructed second key images, and the first and second reconstruction values represent the distance between a voxel and the associated object surface in the target to be reconstructed; reference may be made to the foregoing description of the first reconstruction value, which is not repeated here.
  • In some embodiments, the second reconstruction values of the corresponding voxels in the second space may be updated based on the first reconstruction values of the voxels in the first space. This helps the three-dimensional reconstruction process continuously improve the second reconstruction result over the course of reconstruction based on the first reconstruction values obtained in each reconstruction.
  • In the case where this reconstruction is the first reconstruction, there is no second reconstruction result from a previous reconstruction yet, and the step of updating the second reconstruction result based on the first reconstruction result of this reconstruction may not be performed.
  • In some embodiments, the second reconstruction values of the voxels in the part of the second space corresponding to the first space may be replaced with the first reconstruction values of the voxels in the first space reconstructed this time.
  • In some embodiments, after the second reconstruction result is updated, step S11 and the subsequent steps may be re-executed to continuously improve the second reconstruction result through multiple reconstructions, and the updated second reconstruction result can finally be used as the final reconstruction result of the target to be reconstructed.
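  • A minimal sketch of this replacement update, assuming both reconstruction results are stored as mappings from integer voxel coordinates in a shared world grid to reconstruction values:

```python
def update_second_reconstruction(second, first):
    """Replace the second reconstruction values of the voxels covered by this
    reconstruction with the corresponding first reconstruction values.
    second, first: dicts mapping voxel coordinates (x, y, z) to TSDF values."""
    second.update(first)  # overwrite only the voxels reconstructed this time
    return second

scene = {(0, 0, 0): 1.0, (0, 0, 1): 0.4}           # previous second result
scene = update_second_reconstruction(scene, {(0, 0, 1): 0.1, (0, 0, 2): -0.2})
```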
  • FIG. 4 is a schematic diagram of the effects of the three-dimensional reconstruction method according to the embodiment of the present application and other three-dimensional reconstruction methods.
  • 41 and 42 in FIG. 4 represent reconstruction results obtained by other three-dimensional reconstruction methods, and 43 and 44 in FIG. 4 represent reconstruction results obtained by the three-dimensional reconstruction method according to the embodiment of the present application. Comparing them, the reconstruction results obtained by the other methods show obvious dispersion and delamination in the wall portion circled by the rectangular frame, while the reconstruction results obtained by the method of the embodiment of the present application show no obvious dispersion or delamination in that portion and have better smoothness.
  • In the above solutions, the whole first space enclosing the view frustums of the key images is reconstructed in three dimensions as a whole, which can not only greatly reduce the computational load but also reduce the probability of delamination or dispersion in the reconstruction results, thereby improving the real-time performance of the three-dimensional reconstruction process and the smoothness of the three-dimensional reconstruction results.
  • FIG. 5 is a schematic flowchart of an embodiment of step S12 in FIG. 1A .
  • As described above, feature extraction can be performed on each frame of the first key image to obtain the second feature map of that first key image, so that the first feature map of the first space can be obtained from the second feature information that the voxels in the first space correspond to in the second feature maps. FIG. 5 is a schematic flowchart of obtaining the first feature map on this basis, and can include the following steps:
  • Step S51 Extract the second feature information corresponding to the voxels from the second feature map of the first key image of each frame respectively.
  • For each voxel in the first space, the second feature information corresponding to the voxel may be extracted from the second feature map of each frame of the first key image. In some embodiments, each pixel in the second feature map may be back-projected based on the camera pose parameters and camera intrinsic parameters of the first key image to determine the voxel in the first space corresponding to that pixel, and on this basis the second feature information of the pixel corresponding to each voxel can be extracted from the second feature map of each frame of the first key image.
  • FIG. 6 is a schematic state diagram of an embodiment of acquiring the first feature map. For ease of description, similar to FIG. 2, FIG. 6 also depicts the process from a two-dimensional perspective. As shown in FIG. 6, by back-projecting the pixels in the second feature map, the voxel corresponding to each pixel in the first space can be determined; squares of different colors in FIG. 6 represent different second feature information.
  • Step S52 Fusing the second feature information of the voxels corresponding to the at least two frames of the first key image respectively to obtain the first feature information of the voxels.
  • the average value of the second feature information of the voxels corresponding to at least two frames of the first key image respectively may be used as the first feature information of the voxels.
  • For example, suppose the k-th voxel in the first space corresponds to the pixel at the i-th row and j-th column in the second feature map of the first first key image, and to the pixel at the m-th row and n-th column in the second feature map of the second first key image. Then, the average value of the second feature information of the pixel at the i-th row and j-th column in the second feature map of the first first key image and the second feature information of the pixel at the m-th row and n-th column in the second feature map of the second first key image may be used as the first feature information of the k-th voxel in the first space.
  • the weighted results of the second feature information of the voxels corresponding to the at least two frames of the first key image respectively may be used as the first feature information of the voxels.
  • the above weighted results may include, but are not limited to, weighted summation, weighted average, etc., which are not limited herein.
  • In a case where the second feature information corresponding to the voxel is not extracted from the second feature map of any frame of the first key image, the preset feature information is used as the first feature information of the voxel.
  • Step S53: Obtain the first feature map of the first space based on the first feature information of each voxel in the first space.
  • After the first feature information of each voxel in the first space is obtained, the first feature information of all voxels in the first space can be taken as a whole as the first feature map of the first space.
  • In this way, the second feature information corresponding to the voxel is extracted from the second feature map of each frame of the first key image, and the second feature information of the voxel corresponding to each of the at least two frames of the first key image is fused to obtain the first feature information of the voxel, so that the first feature map of the first space is obtained based on the first feature information of each voxel in the first space. Therefore, for each voxel in the first space, the second feature information corresponding to each frame of the first key image is fused, which can help to further improve the accuracy of the first feature map of the first space.
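  • A minimal sketch of steps S52 and S53 follows, assuming the per-view features and visibility masks come from a routine like the illustrative gather_second_feature_info above; taking the preset feature information for never-observed voxels to be zero is an assumption of the sketch, not a requirement of the embodiment.

```python
import numpy as np

def fuse_voxel_features(per_view_feats, per_view_masks, preset=0.0):
    """Average the second feature information of each voxel over the views
    that actually observe it (simple average fusion; a weighted sum or
    weighted average would follow the same pattern).

    per_view_feats: (V, N, C) features of N voxels from V first key images.
    per_view_masks: (V, N) True where the voxel projects into that view.
    Returns the first feature map of the first space as an (N, C) array.
    """
    per_view_feats = np.asarray(per_view_feats, dtype=np.float64)
    masks = np.asarray(per_view_masks, dtype=np.float64)[..., None]  # (V, N, 1)
    counts = masks.sum(axis=0)                                       # (N, 1)
    summed = (per_view_feats * masks).sum(axis=0)                    # (N, C)
    # Voxels with no observation keep the preset feature information.
    fused = np.divide(summed, counts, out=np.full_like(summed, preset),
                      where=counts > 0)
    return fused
```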
  • FIG. 7 is a schematic flowchart of an embodiment of step S13 in FIG. 1A .
  • the first reconstruction result is obtained by using a three-dimensional reconstruction model.
  • Step S71: Obtain the first historical hidden layer state obtained by the fusion network of the three-dimensional reconstruction model in previous reconstructions.
  • In the embodiment of the present disclosure, the first historical hidden layer state includes the state values corresponding to the voxels in the second space, and the second space is the total space enclosing the viewing cones of the previously reconstructed second key images.
  • It should be noted that, in a case where this reconstruction is the first reconstruction, the second space is the first space of this reconstruction, and in this case, the state values corresponding to the voxels of the second space included in the first historical hidden layer state can be set to a preset state value (e.g., the preset state value is set to 0).
  • Step S72: From the first historical hidden layer state, extract the state values corresponding to the voxels in the first space as the second historical hidden layer state.
  • FIG. 8 is a schematic state diagram of an embodiment of acquiring the current hidden layer state. It should be noted that, for the convenience of description, similar to the aforementioned FIG. 2 and FIG. 6, FIG. 8 describes the process of acquiring the current hidden layer state from a "two-dimensional perspective". As shown in FIG. 8, for the convenience of description, the first historical hidden layer state can be recorded as H_{t-1}^g; in the first historical hidden layer state H_{t-1}^g, the squares with different grayscales represent the state values of voxels, and the colorless squares represent voxels without state values. In addition, the rectangular box in the first historical hidden layer state H_{t-1}^g represents the first space; by extracting from H_{t-1}^g the state values corresponding to the voxels in the first space, the second historical hidden layer state H_{t-1}^l can be obtained.
  • Other situations can be deduced by analogy, and no examples are given here.
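  • The extraction of step S72 can be pictured as a sparse lookup. The sketch below stores the first historical hidden layer state as a dictionary from integer voxel coordinates to state vectors, a simplification of the sparse volumetric storage a real implementation would use, and fills missing voxels with a zero preset state value per the first-reconstruction convention described above; the helper name is illustrative.

```python
import numpy as np

def extract_local_state(global_state, first_space_voxels, state_dim):
    """Gather the second historical hidden layer state for the voxels of
    the current first space from the global (first historical) state.

    global_state: dict mapping voxel coordinate tuples -> (state_dim,) arrays.
    first_space_voxels: iterable of integer (x, y, z) coordinates.
    """
    zero = np.zeros(state_dim)
    return np.stack([np.asarray(global_state.get(tuple(v), zero))
                     for v in first_space_voxels])
```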
  • Step S73: Execute, based on the fusion network, the following: update the state values in the second historical hidden layer state based on the first feature map to obtain the current hidden layer state.
  • the first feature map and the second historical hidden layer state can be input into the fusion network, so as to output the current hidden layer state.
  • the fusion network may include, but is not limited to, a GRU (Gated Recurrent Unit), which is not limited here.
  • In an implementation scenario, before the second historical hidden layer state is updated, geometric information can be further extracted from the first feature map F_t^l to obtain a geometric feature map G_t^l, where the geometric feature map includes the geometric information of voxels, so that the state values in the second historical hidden layer state can be updated based on the geometric feature map to obtain the current hidden layer state.
  • the state of the second historical hidden layer of the reconstructed first space can be updated on the basis of the extracted geometric information of the voxels, which is beneficial to improve the accuracy of the three-dimensional reconstruction.
  • In an implementation scenario, the geometric information can be extracted from the first feature map F_t^l through a network such as a three-dimensional sparse convolution network or PointNet, to obtain the geometric feature map G_t^l; the network can be set according to actual application needs, which is not limited here.
  • In another implementation scenario, taking the fusion network including a gated recurrent unit (GRU) as an example, the GRU fuses the geometric feature map G_t^l and the second historical hidden layer state H_{t-1}^l to finally obtain the current hidden layer state H_t^l. For the convenience of description, the update gate of the GRU can be recorded as z_t and the reset gate as r_t, which can be expressed as:

    z_t = σ(sparseconv([H_{t-1}^l, G_t^l], W_z))    (2)
    r_t = σ(sparseconv([H_{t-1}^l, G_t^l], W_r))    (3)

  • In the above formulas (2) and (3), sparseconv represents sparse convolution, [·, ·] represents concatenation, W_z and W_r represent the network weights of the sparse convolutions, and σ represents the activation function (e.g., sigmoid).
  • On this basis, the update gate z_t and the reset gate r_t determine how much information is introduced from the geometric feature map G_t^l in the fusion, and how much information is introduced from the second historical hidden layer state H_{t-1}^l in the fusion, which can be expressed as:

    H̃_t^l = tanh(sparseconv([r_t ⊙ H_{t-1}^l, G_t^l], W_h))    (4)
    H_t^l = (1 − z_t) ⊙ H_{t-1}^l + z_t ⊙ H̃_t^l    (5)

  • In the above formulas (4) and (5), sparseconv represents sparse convolution, W_h represents the network weight of the sparse convolution, tanh represents the activation function, and ⊙ represents element-wise multiplication. It can be seen that, as a data-driven approach, the GRU can provide a selective attention mechanism in the three-dimensional reconstruction process.
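  • To make formulas (2) to (5) concrete, the following sketch realizes the same gating arithmetic with a plain per-voxel linear map standing in for sparseconv; a real implementation would apply 3D sparse convolutions over the voxel volume, and the random weight initialization here is a placeholder, not the embodiment's trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUFusion:
    """Per-voxel GRU update: fuses the geometric feature map G_t with the
    second historical hidden layer state H_{t-1} (formulas (2)-(5))."""

    def __init__(self, feat_dim, state_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + feat_dim
        self.W_z = rng.normal(0, 0.1, (in_dim, state_dim))
        self.W_r = rng.normal(0, 0.1, (in_dim, state_dim))
        self.W_h = rng.normal(0, 0.1, (in_dim, state_dim))

    def __call__(self, G_t, H_prev):
        # [H_{t-1}, G_t]: concatenation along the channel dimension.
        x = np.concatenate([H_prev, G_t], axis=-1)
        z_t = sigmoid(x @ self.W_z)                        # update gate, eq. (2)
        r_t = sigmoid(x @ self.W_r)                        # reset gate,  eq. (3)
        x_r = np.concatenate([r_t * H_prev, G_t], axis=-1)
        H_tilde = np.tanh(x_r @ self.W_h)                  # candidate,   eq. (4)
        return (1.0 - z_t) * H_prev + z_t * H_tilde        # H_t,         eq. (5)

# Example: 5 voxels, 32-d geometric features, 32-d hidden state.
fusion = GRUFusion(feat_dim=32, state_dim=32)
H_t = fusion(np.random.randn(5, 32), np.zeros((5, 32)))
```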
  • Step S74: Predict the current hidden layer state by using the three-dimensional reconstruction model to obtain the first reconstruction result.
  • In an implementation scenario, as described in the foregoing disclosed embodiments, the three-dimensional reconstruction model may further include a prediction network (e.g., an MLP), and on this basis, the current hidden layer state may be predicted based on the prediction network to obtain the first reconstruction result.
  • In an implementation scenario, predicting the current hidden layer state H_t^l based on the prediction network can obtain the first reconstruction value of each voxel in the first space and the probability value of the first reconstruction value being within the preset value range, where the first reconstruction value is used to represent the distance between the voxel and the associated object surface in the target to be reconstructed. On this basis, voxels in the first space whose probability values satisfy a preset condition can be selected, so that the first reconstruction result of this reconstruction can be obtained based on the first reconstruction values of the selected voxels.
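  • The prediction and selection just described can be sketched as below, assuming a single-hidden-layer MLP head (the weights passed in are placeholders; a trained prediction network would supply them) and a probability threshold as the preset condition; these specific choices are assumptions of the sketch.

```python
import numpy as np

def predict_and_sparsify(H_t, W1, b1, W2, b2, prob_threshold=0.5):
    """Predict per-voxel (TSDF value, probability) from the current hidden
    layer state H_t (N, D), then keep only voxels whose probability of
    lying within the preset value range passes the threshold.
    """
    hidden = np.maximum(H_t @ W1 + b1, 0.0)        # ReLU MLP layer
    out = hidden @ W2 + b2                         # (N, 2)
    tsdf = np.tanh(out[:, 0])                      # first reconstruction value in (-1, 1)
    prob = 1.0 / (1.0 + np.exp(-out[:, 1]))        # probability value
    keep = prob > prob_threshold                   # preset condition
    return np.nonzero(keep)[0], tsdf[keep]

# Example with random placeholder weights: 6 voxels, 32-d hidden state.
H = np.random.randn(6, 32)
W1 = np.random.randn(32, 64); b1 = np.zeros(64)
W2 = np.random.randn(64, 2);  b2 = np.zeros(2)
kept_idx, kept_tsdf = predict_and_sparsify(H, W1, b1, W2, b2)
```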
  • In another implementation scenario, after the current hidden layer state is obtained, the state values corresponding to the corresponding voxels in the first historical hidden layer state can be updated based on the state values in the current hidden layer state, so as to obtain an updated first historical hidden layer state for use in the next reconstruction. For example, the state values of the voxels of the first space in the first historical hidden layer state H_{t-1}^g can be directly replaced with the state values of the corresponding voxels in the current hidden layer state H_t^l.
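  • Using the same illustrative dictionary storage as the earlier extraction sketch, the direct-replacement update is simply the inverse write:

```python
def write_back_local_state(global_state, first_space_voxels, H_t):
    """Replace, in the global (first historical) hidden state, the state
    values of the voxels of the first space with the corresponding rows of
    the current hidden layer state H_t."""
    for voxel, state in zip(first_space_voxels, H_t):
        global_state[tuple(voxel)] = state
```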
  • FIG. 9 is a schematic process diagram of another embodiment of the three-dimensional reconstruction method according to the embodiment of the present application. Different from the 3D reconstruction process shown in FIG. 3, as described in the embodiments of the present disclosure, the 3D reconstruction process shown in FIG. 9 introduces the first historical hidden layer state (i.e., the global hidden state in FIG. 9) obtained in previous reconstructions.
  • That is, in the three-dimensional reconstruction process described in the foregoing disclosed embodiments, each prediction on the first feature map F_t^i corresponding to the current resolution based on a prediction network such as an MLP may include the following steps: obtaining the first historical hidden layer state corresponding to the current resolution obtained in previous reconstructions; extracting, from the first historical hidden layer state corresponding to the current resolution, the state values corresponding to the voxels of the first space as the second historical hidden layer state; and executing, based on a fusion network such as a GRU: updating the state values in the second historical hidden layer state based on the first feature map F_t^i corresponding to the current resolution to obtain the current hidden layer state corresponding to the current resolution.
  • On this basis, the current hidden layer state corresponding to the current resolution is then predicted based on the prediction network such as the MLP, and the first reconstruction result corresponding to the current resolution is obtained. The embodiments of the present disclosure only describe the differences from the foregoing disclosed embodiments; for other processes, reference may be made to the relevant descriptions in the foregoing disclosed embodiments, which will not be repeated here.
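  • Putting the pieces together, the coarse-to-fine flow of FIG. 9 can be outlined as follows. This is a structural sketch only, reusing the illustrative extract_local_state and write_back_local_state helpers above and folding the geometric-information extraction into the fusion callable for brevity; the operations supplied via ops (per-level feature fusion, nearest-neighbour upsampling of the previous level's result, and channel-wise concatenation) stand for the steps described in the foregoing embodiments and are assumed rather than defined here.

```python
def reconstruct_fragment(key_images, levels, global_states, networks, ops):
    """One reconstruction (one time step t) over all resolutions, coarse to fine.

    levels: per-resolution dicts carrying the first-space voxel coordinates.
    global_states: per-resolution first historical hidden layer states.
    networks: per-resolution dicts with 'fusion' (GRU) and 'mlp' heads.
    ops: caller-supplied 'features', 'upsample' and 'concat' operations.
    """
    prev_result = None
    for i, level in enumerate(levels):                       # low -> high resolution
        feat = ops["features"](key_images, level)            # first feature map F_t^i
        if prev_result is not None:
            # Fuse the upsampled previous-level result with F_t^i.
            feat = ops["concat"](feat, ops["upsample"](prev_result, level))
        H_prev = extract_local_state(global_states[i], level["voxels"],
                                     state_dim=feat.shape[-1])
        H_t = networks[i]["fusion"](feat, H_prev)            # GRU update, eqs. (2)-(5)
        write_back_local_state(global_states[i], level["voxels"], H_t)
        prev_result = networks[i]["mlp"](H_t)                # level's first reconstruction result
    return prev_result                                       # highest-resolution result
```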
  • In this way, the first reconstruction result is obtained by using a three-dimensional reconstruction model: the first historical hidden layer state obtained by the fusion network of the three-dimensional reconstruction model in previous reconstructions is acquired, where the first historical hidden layer state includes the state values corresponding to the voxels in the second space, and the second space is the total space enclosing the viewing cones of the previously reconstructed second key images; on this basis, the state values corresponding to the voxels of the first space are extracted from the first historical hidden layer state as the second historical hidden layer state, so as to execute, based on the fusion network: updating the state values in the second historical hidden layer state based on the first feature map to obtain the current hidden layer state; and the three-dimensional reconstruction model is then used to predict the current hidden layer state to obtain the first reconstruction result. Therefore, each reconstruction can refer to the first historical hidden layer state obtained in previous reconstructions, which can help to improve the consistency between this reconstruction and previous reconstructions, thereby helping to reduce the probability of layering or dispersion between the current reconstruction result and previous reconstruction results, and further improving the smoothness of the three-dimensional reconstruction result.
  • In some disclosed embodiments, the three-dimensional reconstruction result in any of the above three-dimensional reconstruction method embodiments may be obtained by a three-dimensional reconstruction model, and the three-dimensional reconstruction model may be trained in advance as follows.
  • Several groups of sample images captured of a sample target may be collected in advance, where each group of sample images includes at least two frames of sample key images, and the viewing cones of the at least two frames of sample key images included in each group of sample images are enclosed by a first sample space.
  • the first sample space includes several voxels, and reference may be made to the relevant descriptions in the foregoing disclosed embodiments, and details are not described herein again.
  • Different from the foregoing disclosed embodiments, each group of sample images is marked with the first actual reconstruction value of each voxel in the first sample space and the actual probability value of the first actual reconstruction value being within a preset value range, where the first actual reconstruction value is used to represent the distance between the voxel and the associated object surface in the sample target.
  • the first actual reconstruction value may be represented by TSDF, and the associated object surface may refer to the relevant description in the aforementioned disclosed embodiments, which will not be repeated here.
  • In a case where the first actual reconstruction value is within the preset value range, the actual probability value corresponding to the first actual reconstruction value may be marked as 1; and in a case where the first actual reconstruction value is not within the preset value range, the actual probability value corresponding to the first actual reconstruction value may be marked as 0.
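  • This labelling rule is mechanical and can be sketched directly; scaling the ground-truth signed distances by a truncation distance before clipping is an assumption of the sketch (the embodiment only specifies a TSDF representation), and the signed distances to the associated surface are taken as given.

```python
import numpy as np

def make_voxel_labels(signed_dist, trunc=1.0, lo=-1.0, hi=1.0):
    """Build per-voxel training labels from ground-truth signed distances
    between voxel centers and the associated (nearest) object surface.

    The first actual reconstruction value is the truncated signed distance
    scaled into [-1, 1]; the actual probability value is 1 where that value
    lies within the preset value range and 0 elsewhere.
    """
    tsdf = np.clip(np.asarray(signed_dist, dtype=np.float64) / trunc, lo, hi)
    prob = ((tsdf > lo) & (tsdf < hi)).astype(np.float64)
    return tsdf, prob
```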
  • On this basis, the at least two frames of sample key images included in a group of sample images may be input into the feature extraction network (e.g., FPN) of the three-dimensional reconstruction model to obtain a first sample feature map of the first sample space, where the first sample feature map includes the first sample feature information of the voxels in the first sample space, so that the first sample feature map can be input into the prediction network of the three-dimensional reconstruction model to obtain a first sample reconstruction result, where the first sample reconstruction result includes the first sample reconstruction value of each voxel in the first sample space and the sample probability value of the first sample reconstruction value being within the preset value range. The network parameters of the three-dimensional reconstruction model can then be adjusted based on the difference between the first sample reconstruction value and the first actual reconstruction value of each voxel in the first sample space, and the difference between the sample probability value and the actual probability value of each voxel in the first sample space.
  • In an implementation scenario, the first loss value between the sample probability value and the actual probability value may be calculated based on the binary cross-entropy (BCE) loss function, and the second loss value between the first sample reconstruction value and the first actual reconstruction value may be calculated based on the L1 loss function, so that the network parameters of the three-dimensional reconstruction model can be adjusted based on the first loss value and the second loss value.
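  • The two training losses can be sketched as below; the equal default weighting of the two terms is an assumption of the sketch, not something fixed by the embodiment.

```python
import numpy as np

def bce_loss(pred_prob, actual_prob, eps=1e-7):
    """First loss value: binary cross-entropy between sample probability
    values and actual probability values."""
    p = np.clip(pred_prob, eps, 1.0 - eps)
    return -np.mean(actual_prob * np.log(p) + (1 - actual_prob) * np.log(1 - p))

def l1_loss(pred_tsdf, actual_tsdf):
    """Second loss value: L1 distance between first sample reconstruction
    values and first actual reconstruction values."""
    return np.mean(np.abs(pred_tsdf - actual_tsdf))

def total_loss(pred_prob, actual_prob, pred_tsdf, actual_tsdf,
               w_bce=1.0, w_l1=1.0):
    """Combined objective used to adjust the network parameters."""
    return (w_bce * bce_loss(pred_prob, actual_prob)
            + w_l1 * l1_loss(pred_tsdf, actual_tsdf))
```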
  • In another implementation scenario, similarly to the foregoing disclosed embodiments, in the process of predicting the first sample reconstruction result, the first sample historical hidden layer state obtained by the fusion network of the three-dimensional reconstruction model in previous reconstructions may be acquired, where the first sample historical hidden layer state includes the sample state values corresponding to the voxels in the second sample space, and the second sample space is the total space enclosing the viewing cones of the several groups of previously reconstructed sample images. On this basis, the sample state values corresponding to the voxels of the first sample space are extracted from the first sample historical hidden layer state as the second sample historical hidden layer state, so that the following can be executed based on the fusion network: updating the sample state values in the second sample historical hidden layer state based on the first sample feature map to obtain the current sample hidden layer state; the current sample hidden layer state can then be predicted based on the prediction network to obtain the first sample reconstruction result.
  • FIG. 10 is a schematic frame diagram of an embodiment of a three-dimensional reconstruction apparatus 100 according to an embodiment of the present application.
  • The three-dimensional reconstruction apparatus 100 includes a key image acquisition module 101, a first space determination module 102, a first feature acquisition module 103, a reconstruction result acquisition module 104 and a reconstruction result update module 105; the key image acquisition module 101 is configured to acquire at least two frames of first key images used for this reconstruction.
  • The first space determination module 102 is configured to determine the first space enclosing the viewing cones of the at least two frames of first key images, where the first key images are obtained by photographing the target to be reconstructed. The first feature acquisition module 103 is configured to obtain a first feature map of the first space based on image information in the at least two frames of first key images, where the first feature map includes the first feature information of voxels in the first space. The reconstruction result acquisition module 104 is configured to obtain the first reconstruction result of this reconstruction based on the first feature map. The reconstruction result update module 105 is configured to update the second reconstruction result obtained by previous reconstructions based on the first reconstruction result of this reconstruction.
  • In some disclosed embodiments, the three-dimensional reconstruction apparatus 100 further includes a second feature acquisition module configured to perform feature extraction on each frame of the first key image respectively to obtain a second feature map of the first key image, and the first feature acquisition module 103 is configured to obtain the first feature map of the first space based on the second feature information corresponding to each voxel of the first space in the second feature maps.
  • In some disclosed embodiments, the first feature acquisition module 103 includes a feature information extraction sub-module configured to extract the second feature information corresponding to the voxel from the second feature map of each frame of the first key image respectively; a feature information fusion sub-module configured to fuse the second feature information of the voxel corresponding to each of the at least two frames of the first key image to obtain the first feature information of the voxel; and a first feature acquisition sub-module configured to obtain the first feature map of the first space based on the first feature information of each voxel in the first space.
  • the feature information fusion sub-module is configured to use the average value of the second feature information of the voxel corresponding to the first key image of each frame as the first feature information of the voxel.
  • In some disclosed embodiments, the first feature acquisition module 103 further includes a feature information setting sub-module configured to, in a case where the second feature information corresponding to the voxel is not extracted from the second feature map of any frame of the first key image, use the preset feature information as the first feature information of the voxel.
  • In some disclosed embodiments, the second feature map of each frame of the first key image includes a preset number of second feature maps corresponding to different resolutions; the first space includes a preset number of first spaces corresponding to different resolutions, and the higher the resolution, the smaller the size of the voxels in the first space; the first feature map includes a preset number of first feature maps corresponding to different resolutions, and each first feature map is obtained based on the second feature information of the second feature maps of the same resolution.
  • In some disclosed embodiments, the reconstruction result acquisition module 104 includes a resolution selection sub-module configured to select one resolution in turn as the current resolution in order of resolution from low to high; a feature map update sub-module configured to upsample the first reconstruction result corresponding to the previously selected resolution, and fuse the upsampled first reconstruction result with the first feature map corresponding to the current resolution to obtain a fused feature map corresponding to the current resolution; a reconstruction result acquisition sub-module configured to obtain the first reconstruction result corresponding to the current resolution based on the fused feature map; a loop execution sub-module configured to, in a case where the current resolution is not the highest resolution, re-execute, in combination with the foregoing resolution selection sub-module, feature map update sub-module and reconstruction result acquisition sub-module, the step of selecting one resolution in turn as the current resolution in order of resolution from low to high and the subsequent steps; and a first result determination sub-module configured to, in a case where the current resolution is the highest resolution, take the first reconstruction result corresponding to the current resolution as the final first reconstruction result of this reconstruction.
  • In some disclosed embodiments, the reconstruction result acquisition module 104 includes a result prediction sub-module configured to perform prediction based on the first feature map to obtain the first reconstruction value of each voxel in the first space and the probability value of the first reconstruction value being within a preset value range, where the first reconstruction value is used to represent the distance between the voxel and the associated object surface in the target to be reconstructed; a voxel selection sub-module configured to select voxels in the first space whose probability values satisfy a preset condition; and a second result determination sub-module configured to obtain the first reconstruction result of this reconstruction based on the first reconstruction values of the selected voxels.
  • In some disclosed embodiments, the first reconstruction result includes the first reconstruction values of voxels in the first space, the second reconstruction result includes the second reconstruction values of voxels in the second space, and the second space is the total space enclosing the viewing cones of the previously reconstructed second key images; the first reconstruction value and the second reconstruction value are used to represent the distance between the voxel and the associated object surface in the target to be reconstructed; and the reconstruction result update module 105 is configured to update the second reconstruction values of the corresponding voxels in the second space based on the first reconstruction values of the voxels in the first space.
  • the associated object surface is the object surface that is closest to the voxel in the target to be reconstructed.
  • In some disclosed embodiments, the first reconstruction result is obtained by using a three-dimensional reconstruction model, and the reconstruction result acquisition module 104 includes a hidden layer state acquisition sub-module configured to acquire the first historical hidden layer state obtained by the fusion network of the three-dimensional reconstruction model in previous reconstructions, where the first historical hidden layer state includes the state values corresponding to the voxels in the second space, and the second space is the total space enclosing the viewing cones of the previously reconstructed second key images; a hidden layer state extraction sub-module configured to extract, from the first historical hidden layer state, the state values corresponding to the voxels in the first space as the second historical hidden layer state; a hidden layer state update sub-module configured to execute, based on the fusion network: updating the state values in the second historical hidden layer state based on the first feature map to obtain the current hidden layer state; and a reconstruction result prediction sub-module configured to predict the current hidden layer state by using the three-dimensional reconstruction model to obtain the first reconstruction result.
  • In some disclosed embodiments, in a case where this reconstruction is the first reconstruction, the state values in the first historical hidden layer state are preset state values.
  • the fusion network includes: a gated recurrent unit.
  • the three-dimensional reconstruction model further includes a prediction network, and the reconstruction result prediction sub-module is configured to predict the current hidden layer state based on the prediction network to obtain the first reconstruction result.
  • In some disclosed embodiments, the reconstruction result acquisition module 104 includes a geometric feature extraction sub-module configured to perform geometric information extraction on the first feature map to obtain a geometric feature map, where the geometric feature map includes the geometric information of voxels; and the hidden layer state update sub-module is configured to update the state values in the second historical hidden layer state based on the geometric feature map to obtain the current hidden layer state.
  • the reconstruction result obtaining module 104 further includes a historical state update sub-module, configured to update the state value corresponding to the corresponding voxel in the first historical hidden layer state based on the state value in the current hidden layer state.
  • In some disclosed embodiments, the at least two frames of first key images are acquired in the process of photographing the target to be reconstructed; the first key images correspond to camera pose parameters, and the camera pose parameters include a translation distance and a rotation angle.
  • the first key image satisfies at least one of the following: a difference in translation distance between adjacent first key images is greater than a preset distance threshold, and a difference in rotation angle between adjacent first key images is greater than a preset angle threshold.
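  • This selection criterion lends itself to a short sketch. Poses are taken here as (translation vector, rotation matrix) pairs; implementing the "difference in rotation angle" as the relative rotation angle between the two poses, and the specific threshold values, are illustrative assumptions.

```python
import numpy as np

def is_new_key_image(pose, last_key_pose, t_max=0.1, R_max_deg=15.0):
    """Select a frame as a new first key image if its translation differs
    from the last selected key image by more than t_max, or its rotation
    differs by more than R_max_deg (either criterion suffices)."""
    t, R = pose
    t_key, R_key = last_key_pose
    trans_diff = np.linalg.norm(np.asarray(t) - np.asarray(t_key))
    # Relative rotation angle from the trace of R_rel (angle-axis magnitude).
    R_rel = np.asarray(R) @ np.asarray(R_key).T
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_diff_deg = np.degrees(np.arccos(cos_angle))
    return trans_diff > t_max or rot_diff_deg > R_max_deg
```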
  • FIG. 11 is a schematic frame diagram of an embodiment of an electronic device 110 according to an embodiment of the present application.
  • the electronic device 110 includes a memory 111 and a processor 112 coupled to each other, and the processor 112 is configured to execute program instructions stored in the memory 111 to implement the steps of any of the foregoing three-dimensional reconstruction method embodiments.
  • the electronic device 110 may include, but is not limited to, a microcomputer and a server.
  • the electronic device 110 may also include mobile devices such as mobile phones, notebook computers, and tablet computers, which are not limited herein.
  • the processor 112 is configured to control itself and the memory 111 to implement the steps of any of the above three-dimensional reconstruction method embodiments.
  • the processor 112 may also be referred to as a CPU (Central Processing Unit).
  • the processor 112 may be an integrated circuit chip with signal processing capability.
  • the processor 112 may also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • In addition, the processor 112 may be implemented jointly by integrated circuit chips.
  • the above solution can improve the real-time performance of the 3D reconstruction process and the smoothness of the 3D reconstruction result.
  • FIG. 12 is a schematic diagram of a framework of an embodiment of a computer-readable storage medium 120 according to an embodiment of the present application.
  • the computer-readable storage medium 120 stores program instructions 121 that can be executed by the processor, and the program instructions 121 are configured to implement the steps of any of the foregoing three-dimensional reconstruction method embodiments.
  • the above solution improves the real-time performance of the 3D reconstruction process and the smoothness of the 3D reconstruction result.
  • The functions or modules included in the apparatus provided by the embodiments of the present disclosure may be configured to execute the methods described in the above method embodiments, and for their implementation, reference may be made to the descriptions of the above method embodiments, which will not be repeated here for brevity.
  • the disclosed method and apparatus may be implemented in other manners.
  • The device implementations described above are only illustrative. For example, the division of modules or units is only a logical function division, and there may be other division manners in actual implementation; for example, units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed over network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this implementation manner.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • The integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods in the embodiments of the present application.
  • The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.
  • Embodiments of the present disclosure disclose a three-dimensional reconstruction method, apparatus, device, and storage medium, wherein the three-dimensional reconstruction method includes: acquiring at least two frames of first key images used for this reconstruction, and determining a first space enclosing the viewing cones of the at least two frames of first key images, wherein the first key images are obtained by photographing a target to be reconstructed; obtaining a first feature map of the first space based on image information in the at least two frames of first key images, wherein the first feature map includes first feature information of voxels in the first space; obtaining a first reconstruction result of this reconstruction based on the first feature map; and updating, based on the first reconstruction result of this reconstruction, a second reconstruction result obtained by previous reconstructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • Multimedia (AREA)
  • Image Generation (AREA)
  • Image Analysis (AREA)

Abstract

A three-dimensional reconstruction method, apparatus, device, and storage medium, wherein the three-dimensional reconstruction method includes: acquiring several frames of first key images used for this reconstruction, and determining a first space enclosing the viewing cones of the several frames of first key images (S11), where the first key images are obtained by photographing a target to be reconstructed; obtaining a first feature map of the first space based on image information in the several frames of first key images (S12), where the first feature map includes first feature information of voxels in the first space; obtaining a first reconstruction result of this reconstruction based on the first feature map (S13); and updating, based on the first reconstruction result of this reconstruction, a second reconstruction result obtained by previous reconstructions (S14). The above method can improve the real-time performance of the three-dimensional reconstruction process and the smoothness of the three-dimensional reconstruction result.

Description

一种三维重建方法、装置、设备及存储介质
相关申请的交叉引用
本专利申请要求2021年01月15提交的中国专利申请号为202110057035.9,申请人为浙江商汤科技开发有限公司,申请名称为“三维重建方法及相关装置、设备”的优先权,该申请的全文以引用的方式并入本申请中。
技术领域
本申请涉及计算机视觉技术领域,尤其涉及一种三维重建方法、装置、设备及存储介质。
背景技术
随着电子信息技术的发展,通过手机、平板电脑等集成有摄像头的电子设备对真实场景中的物体进行三维重建,在诸多应用场景中均得到了广泛应用。例如,可以应用于AR(Augmented Reality,增强现实)等下游应用之中,而为了增强AR效果和物理场景之间的沉浸感,三维重建结果需要尽可能地平滑,且三维重建过程需要尽可能地实时。有鉴于此,如何提高三维重建过程的实时性以及三维重建结果的平滑度成为极具研究价值的课题。
发明内容
本申请实施例提供一种三维重建方法、装置、设备及存储介质。
本申请实施例提供了一种三维重建方法,包括:获取用于本次重建的至少两帧第一关键图像,并确定包围至少两帧第一关键图像的视锥的第一空间;其中,第一关键图像是对待重建目标拍摄得到的;基于至少两帧第一关键图像中的图像信息,得到第一空间的第一特征图,其中,第一特征图包括第一空间中体素的第一特征信息;基于第一特征图,得到本次重建的第一重建结果;基于本次重建的第一重建结果,对之前重建得到的第二重建结果进行更新。
因此,通过获取用于本次重建的至少两帧第一关键图像,并确定包围至少两帧第一关键图像的视锥的第一空间,且第一关键图像是对待重建目标拍摄得到的,在此基础上基于至少两帧第一关键图像中的图像信息,得到第一空间的第一特征图,且第一特征图包括第一空间中体素的第一特征信息,从而基于第一特征图,得到本次重建的第一重建结果,进而基于本次重建的第一重建结果,对之前重建得到的第二重建结果进行更新,故此每次重建过程中,均能够对包围至少两帧第一关键图像的视锥的第一空间整体进行三维重建,从而不仅能够大大降低计算负荷,还能够降低重建结果出现分层或分散的概率,进而能够提高三维重建过程的实时性以及三维重建结果的平滑度。
其中,在获取用于本次重建的至少两帧第一关键图像之后,方法还包括:分别对每帧第一关键图像进行特征提取,得到第一关键图像的第二特征图;基于至少两帧第一关键图像中的图像信息,得到第一空间的第一特征图,包括:基于第一空间的各体素在第二特征图中对应的第二特征信息,得到第一空间的第一特征图。
因此,通过分别对每帧第一关键图像进行特征提取,得到第一关键图像的第二特征图,从而基于第一空间的各体素在第二特征图中对应的第二特征信息,得到第一空间的第一特征图,故能够融合各帧第一关键图像的第二特征图,得到第一空间的第一特征图,有利于提高第一特征图的准确性,进而能够有利于提高三维重建的准确性。
其中,基于第一空间的各体素在第二特征图中对应的第二特征信息,得到第一空间的第一特征图,包括:分别从每帧第一关键图像的第二特征图中,提取体素对应的第二特征 信息;将体素分别对应至少两帧第一关键图像的第二特征信息进行融合,得到体素的第一特征信息;基于第一空间的各体素的第一特征信息,得到第一空间的第一特征图。
因此,通过分别从每帧第一关键图像的第二特征图中,提取体素对应的第二特征信息,并将体素分别对应至少两帧第一关键图像的第二特征信息进行融合,得到体素的第一特征信息,从而基于第一空间的各体素的第一特征信息,得到第一空间的第一特征图,故此对于第一空间中每一体素而言,均融合有对应每帧第一关键图像的第二特征信息,能够有利于进一步提高第一空间的第一特征图的精准性。
其中,将体素分别对应至少两帧第一关键图像的第二特征信息进行融合,得到体素的第一特征信息,包括以下至少之一:将体素对应每帧第一关键图像的第二特征信息的平均值,作为体素的第一特征信息;在分别从每帧第一关键图像的第二特征图中,提取体素对应的第二特征信息之后,方法还包括:在每帧第一关键图像的第二特征图中均未提取得到体素对应的第二特征信息的情况下,将预设特征信息作为体素的第一特征信息。
因此,通过将体素对应每帧第一关键图像的第二特征信息的平均值,作为体素的第一特征信息,能够降低获取第一特征信息的复杂度,从而能够有利于提高三维重建的速度,进而能够有利于进一步提高三维重建过程的实时性;而在每帧第一关键图像的第二特征图中均未提取得到体素对应的第二特征信息的情况下,将预设特征信息作为体素的第一特征信息,能够有利于进一步降低获取第一特征信息的复杂度。
其中,每帧第一关键图像的第二特征图均包括对应不同分辨率的预设数量张第二特征图;第一空间包括对应不同分辨率的预设数量个第一空间,分辨率越高,第一空间中体素的尺寸越小;第一特征图包括对应不同分辨率的预设数量张第一特征图,每张第一特征图是基于相同分辨率的第二特征图的第二特征信息得到。
因此,通过将每帧第一关键图像的第二特征图设置为均包括对应不同分辨率的预设数量张第二特征图,且第一空间包括对应不同分辨率的预设数量个第一空间,分辨率越高,第一空间中体素的尺寸越小,此外将第一特征图设置为包括对应不同分辨率的预设数量张第一特征图,每张第一特征图是基于相同分辨率的第二特征图的第二特征信息得到,故能够有利于通过不同分辨率的预设数量张第二特征图来进行三维重建,从而能够有利于进一步提高三维重建的精细度。
其中,基于第一特征图,得到本次重建的第一重建结果,包括:按照分辨率由低到高的顺序,依次选择一种分辨率作为当前分辨率;将上一次选择的分辨率对应的第一重建结果进行上采样,并将上采样后的第一重建结果与当前分辨率对应的第一特征图进行融合,得到与当前分辨率对应的融合特征图;基于融合特征图,得到与当前分辨率对应的第一重建结果;在当前分辨率并非最高分辨率的情况下,重新执行按照分辨率由低到高的顺序,依次选择一种分辨率作为当前分辨率的步骤以及后续步骤;在当前分辨率为最高分辨率的情况下,将与当前分辨率对应的第一重建结果作为本次重建最终的第一重建结果。
因此,通过按照分辨率由低到高的顺序,依次选择一种分辨率作为当前分辨率,并将上一次选择的分辨率对应的第一重建结果进行上采样,将上采样后的第一重建结果与当前分辨率对应的第一特征图进行融合,得到与当前分辨率对应的融合特征图,在此基础上基于融合特征图,得到与当前分辨率对应的第一重建结果,从而在当前分辨率并非最高分辨率的情况下,重新执行按照分辨率由低到高的顺序,依次选择一种分辨率作为当前分辨率的步骤以及后续步骤,或者在当前分辨率为最高分辨率的情况下,将与当前分辨率对应的第一重建结果作为本次重建最终的第一重建结果,故此能够由基于“低分辨率”的第一特征图至基于“高分辨率”的第一特征图逐渐进行三维重建,从而能够有利于实现“由粗到细”的三维重建,进而能够有利于进一步提高三维重建的精细度。
其中,基于第一特征图,得到本次重建的第一重建结果,包括:基于第一特征图进行预测,得到第一空间中各体素的第一重建值和第一重建值在预设数值范围内的概率值;其 中,第一重建值用于表示体素与待重建目标中的关联物体表面之间的距离;选择第一空间中概率值满足预设条件的体素;基于选择的体素的第一重建值,得到本次重建的第一重建结果。
因此,通过基于第一特征图进行预测,得到第一空间中各体素的第一重建值和第一重建值在预设数值范围内的概率值,且第一重建值用于表示体素与待重建目标中的关联物体表面之间的距离,并选择第一空间中概率值满足预设条件的体素,从而基于选择的体素的第一重建值,得到本次重建的第一重建结果,故能够滤除概率值不满足预设条件的体素对于三维重建的干扰,能够有利于进一步提高三维重建的准确性。
其中,第一重建结果包括第一空间中体素的第一重建值,第二重建结果包括第二空间中体素的第二重建值,第二空间是包围之前重建的第二关键图像的视锥的总空间,第一重建值和第二重建值用于表示体素与待重建目标中的关联物体表面之间的距离;基于本次重建的第一重建结果,对之前重建得到的第二重建结果进行更新,包括:基于第一空间中体素的第一重建值,更新第二空间中对应体素的第二重建值。
因此,通过将第一重建结果设置为包括第一空间中体素的第一重建值,第二重建结果设置为包括第二空间中体素的第二重建值,且第二空间是包围之前重建的第二关键图像的视锥的总空间,第一重建值和第二重建值用于表示体素与待重建目标中的关联物体表面之间的距离,在此基础上基于第一空间中体素的第一重建值,更新第二空间中对应体素的第二重建值,以对之前重建得到的第二重建结果进行更新,能够有利于在三维重建过程中基于本次重建过程中第一空间中体素的第一重建值,更新之前重建得到的第二重建结果,进而能够有利于在重建过程中不断完善第二重建结果,提高三维重建的准确性。
其中,关联物体表面为待重建目标中与体素距离最近的物体表面。
因此,通过将关联物体表面设置为待重建目标中与体素距离最近的物体表面,能够有利于进一步提高三维重建的准确性。
其中,第一重建结果是采用三维重建模型得到的;基于第一特征图,得到本次重建的第一重建结果,包括:获取三维重建模型的融合网络在之前重建所得到的第一历史隐层状态;其中,第一历史隐层状态包括第二空间中体素对应的状态值,第二空间是包围之前重建的第二关键图像的视锥的总空间;从第一历史隐层状态中,提取第一空间的体素对应的状态值,以作为第二历史隐层状态;基于融合网络执行:基于第一特征图对第二历史隐层状态中的状态值进行更新,得到本次隐层状态;采用三维重建模型对本次隐层状态进行预测,得到第一重建结果。
因此,通过将第一重建结果设置为是采用三维重建模型得到的,并获取三维重建模型的融合网络在之前重建所得到的第一历史隐层状态,且第一历史隐层状态包括第二空间中体素对应的状态值,第二空间是包围之前重建的第二关键图像的视锥的总空间,在此基础上从第一历史隐层状态中,提取第一空间的体素对应的状态值,以作为第二历史隐层状态,从而基于融合网络执行:基于第一特征图对第二历史隐层状态中的状态值进行更新,得到本次隐层状态,进而采用三维重建模型对本次隐层状态进行预测,得到第一重建结果,故此每次重建过程中均能参考之前重建所得到的第一历史隐层状态,能够有利于提高本次重建与之前重建的一致性,从而能够有利于降低本次重建结果与之前重建结果之间发生分层或分散的概率,进而能够有利于进一步提高三维重建结果的平滑度。
其中,在本次重建为首次重建的情况下,第一历史隐层状态中的状态值为预设状态值。
因此,在本次重建为首次重建的情况下,通过将第一历史隐层状态中的状态值设置为预设状态值,能够有利于提高三维重建的鲁棒性。
其中,融合网络包括:门控循环单元;三维重建模型还包括预测网络,采用三维重建模型对本次隐层状态进行预测,得到第一重建结果,包括:基于预测网络对本次隐层状态进行预测,得到第一重建结果。
因此,通过将融合网络设置为包括门控循环单元,能够有利于通过门控循环单元引入选择性注意机制,从而能够有利于在三维重建过程中选择性地参考之前重建所得到的第一历史隐层状态,进而能够有利于提高三维重建的准确性;而通过将三维重建模型设置为包括预测网络,从而基于预测网络对本次隐层状态进行预测,得到第一重建结果,能够有利于提高三维重建的效率。
其中,在基于第一特征图对第二历史隐层状态中的状态值进行更新,得到本次隐层状态之前,方法还包括:对第一特征图进行几何信息提取,得到几何特征图;其中,几何特征图包括体素的几何信息;基于第一特征图对第二历史隐层状态中的状态值进行更新,得到本次隐层状态,包括:基于几何特征图对第二历史隐层状态中的状态值进行更新,得到本次隐层状态。
因此,通过对第一特征图进行几何信息提取,得到几何特征图,且几何特征图包括体素的几何信息,在此基础上基于几何特征图对第二历史隐层状态中的状态值进行更新,得到本次隐层状态,能够在提取得到的体素的几何信息的基础上对本次重建的第一空间的第二历史隐层状态进行更新,有利于提高三维重建的准确性。
其中,在基于第一特征图对第二历史隐层状态中的状态值进行更新,得到本次隐层状态之后,方法还包括:基于本次隐层状态中的状态值,更新第一历史隐层状态中相应体素对应的状态值。
因此,通过基于本次隐层状态中的状态值,更新本次重建的第一空间的第二历史隐层状态中相应体素对应的状态值,故能够在更新得到本次隐层状态之后,进一步更新第二空间的第一历史隐层状态,有利于在本次重建的基础上进一步提高第二空间的第一历史隐层状态的准确性,从而能够有利于提高三维重建的准确性。
其中,在拍摄待重建目标过程中,获取至少两帧第一关键图像;第一关键图像对应有相机位姿参数,相机位姿参数包括平移距离和旋转角度,第一关键图像满足以下至少之一:相邻第一关键图像之间平移距离的差异大于预设距离阈值,相邻第一关键图像之间旋转角度的差异大于预设角度阈值。
因此,通过将至少两帧第一关键图像设置为是在拍摄待重建目标过程中获取的,能够实现一边拍摄一边进行三维重建;而第一关键图像对应有相机位姿参数,相机位姿参数包括平移距离和旋转角度,第一关键图像设置为满足以下至少之一:相邻第一关键图像之间平移距离的差异大于预设距离阈值,相邻第一关键图像之间旋转角度的差异大于预设角度阈值,能够有利于在每次重建过程中参考尽可能少的关键图像的基础上,尽可能地扩大第一空间的视觉范围,从而能够有利于提高三维重建的效率。
本申请实施例提供了一种三维重建装置,包括:关键图像获取模块、第一空间确定模块、第一特征获取模块、重建结果获取模块和重建结果更新模块,关键图像获取模块配置为获取用于本次重建的至少两帧第一关键图像;第一空间确定模块配置为确定包围至少两帧第一关键图像的视锥的第一空间;其中,第一关键图像是对待重建目标拍摄得到的;第一特征获取模块配置为基于至少两帧第一关键图像中的图像信息,得到第一空间的第一特征图,其中,第一特征图包括第一空间中体素的第一特征信息;重建结果获取模块配置为基于第一特征图,得到本次重建的第一重建结果;重建结果更新模块配置为基于本次重建的第一重建结果,对之前重建得到的第二重建结果进行更新。
本申请实施例提供了一种电子设备,包括相互耦接的存储器和处理器,处理器配置为执行存储器中存储的程序指令,以实现上述三维重建方法。
本申请实施例提供了一种计算机可读存储介质,其上存储有程序指令,程序指令被处理器执行时实现上述三维重建方法。
上述方案,通过获取配置为本次重建的至少两帧第一关键图像,并确定包围至少两帧第一关键图像的视锥的第一空间,且第一关键图像是对待重建目标拍摄得到的,在此基础 上基于至少两帧第一关键图像中的图像信息,得到第一空间的第一特征图,且第一特征图包括第一空间中体素的第一特征信息,从而基于第一特征图,得到本次重建的第一重建结果,进而基于本次重建的第一重建结果,对之前重建得到的第二重建结果进行更新,故此每次重建过程中,均能够对包围至少两帧第一关键图像的视锥的第一空间整体进行三维重建,从而不仅能够大大降低计算负荷,还能够降低重建结果出现分层或分散的概率,进而能够提高三维重建过程的实时性以及三维重建结果的平滑度。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,此处的附图被并入说明书中并构成本说明书中的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。应当理解,以下附图仅示出了本公开的某些实施例,因此不应被看作是对范围的限定,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他相关的附图。
图1A是本申请实施例一种三维重建方法一实施例的流程示意图;
图1B示出了本申请实施例一种三维重建方法的一种系统架构示意图;
图2是第一空间一实施例的示意图;
图3是本申请实施例一种三维重建方法一实施例的过程示意图;
图4是本申请实施例一种三维重建方法与其他三维重建方法的效果示意图;
图5是图1A中步骤S12一实施例的流程示意图;
图6是获取第一特征图一实施例的状态示意图;
图7是图1A中步骤S13一实施例的流程示意图;
图8是获取本次隐层状态一实施例的状态示意图;
图9是本申请实施例一种三维重建方法另一实施例的过程示意图;
图10是本申请实施例一种三维重建装置一实施例的框架示意图;
图11是本申请实施例电子设备一实施例的框架示意图;
图12是本申请实施例计算机可读存储介质一实施例的框架示意图。
具体实施方式
下面结合说明书附图,对本申请实施例实施例的方案进行详细说明。
以下描述中,为了说明而不是为了限定,提出了诸如特定系统结构、接口、技术之类的细节,以便透彻理解本申请实施例。
本文中术语“系统”和“网络”在本文中常被可互换使用。本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。此外,本文中的“多”表示两个或者多于两个。
请参阅图1A,图1A是本申请实施例三维重建方法一实施例的流程示意图。可以包括如下步骤:
步骤S11:获取用于本次重建的至少两帧第一关键图像,并确定包围至少两帧第一关键图像的视锥的第一空间。
本公开实施例中,第一关键图像是对待重建目标拍摄得到的。待重建目标可以根据实际应用情况进行设置。例如,在需要对某一物体进行三维重建的情况下,待重建目标可以为物体,例如,待重建目标可以包括但不限于:桌子、椅子、沙发等等,在此不做限定;或者,在需要对某一场景进行三维重建的情况下,待重建目标可以为场景,需要说明的是,场景中可以包含若干物体,以待重建目标是客厅为例,客厅内可以包括但不限于如下物体:桌子、椅子、沙发等,以待重建目标是建筑为例,建筑可以包括但不限于如下物体:楼梯、走廊、大门等,其他情况可以以此类推,在此不再一一举例。
在一个实施场景中,为了提高三维重建的实时性,第一关键图像可以是在拍摄待重建目标过程中获取的。可以一边拍摄待重建目标,一边获取用于本次重建的至少两帧第一关键图像,以实现对三维重建过程进行增量处理。
在一个实施场景中,第一关键图像可以对应有相机位姿参数,相机位姿参数例如可以包括平移距离和旋转角度,在此基础上,第一关键图像满足以下至少之一:相邻第一关键图像之间的平移距离的差异大于预设距离阈值,相邻第一关键图像之间旋转角度的差异大于预设角度阈值。上述方式,能够有利于在每次重建过程中参考尽可能少的关键图像的基础上,尽可能地扩大第一空间的视觉范围,从而能够有利于提高三维重建的效率。
在一个实施场景中,相机位姿参数可以基于诸如SLAM(Simultaneous Localization And Mapping,即时定位与地图构建)等方式获取,在此不做限定。SLAM通常包括如下几个部分,特征提取,数据关联,状态估计,状态更新以及特征更新等,细节在此不再赘述。
在另一个实施场景中,为了便于描述,对待重建目标拍摄得到的图像序列可以记为{I t},图像序列所对应的相机位姿参数可以记为{ξ t},对于相机位姿参数ξ t而言,可以包括平移距离t和旋转角度R。为了在保持多视角重建过程中提供足够的视觉范围,在上述图像序列中所挑选的第一关键图像在三维空间中彼此之间须既不太靠近又不太远离,故在图像序列中某一帧图像的平移距离t与最新挑选的第一关键图像的平移距离t之间的差异大于预设距离阈值t max,且该帧图像的旋转角度R与上述最新挑选的第一关键图像的旋转角度R之间的差异大于预设角度阈值R max的情况下,可以将该帧图像挑选为新的第一关键图像。上述方式,能够在每次重建过程中最大可能地基于较少的第一关键图像,同时最大可能地扩大第一空间的视觉范围。
在又一个实施场景中,为了合理控制每次三维重建的计算负荷,每次三维重建所获取的至少两帧第一关键图像的图像数量可以小于预设数量阈值,预设数量阈值可以根据实际应用情况进行设置,例如,在执行三维重建的电子设备具有较为富余的计算资源的情况下,预设数量阈值可以设置地稍大一些,如可以设置为5、10、15等等;或者,在执行三维重建的电子设备具有相对贫乏的计算资源的情况下,预设数量阈值也可以设置地稍小一些,如可以设置为2、3、4等等,在此不做限定。
此外,需要说明的是,视锥可以理解为一个形状为四棱锥的实体形状,该实体形状就是相机渲染时能够看到区域的形状。可以理解,摄像头所拍摄到的图像中任何一点最终对应于现实世界中的一条线,并且只会现实这条线上的一个点,这条线上所有在这个显示的点后面的物体都会被遮挡,而图像的外边界由四个顶点对应的发散线定义,且这四条线最终相较于摄像头所在位置。
图1B可以应用本申请实施例一种三维重建方法的一种系统架构示意图;如图1B所示,该系统架构中包括:图像采集设备2001、网络2002和图像获取终端2003。为实现支撑一个示例性应用,图像采集设备2001和图像获取终端2003可以通过网络2002建立通信连接,图像采集设备2001通过网络2002向图像获取终端2003传输采集的图像,图像获取终端2003接收图像,并对图像进行处理,进而得到本次重建结果。
作为示例,当前场景图像采集设备2001可以包括摄像头等具有图像采集功能的设备。图像获取终端2003可以包括具有一定计算能力和图像处理能力的计算机设备,该计算机设备例如包括:终端设备或服务器或其它处理设备。网络2002可以采用有线连接或无线连接方式。其中,上图像获取终端2003为服务器时,图像采集设备可以通过有线连接的方式与图像获取终端通信连接,例如通过总线进行数据通信;当图像获取终端2003为终端设备时,图像采集设备可以通过无线连接的方式与图像获取终端通信连接,进而进行数据通信。
或者,在一些场景中,图像获取终端2003可以是带有视频采集模组的视觉处理设备, 可以是带有摄像头的主机。这时,本申请实施例的信息处理方法可以由图像获取终端2003执行,上述系统架构可以不包含网络2002和图像采集设备2001。
在一个实施场景中,请结合参阅图2,图2是第一空间一实施例的示意图。如图2所示,第一关键图像分别由黑点表示的相机1、相机2和相机3拍摄得到,在实际应用过程中,为了降低相对相机过远的图像信息对后续三维重建可能产生的干扰,在确定第一空间时,可以预先定义上述视锥的最大深度为D max,即四棱锥的高度为上述最大深度D max。请继续结合参阅图2,为了便于描述,图2以等腰三角形所示的视锥为俯视第一空间的情况下视锥的示意图,即图2所示的第一空间为二维视角下的示意图,其中等腰三角形中的虚线即表示上述最大深度D max,在此情况下,可以定义将相机1、相机2和相机3拍摄到的第一关键图像的视锥包围起来的空间即为第一空间。为了便于三维重建,本公开实施例以及下述公开实施例中,如无特别说明,第一空间例如可以包括长方体、正方体等相邻表面相互垂直的六面体。此外,在第一关键图像的视锥为其他情况下,或者第一关键图像为其他数量的情况下,第一空间可以参照上述描述以此类推,在此不再一一举例。
此外,本公开实施例以及下述公开实施例中,第一空间可以包括若干体素(voxel)。以第一空间为长方体或正方体为例,体素也可以为长方体或正方体,若干体素堆叠形成第一空间。此外,体素的尺寸可以根据实际应用情况进行设置。例如,在对三维重建的精度要求较高的情况下,体素的尺寸可以设置地稍小一些,或者,在对三维重建的精度要求相对宽松的情况下,体素的尺寸可以设置地稍大一些,在此不做限定。
步骤S12:基于至少两帧第一关键图像中的图像信息,得到第一空间的第一特征图。
本公开实施例中,第一特征图包括第一空间中体素的第一特征信息。
在一个实施场景中,可以分别对每帧第一关键图像进行特征提取,得到第一关键图像的第二特征图,在此基础上可以基于第一空间的各个体素在第二特征图中对应的第二特征信息,得到第一空间的第一特征图。上述方式,能够融合各帧第一关键图像的第二特征图,得到第一空间的第一特征图,有利于提高第一特征图的准确性,进而能够有利于提高三维重建的准确性。
在一个实施场景中,为了提高特征提取的效率,可以预先训练一个三维重建模型,且该三维重建模型包括特征提取网络,从而可以基于特征提取网络分别对每帧第一关键图像进行特征提取,得到第一关键图像的第二特征图。特征提取网络可以包括但不限于卷积神经网络(Convolutional Neural Networks,CNN)等等,在此不做限定。三维重建模型的训练过程可以参阅下述相关公开实施例,在此暂不赘述。
在另一个实施场景中,第一关键图像的第二特征图可以为预设分辨率的特征图,预设分辨率可以根据实际应用情况进行设置,例如,在对三维重建的精度要求较高的情况下,预设分辨率可以设置地稍大一些,而在对三维重建的精度要求相对宽松的情况下,预设分辨率可以设置地稍小一些,在此不做限定。
在又一个实施场景中,对于第一空间的每一体素而言,可以融合该体素在第二特征图中对应的第二特征信息,从而得到该体素的第一特征信息,最终在得到第一空间所有体素的第一特征信息的基础上,可以得到第一空间的第一特征图。
在又一个实施场景中,在每帧第一关键图像的第二特征图中均未提取得到体素对应的第二特征信息的情况下,可以将预设特征信息作为该体素的第一特征信息。预设特征信息可以根据实际应用情况进行设置,例如,为了进一步降低三维重建的计算复杂度,预设特征信息可以设置为0,在此不做限定。
在另一个实施场景中,每帧第一关键图像的第二特征图可以包括对应不同分辨率的预设数量张第二特征图,且第一空间包括对应不同分辨率的预设数量个第一空间,分辨率越高,第一空间中体素的尺寸越小,第一特征图也可以包括对应不同分辨率的预设数量张第一特征图,每张第一特征图是基于相同分辨率的第二特征图的第二特征信息得到的。上述 方式,能够有利于通过不同分辨率的预设数量张第二特征图来进行三维重建,从而能够有利于进一步提高三维重建的精细度。
在一个实施场景中,预设数量可以根据实际应用情况进行设置,例如,可以设置两种不同分辨率、三种不同分辨率、四种不同分辨率等等,在此不做限定。此外,不同分辨率也可以根据实际应用情况进行设置,例如,可以设置640*480和480*360两种分辨率,也可以设置1280*960和640*480两种分辨率;或者,可以设置640*480、480*360和360*240三种分辨率,也可以设置1280*960、640*480和480*360三种分辨率,在此不做限定。
在另一个实施场景中,如前所述,为了提高三维重建的效率,可以预先训练一个三维重建模型,且该三维重建模型可以包括特征提取网络,进而可以基于该特征提取网络分别对若干第一关键图像进行特征提取,得到不同分辨率的第二特征图。该特征提取网络可以包括但不限于FPN(Feature Pyramid Networks,特征金字塔网络)等,在此不做限定。
在另一个实施场景中,在第一关键图像的第二特征图包括对应N种不同分辨率的N张第二特征图的情况下,第一空间也包括分别与N种不同分辨率对应的N个第一空间,且分辨率越高,第一空间中体素的尺寸越小。例如,在第一关键图像的第二特征图包括1280*960和640*480两种分辨的第二特征图的情况下,第一空间也包括与分辨率1280*960对应的第一空间和与分辨率640*480对应的第一空间,且与分辨率1280*960对应的第一空间中体素的尺寸小于与分辨率640*480对应的第一空间中体素的尺寸。其他情况可以以此类推,在此不再一一举例。在一些实施例中,对于第i种分辨率对应的第一空间中体素的第一特征信息,可以基于至少两帧第一关键图像中第i种分辨率的第二特征图中对应的第二特征信息得到,详细过程可以参阅下述公开实施例,在此暂不赘述。
在又一个实施场景中,第i种分辨率对应的第一空间中体素的宽度可以采用下式计算得到:
Figure PCTCN2021102117-appb-000001
上述公式(1)中,w i表示第i种分辨率对应的第一空间中体素的宽度,s表示预先设置的基准体素宽度,可以根据实际应用情况进行调整。此外,需要说明的是,i是将不同分辨率按照由低到高的顺序排序之后的第i种分辨率。仍以上述1280*960、640*480和480*360三种分辨率为例,由低到高排序之后,分别为480*360、640*480、1280*960,即在计算分辨率480*360对应的第一空间的体素的宽度时,i为1,在计算分辨率640*480对应的第一空间的体素的宽度时,i为2,在计算分辨率1280*960对应的第一空间的体素的宽度时,i为3,其他情况可以以此类推,在此不再一一举例。
步骤S13:基于第一特征图,得到本次重建的第一重建结果。
在一个实施场景中,可以基于第一特征图进行预测,得到第一空间中各体素的第一重建值和第一重建值在预设数值范围内的概率值,且第一重建值用于表示体素与待重建目标中关联物体表面之间的距离,在此基础上,可以对上述预测结果进行稀疏化(sparsify)处理,可以选择第一空间中概率满足预设条件的体素,并基于选择的体素的第一重建值,得到本次充电的第一重建结果。上述方式,能够滤除概率值不满足预设条件的体素对于三维重建的干扰,能够有利于进一步提高三维重建的准确性。
在一个实施场景中,为了提高三维重建的效率,可以预先训练一个三维重建模型,且该三维重建模型可以包括预测网络,从而可以将第一特征图输入预测网络,得到第一空间中各个体素的第一重建值和第一重建值在预设数值范围内的概率值。预测网络可以包括但不限于MLP(Multi-Layer Perceptron,多层感知机)等等,在此不做限定。
在另一个实施场景中,第一重建值可以采用TSDF(Truncated Signed Distance Function,截断有符号距离函数)进行表示,在此情况下,预设数值范围可以为-1至1之间。为了便于描述,可以将第j个体素的第一重建值表示为
Figure PCTCN2021102117-appb-000002
需要说明的是,在
Figure PCTCN2021102117-appb-000003
大于0 且小于1的情况下,表示第j个体素位于关联物体表面之前的截断距离λ之内,而在
Figure PCTCN2021102117-appb-000004
小于0且大于-1的情况下,表示第j个体素位于关联物体表面之后的截断距离λ之内。
在又一个实施场景中,第一重建值在预设数值范围内的概率值可以视为第一重建值在预设数值范围内的可能性,且概率值越高,第一重建值在预设数值范围内的可能性越高,反之,概率值越低,第一重建值在预设数值范围内的可能性越低。
在又一个实施场景中,预设条件可以设置为包括概率值大于预设概率阈值。预设概率阈值可以根据实际应用情况进行设置。例如,在对三维重建的准确性要求较高的情况下,预设概率阈值可以设置地稍大一些,如可以设置为0.9、0.95等,或者,在对三维重建的准确性要求相对宽松的情况下,预设概率阈值可以设置地稍小一些,如可以设置为0.8、0.85等,在此不做限定。
在又一个实施场景中,在选择得到第一空间中概率值满足预设条件的体素之后,可以将选择的体素及其第一重建值整体作为本次重建的第一重建结果。
在又一个实施场景中,为了便于后续基于重建值重建出待重建目标的表面,关联物体表面可以为待重建目标中与体素距离最近的物体表面。以待重建目标是客厅为例,对于最靠近客厅中地板的体素而言,关联物体表面可以为地板,而对于最靠近客厅中沙发的体素而言,关联物体表面可以为沙发,其他情况可以以此类推,在此不再一一举例。上述方式,能够有利于进一步提高三维重建的准确性。
在另一个实施场景中,如前所述,每帧第一关键图像的第二特征图均可以包括对应不同分辨率的预设数量张第二特征图,在此情况下,可以按照分辨率由低到高的顺序,依次选择一种分辨率作为当前分辨率,在此基础上将上一次选择的分辨率对应的第一重建结果进行上采样(Upsample),并将上采样后的第一重建结果与当前分辨率对应的第一特征图进行融合,得到与当前分辨率对应的融合特征图,从而基于融合特征图,得到与当前分辨率对应的第一重建结果,进而在当前分辨率并非最高分辨率的情况下,重新执行按照分辨率由低到高的顺序,依次选择一种分辨率作为当前分辨率的步骤以及后续步骤,或者在当前分辨率为最高分辨率的情况下,将与当前分辨率对应的第一重建结果作为本次重建最终的第一重建结果。上述方式,能够由基于“低分辨率”的第一特征图至基于“高分辨率”的第一特征图逐渐进行三维重建,从而能够有利于实现“由粗到细”的三维重建,进而能够有利于进一步提高三维重建的精细度。
在一个实施场景中,可以采用最近邻插值等上采样方式对第一重建结果进行上采样。需要说明的是,为了便于后续将上采样后的第一重建结果与当前分辨率对应的第一特征图进行融合,在体素宽度由诸如上述公式(1)计算得到的情况下,即在第i种分辨率对应的第一空间中体素的宽度两倍于第i+1种分辨率对应的第一空间中体素的宽度的情况下,上采样之后体素的宽度为原宽度的一半,从而可以使得上采样后的第一重建结果中体素的宽度与当前分辨率对应的第一空间中体素的宽度相同。
在另一个实施场景中,对于每一体素而言,可以将上采样后的第一重建结果中第j个体素的第一重建值与当前分辨率对应的第一空间中第j个体素的第一特征信息进行拼接(Concatenate),从而实现将上采样后的第一重建结果与当前分辨率对应的第一特征图的融合。例如,当前分辨率对应的第一空间中每一体素的第一特征信息可以表示为维度d的矩阵,而上采样后的第一重建结果中每一体素的第一重建值可以视为维度1的矩阵,故将两者拼接之后所得到的融合特征图可以视为维度d+1的矩阵,进而融合特征图中每一体素可以表示为d+1维度的矩阵。
在又一个实施场景中,基于融合特征图,得到与当前分辨率对应的第一重建结果的详细过程,可以参阅前述基于第一特征图得到本次重建的第一重建结果的相关描述,在此不再赘述。
在又一个实施场景中,请结合参阅图3,图3是本申请实施例三维重建方法一实施例 的过程示意图。如图3所示,在对待重建目标拍摄的图像序列中挑选得到若干第一关键图像,经特征提取网络(如前述FPN)进行特征提取之后,对于每帧第一关键图像而言,提取得到3种不同分辨率的第二特征图,这3种不同分辨率按照由低到高排序之后,可以分别记为分辨率1、分辨率2和分辨率3,分辨率1对应的第一空间可以记为第一空间1,分辨率2对应的第一空间可以记为第一空间2、分辨率3对应的第一空间可以记为第一空间3,对于每种分辨率,可以基于与该种分辨率对应的第一空间的各体素在该种分辨率的第二特征图中对应的第二特征信息,得到该种分辨率对应的第一空间的第一特征图。为了便于描述可以将本次重建(即第t时间步)第一空间1的第一特征图记为F t 1,第一空间2的第一特征图记为F t 2,第一空间3的第一特征图记为F t 3。按照分辨率由低到高的顺序,先选择分辨率1作为当前分辨率,并将上一次选择的分辨率对应的第一重建结果进行上采样,由于分辨率1位首次选择的分辨率,故不存在上一次选择的分辨率对应的第一重建结果,从而可以直接基于诸如MLP等预测网络对当前分辨率对应的第一特征图F t 1进行预测,得到第一空间1中各体素的第一重建值和第一重建值在预设数值范围内的概率值,为了便于描述可以记为
Figure PCTCN2021102117-appb-000005
再对
Figure PCTCN2021102117-appb-000006
进行稀疏化(即图3中S)处理得到第一重建结果。由于当前分辨率并非最高分辨率,故可以接着将分辨率2作为当前分辨率,并将上一次选择的分辨率1对应的第一重建结果进行上采样(即图3中U),并基于上采样后的第一重建结果与当前分辨率对应的第一特征图F t 2进行拼接(即图3中C)处理,得到与分辨率2对应的融合特征图,从而基于诸如MLP等预测网络对融合特征图进行预测,得到第一空间2中各体素的第一重建值和第一重建值在预设数值范围内的概率值,为了便于描述可以记为
Figure PCTCN2021102117-appb-000007
再对
Figure PCTCN2021102117-appb-000008
进行稀疏化(即图3中S)处理得到第一重建结果。由于当前分辨率仍然并非最高分辨率,故可以接着将分辨率3作为当前分辨率,并将上一次选择的分辨率2对应的第一重建结果进行上采样(即图3中U),并基于上采样后的第一重建结果与当前分辨率对应的第一特征图F t 3进行拼接(即图3中C)处理,得到与分辨率3对应的融合特征图,从而基于诸如MLP等预测网络对融合特征图进行预测,得到第一空间3中各体素的第一重建值和第一重建值在预设数值范围内的概率值,为了便于描述可以记为
Figure PCTCN2021102117-appb-000009
再对
Figure PCTCN2021102117-appb-000010
进行稀疏化(即图3中S)处理得到第一重建结果。由于当前分辨率为最高分辨率,故可以将当前分辨率对应的第一重建结果作为本次重建最终的第一重建结果,为了便于描述可以将本次重建最终的第一重建结果记为
Figure PCTCN2021102117-appb-000011
其他情况可以以此类推,在此不再一一举例。
步骤S14:基于本次重建的第一重建结果,对之前重建得到的第二重建结果进行更新。
在一个实施场景中,如前所述,第一重建结果例如包括第一空间中体素的第一重建值,类似地,第二重建结果包括第二空间中体素的第二重建值,第二空间是包围之前重建的第二关键图像的视锥的总空间,且第一重建值和第二重建值用于表示体素与待重建目标中的关联物体表面之间的距离。例如可以参阅前述关于第一重建值的相关描述,在此不再赘述。在此基础上,可以基于第一空间中体素的第一重建值,更新第二空间中对应体素的第二重建值。上述方式,能够有利于在三维重建过程中基于本次重建过程中第一空间中体素的第一重建值,更新之前重建得到的第二重建结果,进而能够有利于在重建过程中不断完善第二重建结果,提高三维重建的准确性。
在一个实施场景中,在本次重建为对待重建目标的三维重建过程中首次重建的情况下,可以不执行基于本次重建的第一重建结果对之前重建得到的第二重建结果进行更新的步骤。
在另一个实施场景中,可以将第二空间中与第一空间对应部分的体素的第二重建值替换为本次重建第一空间中体素的第一重建值。请继续结合参阅图3,如前所述,为了便于描述本次重建最终的第一重建结果记为
Figure PCTCN2021102117-appb-000012
之前重建得到的第二重建结果可以记为
Figure PCTCN2021102117-appb-000013
通过基于第一空间中体素的第一重建值更新第二空间中对应体素的第二重建值,可以得到更新后的第二重建结果,为了便于描述可以记为
Figure PCTCN2021102117-appb-000014
在又一个实施场景中,在本次重建之后需要进一步重建的情况下,可以重新执行上述步骤S11以及后续步骤,以通过多次重建不断完善第二重建结果。此外,在本次重建之后无需进一步重建的情况下,可以将更新后的第二重建结果
Figure PCTCN2021102117-appb-000015
作为待重建目标的最终重建结果。
在另一个实施场景中,请结合参阅图4,图4是本申请实施例三维重建方法与其他三维重建方法的效果示意图。图4中41和42表示其他重建方法重建得到的重建结果,图4中43和44表示本申请实施例三维重建方法重建得到的重建结果。如图4中41和42所示,其他三维重建方法重建得到的重建结果在矩形框圈出的墙壁部分呈现出明显的分散和分层现象,而图4中43和44中,本申请实施例三维重建方法重建得到的重建二级果在矩形框圈出的墙壁部分未呈现出明显的分散或分层现象,且具有较优的平滑度。
上述方案,通过获取用于本次重建的至少两帧第一关键图像,并确定包围至少两帧第一关键图像的视锥的第一空间,且第一关键图像是对待重建目标拍摄得到的,在此基础上基于至少两帧第一关键图像中的图像信息,得到第一空间的第一特征图,且第一特征图包括第一空间中体素的第一特征信息,从而基于第一特征图,得到本次重建的第一重建结果,进而基于本次重建的第一重建结果,对之前重建得到的第二重建结果进行更新,故此每次重建过程中,均能够对包围至少两帧第一关键图像的视锥的第一空间整体进行三维重建,从而不仅能够大大降低计算负荷,还能够降低重建结果出现分层或分散的概率,进而能够提高三维重建过程的实时性以及三维重建结果的平滑度。
请参阅图5,图5是图1A中步骤S12一实施例的流程示意图。如前述公开实施例所述,可以分别对每帧第一关键图像进行特征提取,得到第一关键图像的第二特征图,从而可以基于第一空间的各体素在第二特征图中对应的第二特征信息,得到第一空间的第一特征图。本公开实施例是基于第一空间的各体素在第二特征图中对应的第二特征信息得到第一特征图的流程示意图。可以包括如下步骤:
步骤S51:分别从每帧第一关键图像的第二特征图中,提取体素对应的第二特征信息。
本公开实施例中,对于第一空间中每一体素,可以分别从每帧第一关键图像的第二特征图中,提取体素对应的第二特征信息。
在一个实施场景中,可以基于第一关键图像的相机位姿参数以及相机内部参数对第二特征图中各个像素点进行反投影,确定第一空间中与第二特征图中像素点对应的体素。在此基础上,对于第一空间中每一体素,可以从各帧第一关键图像的第二特征图中提取得到与该体素对应的像素点的第二特征信息。
在另一个实施场景中,请结合参阅图6,图6是获取第一特征图一实施例的状态示意图。如图6所示,为了便于描述,与图2类似地,图6也以“二维视角”描述获取第一特征图的详细过程。如图6所示,通过对第二特征图中像素点进行反投影,能够确定第一空间中与各个像素点对应的体素。需要说明的是,图6中不同颜色的方格表示对应于不同的第二特征信息。
步骤S52:将体素分别对应至少两帧第一关键图像的第二特征信息进行融合,得到体素的第一特征信息。
在一个实施场景中,请继续结合参阅图6,可以将体素分别对应至少两帧第一关键图像的第二特征信息的平均值,作为体素的第一特征信息。例如,第一空间中第k个体素,在第1个第一关键图像的第二特征图中对应于第i行第j列个像素点,而在第2个第一关键图像的第二特征图中对应于第m行第n列个像素点,在此基础上,可以将第1个第一关键图像的第二特征图中第i行第j列个像素点的第二特征信息和第2个第一关键图像的第二特征图中第m行第n列个像素点的第二特征信息的平均值,作为第一空间中第k个体素 的第一特征信息,其他情况可以以此类推,在此不再一一举例。
在另一个实施场景中,还可以将体素分别对应至少两帧第一关键图像的第二特征信息的加权结果,作为体素的第一特征信息。上述加权结果可以包括但不限于:加权求和、加权平均等,在此不做限定。
在又一个实施场景中,如前述公开实施例所述,在每帧第一关键图像的第二特征图中均未提取得到体素对应的第二特征信息的情况下,将预设特征信息作为体素的第一特征信息。可以参阅前述公开实施例中相关描述,在此不再赘述。
步骤S53:基于第一空间的各体素的第一特征信息,得到第一空间的第一特征图。
在求得第一空间中各个像素点的第一特征信息之后,即可将第一空间中各个体素的第一特征信息整体作为第一特征图。
区别于前述实施例,通过分别从每帧第一关键图像的第二特征图中,提取体素对应的第二特征信息,并将体素分别对应至少两帧第一关键图像的第二特征信息进行融合,得到体素的第一特征信息,从而基于第一空间的各体素的第一特征信息,得到第一空间的第一特征图,故此对于第一空间中每一体素而言,均融合有对应每帧第一关键图像的第二特征信息,能够有利于进一步提高第一空间的第一特征图的精准性。
请参阅图7,图7是图1A中步骤S13一实施例的流程示意图。本公开实施例中,第一重建结果是采用三维重建模型得到的。可以包括如下步骤:
步骤S71:获取三维重建模型的融合网络在之前重建所得到的第一历史隐层状态。
本公开实施例中,第一历史隐层状态包括第二空间中体素对应的状态值,第二空间是包围之前重建的第二关键图像的视锥的总空间。需要说明的是,在本次重建为首次重建的情况下,第二空间即为本次重建的第一空间,且在此情况下,可以将第一历史隐层状态所包含的第二空间中体素对应的状态值设置为预设状态值(如,将预设状态值设置为0)。
步骤S72:从第一历史隐层状态中,提取第一空间的体素对应的状态值,以作为第二历史隐层状态。
请结合参阅图8,图8是获取本次隐层状态一实施例的状态示意图。需要说明的是,为了便于描述,与前述图2和图6类似,图8是在“二维视角”描述的获取本次隐层状态的状态示意图。如图8所示,为了便于描述,可以将第一历史隐层状态记为
Figure PCTCN2021102117-appb-000016
第一历史隐层状态
Figure PCTCN2021102117-appb-000017
中不同灰度的方格表示体素的状态值,无颜色的方格表示对应体素无状态值,此外第一历史隐层状态
Figure PCTCN2021102117-appb-000018
中的矩形框表示第一空间,从第一历史隐层状态
Figure PCTCN2021102117-appb-000019
中提取第一空间的体素对应的状态值,可以得到第二历史隐层状态
Figure PCTCN2021102117-appb-000020
其他情况可以以此类推,在此不再一一举例。
步骤S73:基于融合网络执行:基于第一特征图对第二历史隐层状态中的状态值进行更新,得到本次隐层状态。
在一个实施场景中,可以将第一特征图、第二历史隐层状态输入融合网络,从而输出得到本次隐层状态。融合网络可以设置为包括但不限于GRU(Gated Recurrent Unit,门控循环单元),在此不做限定。
在另一个实施场景中,请继续结合参阅图8,在更新第二历史隐层状态
Figure PCTCN2021102117-appb-000021
之前,可以进一步对第一特征图F t l进行几何信息提取,得到几何特征图
Figure PCTCN2021102117-appb-000022
且几何特征图包括体素的几何信息,从而可以基于几何特征图对第二历史隐层状态中的状态值进行更新,得到本次隐层状态。上述方式,能够在提取得到的体素的几何信息的基础上对本次重建的第一空间的第二历史隐层状态进行更新,有利于提高三维重建的准确性。
在一个实施场景中,可以通过三维稀疏卷积、pointnet等网络对第一特征图F t l进行几何信息提取,得到几何特征图
Figure PCTCN2021102117-appb-000023
可以根据实际应用需要进行设置,在此不做限定。
在另一个实施场景中,以融合网络包括门控循环单元GRU为例,请结合参阅图8, GRU通过融合几何特征图
Figure PCTCN2021102117-appb-000024
和第二历史隐层状态
Figure PCTCN2021102117-appb-000025
最终可以得到本次隐层状态
Figure PCTCN2021102117-appb-000026
为了便于描述,可以记门控循环单元GRU的更新门控为z t,重置门控为r t,可以表示为:
Figure PCTCN2021102117-appb-000027
Figure PCTCN2021102117-appb-000028
上述公式(2)和公式(3)中,sparseconv表示稀疏卷积,W z,W r表示稀疏卷积的网络权重,σ表示激活函数(如,sigmoid)。
在此基础上,更新门控z t和重置门控r t可以决定了从几何特征图
Figure PCTCN2021102117-appb-000029
中引入多少信息进行融合,以及从第二历史隐层状态
Figure PCTCN2021102117-appb-000030
中引入多少信息进行融合。可以表示为:
Figure PCTCN2021102117-appb-000031
Figure PCTCN2021102117-appb-000032
上述公式(4)和公式(5)中,sparseconv表示稀疏卷积,W h表示稀疏卷积的网络权重,tanh表示激活函数。由此可见,作为一种数据驱动方式,GRU在三维重建过程中能够提供一种选择性的注意力机制。
步骤S74:采用三维重建模型对本次隐层状态进行预测,得到第一重建结果。
在一个实施场景中,如前述公开实施例所述,三维模型还可以进一步包括预测网络(如,MLP),在此基础上,可以基于预测网络对本次隐层状态
Figure PCTCN2021102117-appb-000033
进行预测,得到第一重建结果。
在一个实施场景中,基于预测网络对本次隐层状态
Figure PCTCN2021102117-appb-000034
进行预测可以得到第一空间中各体素的第一重建值和第一重建值在预设数值范围内的概率值,且第一重建值用于表示体素与待重建目标中的关联物体表面之间的距离,在此基础上,可以选择第一空间中概率值满足预设条件的体素,从而可以基于选择的体素的第一重建值,得到本次重建的第一重建结果。详细可以参阅前述公开实施例中相关描述,在此不再赘述。
在另一个实施场景中,请继续结合参阅图8,在得到本次隐层状态
Figure PCTCN2021102117-appb-000035
之后,可以基于本次隐层状态
Figure PCTCN2021102117-appb-000036
中的状态值,更新第一历史隐层状态
Figure PCTCN2021102117-appb-000037
中相应体素对应的状态值,得到更新后的第一历史隐层状态
Figure PCTCN2021102117-appb-000038
以供下次重建使用。上述方式,能够在更新得到本次隐层状态之后,进一步更新第二空间的第一历史隐层状态,有利于在本次重建的基础上进一步提高第二空间的第一历史隐层状态的准确性,从而能够有利于提高三维重建的准确性。
在一个实施场景中,可以将第一历史隐层状态
Figure PCTCN2021102117-appb-000039
中第一空间中体素的状态值直接替换为本次隐层状态
Figure PCTCN2021102117-appb-000040
中对应体素的状态值。
在又一个实施场景中,请结合参阅图9,图9是本申请实施例三维重建方法另一实施例的过程示意图。不同于图3所示的三维重建过程,如本公开实施例所述,图9所示的三维重建过程引入了之前重建得到的第一历史隐层状态(即图9中global hidden state),即在前述公开实施例所描述的三维重建过程中,每次基于诸如MLP等预测网络对当前分辨率对应的第一特征图F t i进行预测可以包括如下步骤:获取在之前重建所得到的与当前分辨率对应的第一历史隐层状态,并从当前分辨率对应的第一历史隐层状态中,提取第一空间的体素对应的状态值,以作为第二历史隐层状态,并基于诸如GRU的融合网络执行:基于与当前分辨率对应的第一特征图F t i对第二历史隐层状态中的状态值进行更新,得到与当前分辨率对应的本次隐层状态,在此基础上再基于诸如MLP等预测网络对当前分辨率对应的本次隐层状态进行预测,得到当前分辨率对应的第一重建结果。本公开实施例仅描述与前述公开实施例的不同之处,其他过程可以参阅前述公开实施例中相关描述,在此不再赘述。
区别于前述实施例,通过将第一重建结果设置为是采用三维重建模型得到的,并获取三维重建模型的融合网络在之前重建所得到的第一历史隐层状态,且第一历史隐层状态包括第二空间中体素对应的状态值,第二空间是包围之前重建的第二关键图像的视锥的总空间,在此基础上从第一历史隐层状态中,提取第一空间的体素对应的状态值,以作为第二历史隐层状态,从而基于融合网络执行:基于第一特征图对第二历史隐层状态中的状态值进行更新,得到本次隐层状态,进而采用三维重建模型对本次隐层状态进行预测,得到第一重建结果,故此每次重建过程中均能参考之前重建所得到的第一历史隐层状态,能够有利于提高本次重建与之前重建的一致性,从而能够有利于降低本次重建结果与之前重建结果之间发生分层或分散的概率,进而能够有利于进一步提高三维重建结果的平滑度。
在一些公开实施例中,上述任一三维重建方法实施例中的三维重建结果可以是由三维重建模型重建得到的。可以预先收集对样本目标拍摄的若干组样本图像,每组样本图像包括至少两帧样本关键图像,且每组样本图像所包含的至少两帧样本关键图像的视锥由第一样本空间包围,第一样本空间包括若干体素,可以参阅前述公开实施例中相关描述,在此不再赘述。与前述公开实施例不同的是,每组样本图像标注有第一样本空间中各个体素的第一实际重建值和第一实际重建值在预设数值范围内的实际概率值,且第一实际重建值用于表示体素与样本目标中关联物体表面之间的距离,第一实际重建值可以采用TSDF表示,关联物体表面可以参见前述公开实施例中的相关描述,在此不再赘述。此外,在第一实际重建值位于预设数值范围内的情况下,第一实际重建值对应的实际概率值可以标注为1,而在第一实际重建值不位于预设数值范围内的情况下,第一实际重建值对应的实际概率值可以标注为0。在此基础上,可以将一组样本图像所包含的至少两帧样本关键图像输入三维重建模型的特征提取网络(如,FPN),得到第一样本空间的第一样本特征图,且第一样本特征图包括第一样本空间中体素的第一样本特征信息,从而可以将第一样本特征图输入三维重建模型的预测网络,得到第一样本重建结果,且第一样本重建结果包括第一样本空间中各体素的第一样本重建值和第一样本重建值在预设数值范围内的样本概率值,进而可以基于第一样本空间中各体素的第一样本重建值和第一实际重建值之间的差异,以及第一样本空间中各体素的样本概率值和实际概率值之间的差异,调整三维重建模型的网络参数。
在一个实施场景中,可以基于二分类交叉熵损失(binary cross-entropy,BCE)函数计算样本概率值和实际概率值之间的第一损失值,并基于L1损失函数计算第一样本重建值和第一实际重建值之间的第二损失值,从而可以基于第一损失值和第二损失值,调整三维重建模型的网络参数。
在另一个实施场景中,与前述公开实施例类似地,在预测第一样本重建结果过程中,可以获取三维重建模型的融合网络在之前重建所得到的第一样本历史隐层状态,且第一样本历史隐层状态包括第二样本空间中体素对应的样本状态值,第二样本空间时包围之前重建的若干组样本图像的视锥的总空间,在此基础上,可以从第一样本历史隐层状态中,提取第一样本空间的体素对应的样本状态值,以作为第二样本历史隐层状态,从而可以基于融合网络执行:基于第一样本特征图对第二样本历史隐层状态中的样本状态值进行更新,得到本次样本隐层状态,进而可以基于预测网络对本次样本隐层状态进行预测,得到第一样本重建结果。可以参阅前述公开实施例中相关描述,在此不再赘述。
请参阅图10,图10是本申请实施例三维重建装置100一实施例的框架示意图。三维重建装置100包括关键图像获取模块101、第一空间确定模块102、第一特征获取模块103、重建结果获取模块104和重建结果更新模块105,关键图像获取模块101配置为获取用于本次重建的至少两帧第一关键图像;第一空间确定模块102配置为确定包围至少两帧第一关键图像的视锥的第一空间;其中,第一关键图像是对待重建目标拍摄得到的;第一特征获取模块103配置为基于至少两帧第一关键图像中的图像信息,得到第一空间的第一特征图,其中,第一特征图包括第一空间中体素的第一特征信息;重建结果获取模块104配置 为基于第一特征图,得到本次重建的第一重建结果;重建结果更新模块105配置为基于本次重建的第一重建结果,对之前重建得到的第二重建结果进行更新。
在一些公开实施例中,三维重建装置100还包括第二特征获取模块,配置为分别对每帧第一关键图像进行特征提取,得到第一关键图像的第二特征图,第一特征获取模块103配置为基于第一空间的各体素在第二特征图中对应的第二特征信息,得到第一空间的第一特征图。
在一些公开实施例中,第一特征获取模块103包括特征信息提取子模块,配置为分别从每帧第一关键图像的第二特征图中,提取体素对应的第二特征信息,第一特征获取模块103包括特征信息融合子模块,配置为将体素分别对应至少两帧第一关键图像的第二特征信息进行融合,得到体素的第一特征信息,第一特征获取模块103包括第一特征获取子模块,配置为基于第一空间的各体素的第一特征信息,得到第一空间的第一特征图。
在一些公开实施例中,特征信息融合子模块配置为将体素对应每帧第一关键图像的第二特征信息的平均值,作为体素的第一特征信息。
在一些公开实施例中,第一特征获取模块103还包括特征信息设置子模块,配置为在每帧第一关键图像的第二特征图中均未提取得到体素对应的第二特征信息的情况下,将预设特征信息作为体素的第一特征信息。
在一些公开实施例中,每帧第一关键图像的第二特征图均包括对应不同分辨率的预设数量张第二特征图;第一空间包括对应不同分辨率的预设数量个第一空间,分辨率越高,第一空间中体素的尺寸越小;第一特征图包括对应不同分辨率的预设数量张第一特征图,每张第一特征图是基于相同分辨率的第二特征图的第二特征信息得到。
在一些公开实施例中,重建结果获取模块104包括分辨率选择子模块,配置为按照分辨率由低到高的顺序,依次选择一种分辨率作为当前分辨率,重建结果获取模块104包括特征图更新子模块,配置为将上一次选择的分辨率对应的第一重建结果进行上采样,并将上采样后的第一重建结果与当前分辨率对应的第一特征图进行融合,得到与当前分辨率对应的融合特征图,重建结果获取模块104包括重建结果获取子模块,配置为基于融合特征图,得到与当前分辨率对应的第一重建结果,重建结果获取模块104包括循环执行子模块,配置为在当前分辨率并非最高分辨率的情况下,结合前述分辨率选择子模块、特征图更新子模块和重建结果获取子模块重新执行按照分辨率由低到高的顺序,依次选择一种分辨率作为当前分辨率的步骤以及后续步骤,重建结果获取模块104包括第一结果确定子模块,配置为在当前分辨率为最高分辨率的情况下,将与当前分辨率对应的第一重建结果作为本次重建最终的第一重建结果。
在一些公开实施例中,重建结果获取模块104包括结果预测子模块,配置为基于第一特征图进行预测,得到第一空间中各体素的第一重建值和第一重建值在预设数值范围内的概率值;其中,第一重建值配置为表示体素与待重建目标中的关联物体表面之间的距离,重建结果获取模块104包括体素选择子模块,配置为选择第一空间中概率值满足预设条件的体素,重建结果获取模块104包括第二结果确定子模块,配置为基于选择的体素的第一重建值,得到本次重建的第一重建结果。
在一些公开实施例中,第一重建结果包括第一空间中体素的第一重建值,第二重建结果包括第二空间中体素的第二重建值,第二空间是包围之前重建的第二关键图像的视锥的总空间,第一重建值和第二重建值配置为表示体素与待重建目标中的关联物体表面之间的距离,重建结果更新模块105配置为基于第一空间中体素的第一重建值,更新第二空间中对应体素的第二重建值。
In some disclosed embodiments, the associated object surface is the object surface in the target to be reconstructed that is closest to the voxel.
In some disclosed embodiments, the first reconstruction result is obtained by using a three-dimensional reconstruction model, and the reconstruction result acquisition module 104 includes a hidden state acquisition sub-module configured to acquire the first historical hidden state obtained by the fusion network of the three-dimensional reconstruction model in previous reconstructions, where the first historical hidden state includes state values corresponding to voxels in a second space, and the second space is the total space enclosing the view frustums of the previously reconstructed second key images; a hidden state extraction sub-module configured to extract, from the first historical hidden state, the state values corresponding to the voxels of the first space as the second historical hidden state; a hidden state update sub-module configured to cause the fusion network to perform: updating the state values in the second historical hidden state based on the first feature map to obtain the current hidden state; and a reconstruction result prediction sub-module configured to predict on the current hidden state by using the three-dimensional reconstruction model to obtain the first reconstruction result.
In some disclosed embodiments, in the case where the current reconstruction is the first reconstruction, the state values in the first historical hidden state are preset state values.
In some disclosed embodiments, the fusion network includes a gated recurrent unit (GRU).
In some disclosed embodiments, the three-dimensional reconstruction model further includes a prediction network, and the reconstruction result prediction sub-module is configured to predict on the current hidden state based on the prediction network to obtain the first reconstruction result.
In some disclosed embodiments, the reconstruction result acquisition module 104 includes a geometric feature extraction sub-module configured to perform geometric information extraction on the first feature map to obtain a geometric feature map, where the geometric feature map includes geometric information of the voxels; the hidden state update sub-module is configured to update the state values in the second historical hidden state based on the geometric feature map to obtain the current hidden state.
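As a sketch, the geometric information extraction could be a small 3D convolutional network over the first feature map; the dense convolutions below are a stand-in of our own, since the text does not fix the network's architecture here.

```python
import torch
import torch.nn as nn

# Stand-in geometry network: two 3D convolutions over a dense voxel grid.
geom_net = nn.Sequential(
    nn.Conv3d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(32, 32, kernel_size=3, padding=1),
)
first_feature_map = torch.randn(1, 32, 24, 24, 24)    # (B, C, D, H, W), toy sizes
geometric_feature_map = geom_net(first_feature_map)   # per-voxel geometric information
```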
In some disclosed embodiments, the reconstruction result acquisition module 104 further includes a historical state update sub-module configured to update, based on the state values in the current hidden state, the state values corresponding to the respective voxels in the first historical hidden state.
In some disclosed embodiments, the at least two frames of first key images are acquired during the capture of the target to be reconstructed; the first key images correspond to camera pose parameters, the camera pose parameters include a translation distance and a rotation angle, and the first key images satisfy at least one of the following: the difference in translation distance between adjacent first key images is greater than a preset distance threshold; the difference in rotation angle between adjacent first key images is greater than a preset angle threshold.
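A sketch of that keyframe test; representing the rotation by a single angle in degrees and the particular threshold values are assumptions.

```python
import numpy as np

def is_keyframe(t_prev, t_cur, angle_prev, angle_cur,
                dist_thresh=0.1, angle_thresh=15.0) -> bool:
    # t_*: (3,) camera translations; angle_*: rotation angles in degrees.
    moved = np.linalg.norm(np.asarray(t_cur) - np.asarray(t_prev)) > dist_thresh
    rotated = abs(angle_cur - angle_prev) > angle_thresh
    return moved or rotated   # either difference exceeding its threshold suffices
```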
Referring to FIG. 11, FIG. 11 is a schematic framework diagram of an embodiment of an electronic device 110 according to an embodiment of the present application. The electronic device 110 includes a memory 111 and a processor 112 coupled to each other; the processor 112 is configured to execute program instructions stored in the memory 111 to implement the steps of any of the above three-dimensional reconstruction method embodiments. In one implementation scenario, the electronic device 110 may include, but is not limited to, a microcomputer or a server; in addition, the electronic device 110 may also include a mobile device such as a mobile phone, a notebook computer, or a tablet computer, which is not limited here.
The processor 112 is configured to control itself and the memory 111 to implement the steps of any of the above three-dimensional reconstruction method embodiments. The processor 112 may also be referred to as a CPU (Central Processing Unit). The processor 112 may be an integrated circuit chip with signal processing capability. The processor 112 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 112 may be implemented jointly by a plurality of integrated circuit chips.
The above solution can improve the real-time performance of the three-dimensional reconstruction process and the smoothness of the three-dimensional reconstruction result.
Referring to FIG. 12, FIG. 12 is a schematic framework diagram of an embodiment of a computer-readable storage medium 120 according to an embodiment of the present application. The computer-readable storage medium 120 stores program instructions 121 executable by a processor, and the program instructions 121 are configured to implement the steps of any of the above three-dimensional reconstruction method embodiments.
The above solution likewise improves the real-time performance of the three-dimensional reconstruction process and the smoothness of the three-dimensional reconstruction result.
In some embodiments, the functions possessed by, or the modules included in, the apparatus provided by the embodiments of the present disclosure may be configured to execute the methods described in the method embodiments above; for their implementation, reference may be made to the descriptions of the method embodiments above, which, for brevity, are not repeated here.
The above descriptions of the various embodiments tend to emphasize the differences between them; for their identical or similar aspects, the embodiments may be referred to one another, which, for brevity, are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus implementations described above are merely illustrative; for instance, the division of modules or units is merely a division by logical function, and there may be other divisions in actual implementation; for instance, units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this implementation.
In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented either in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods of the various implementations of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Industrial Applicability
The embodiments of the present disclosure disclose a three-dimensional reconstruction method, apparatus, device, and storage medium, where the three-dimensional reconstruction method includes: acquiring at least two frames of first key images for the current reconstruction, and determining a first space enclosing the view frustums of the at least two frames of first key images, where the first key images are captured of a target to be reconstructed; obtaining a first feature map of the first space based on image information in the at least two frames of first key images, where the first feature map includes first feature information of voxels in the first space; obtaining a first reconstruction result of the current reconstruction based on the first feature map; and updating, based on the first reconstruction result of the current reconstruction, a second reconstruction result obtained by previous reconstructions.

Claims (19)

  1. A three-dimensional reconstruction method, the method being performed by an electronic device and comprising:
    acquiring at least two frames of first key images for a current reconstruction, and determining a first space enclosing view frustums of the at least two frames of first key images; wherein the first key images are captured of a target to be reconstructed;
    obtaining a first feature map of the first space based on image information in the at least two frames of first key images, wherein the first feature map comprises first feature information of voxels in the first space;
    determining a first reconstruction result of the current reconstruction based on the first feature map;
    updating, based on the first reconstruction result of the current reconstruction, a second reconstruction result obtained by previous reconstructions.
  2. The method according to claim 1, wherein after the acquiring at least two frames of first key images for the current reconstruction, the method further comprises:
    performing feature extraction on each frame of the first key images separately to obtain a second feature map of each frame of the first key images;
    the obtaining a first feature map of the first space based on image information in the at least two frames of first key images comprises:
    obtaining the first feature map of the first space based on second feature information corresponding to each voxel of the first space in the second feature maps.
  3. The method according to claim 2, wherein the obtaining the first feature map of the first space based on the second feature information corresponding to each voxel of the first space in the second feature maps comprises:
    extracting, from the second feature map of each frame of the first key images separately, the second feature information corresponding to the voxel;
    fusing the second feature information of the voxel corresponding respectively to the at least two frames of first key images, to obtain first feature information of the voxel;
    obtaining the first feature map of the first space based on the first feature information of each voxel of the first space.
  4. The method according to claim 3, wherein the fusing the second feature information of the voxel corresponding respectively to the at least two frames of first key images, to obtain the first feature information of the voxel, comprises at least one of the following:
    taking the average of the second feature information of the voxel corresponding respectively to the at least two frames of first key images as the first feature information of the voxel;
    in the case where no second feature information corresponding to the voxel is extracted from the second feature map of any frame of the first key images, taking preset feature information as the first feature information of the voxel.
  5. The method according to any one of claims 2 to 4, wherein the second feature maps of each frame of the first key images comprise a preset number of second feature maps corresponding to different resolutions; the first space comprises a preset number of first spaces corresponding to the different resolutions; the first feature map comprises a preset number of first feature maps corresponding to the different resolutions, each first feature map being obtained based on the second feature information of the second feature maps of the same resolution.
  6. The method according to claim 5, wherein the obtaining a first reconstruction result of the current reconstruction based on the first feature map comprises:
    selecting, in order from low to high resolution, one resolution at a time as a current resolution;
    upsampling the first reconstruction result corresponding to the previously selected resolution, and fusing the upsampled first reconstruction result with the first feature map corresponding to the current resolution, to obtain a fused feature map corresponding to the current resolution;
    obtaining the first reconstruction result corresponding to the current resolution based on the fused feature map;
    in the case where the current resolution is not the highest resolution, re-executing the step of selecting, in order from low to high resolution, one resolution at a time as the current resolution, and the subsequent steps;
    in the case where the current resolution is the highest resolution, taking the first reconstruction result corresponding to the current resolution as the first reconstruction result of the current reconstruction.
  7. The method according to any one of claims 1 to 6, wherein the obtaining a first reconstruction result of the current reconstruction based on the first feature map comprises:
    performing prediction based on the first feature map to obtain a first reconstruction value of each voxel in the first space and a probability value of the first reconstruction value being within a preset numeric range; wherein the first reconstruction value is used to represent the distance between the voxel and an associated object surface in the target to be reconstructed;
    selecting the voxels in the first space whose probability values satisfy a preset condition;
    obtaining the first reconstruction result of the current reconstruction based on the first reconstruction values of the selected voxels.
  8. The method according to any one of claims 1 to 7, wherein the first reconstruction result comprises first reconstruction values of the voxels in the first space, the second reconstruction result comprises second reconstruction values of voxels in a second space, the second space being the total space enclosing the view frustums of previously reconstructed second key images, and the first reconstruction values and the second reconstruction values are used to represent the distance between the voxel and an associated object surface in the target to be reconstructed;
    the updating, based on the first reconstruction result of the current reconstruction, a second reconstruction result obtained by previous reconstructions comprises:
    updating, based on the first reconstruction values of the voxels in the first space, the second reconstruction values of the corresponding voxels in the second space.
  9. The method according to claim 7 or 8, wherein the associated object surface is the object surface in the target to be reconstructed that is closest to the voxel.
  10. The method according to any one of claims 1 to 9, wherein the first reconstruction result is obtained by using a three-dimensional reconstruction model; the obtaining a first reconstruction result of the current reconstruction based on the first feature map comprises:
    acquiring a first historical hidden state obtained by a fusion network of the three-dimensional reconstruction model in previous reconstructions; wherein the first historical hidden state comprises state values corresponding to voxels in a second space, the second space being the total space enclosing the view frustums of previously reconstructed second key images;
    extracting, from the first historical hidden state, the state values corresponding to the voxels of the first space as a second historical hidden state;
    updating, in the fusion network, the state values in the second historical hidden state based on the first feature map, to obtain a current hidden state;
    predicting on the current hidden state by using the three-dimensional reconstruction model, to obtain the first reconstruction result.
  11. The method according to claim 10, wherein, in the case where the current reconstruction is the first reconstruction, the state values in the first historical hidden state are preset state values.
  12. The method according to claim 10, wherein the fusion network comprises a gated recurrent unit; the three-dimensional reconstruction model further comprises a prediction network, and the predicting on the current hidden state by using the three-dimensional reconstruction model, to obtain the first reconstruction result, comprises:
    predicting on the current hidden state based on the prediction network, to obtain the first reconstruction result.
  13. The method according to any one of claims 10 to 12, wherein, before the updating the state values in the second historical hidden state based on the first feature map, to obtain the current hidden state, the method further comprises:
    performing geometric information extraction on the first feature map to obtain a geometric feature map; wherein the geometric feature map comprises geometric information of the voxels;
    the updating the state values in the second historical hidden state based on the first feature map, to obtain the current hidden state, comprises:
    updating the state values in the second historical hidden state based on the geometric feature map, to obtain the current hidden state.
  14. The method according to any one of claims 10 to 13, wherein, after the updating the state values in the second historical hidden state based on the first feature map, to obtain the current hidden state, the method further comprises:
    updating, based on the state values in the current hidden state, the state values corresponding to the respective voxels in the first historical hidden state.
  15. The method according to any one of claims 1 to 14, wherein the acquiring at least two frames of first key images for the current reconstruction comprises:
    acquiring the at least two frames of first key images during the capture of the target to be reconstructed.
  16. The method according to any one of claims 1 to 15, wherein the first key images correspond to camera pose parameters, the camera pose parameters comprise a translation distance and a rotation angle, and the first key images satisfy at least one of the following: the difference in the translation distance between adjacent first key images is greater than a preset distance threshold; the difference in the rotation angle between adjacent first key images is greater than a preset angle threshold.
  17. A three-dimensional reconstruction apparatus, comprising:
    a key image acquisition module configured to acquire at least two frames of first key images for a current reconstruction;
    a first space determination module configured to determine a first space enclosing view frustums of the at least two frames of first key images; wherein the first key images are captured of a target to be reconstructed;
    a first feature acquisition module configured to obtain a first feature map of the first space based on image information in the at least two frames of first key images, wherein the first feature map comprises first feature information of voxels in the first space;
    a reconstruction result acquisition module configured to obtain a first reconstruction result of the current reconstruction based on the first feature map;
    a reconstruction result update module configured to update, based on the first reconstruction result of the current reconstruction, a second reconstruction result obtained by previous reconstructions.
  18. An electronic device, comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the three-dimensional reconstruction method according to any one of claims 1 to 16.
  19. A computer-readable storage medium, having program instructions stored thereon, wherein the program instructions, when executed by a processor, implement the three-dimensional reconstruction method according to any one of claims 1 to 16.
PCT/CN2021/102117 2021-01-15 2021-06-24 Three-dimensional reconstruction method, apparatus, device and storage medium WO2022151661A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022546566A JP7352748B2 (ja) 2021-01-15 2021-06-24 Three-dimensional reconstruction method, apparatus, device and storage medium
KR1020227026271A KR20220120674A (ko) 2021-01-15 2021-06-24 Three-dimensional reconstruction method, apparatus, device and storage medium
US18/318,724 US20230290099A1 (en) 2021-01-15 2023-05-17 Method and apparatus for reconstructing three-dimensional, device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110057035.9A CN112750201B (zh) 2021-01-15 2021-01-15 Three-dimensional reconstruction method and related apparatus and device
CN202110057035.9 2021-01-15

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/318,724 Continuation US20230290099A1 (en) 2021-01-15 2023-05-17 Method and apparatus for reconstructing three-dimensional, device and storage medium

Publications (1)

Publication Number Publication Date
WO2022151661A1 true WO2022151661A1 (zh) 2022-07-21

Family

ID=75652226

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102117 WO2022151661A1 (zh) 2021-01-15 2021-06-24 Three-dimensional reconstruction method, apparatus, device and storage medium

Country Status (5)

Country Link
US (1) US20230290099A1 (zh)
JP (1) JP7352748B2 (zh)
KR (1) KR20220120674A (zh)
CN (1) CN112750201B (zh)
WO (1) WO2022151661A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750201B (zh) * 2021-01-15 2024-03-29 浙江商汤科技开发有限公司 三维重建方法及相关装置、设备
CN113706373A (zh) * 2021-08-25 2021-11-26 深圳市慧鲤科技有限公司 模型重建方法及相关装置、电子设备和存储介质
CN114429495B (zh) * 2022-03-14 2022-08-30 荣耀终端有限公司 一种三维场景的重建方法和电子设备
CN116958455B (zh) * 2023-09-21 2023-12-26 北京飞渡科技股份有限公司 基于神经网络的屋顶重建方法、装置及电子设备

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09231370A (ja) * 1996-02-21 1997-09-05 Canon Inc 画像情報入力装置
JP2004013869A (ja) * 2002-06-12 2004-01-15 Nec Corp 3次元形状生成装置及びそれに用いる3次元形状生成方法並びにそのプログラム
JP2005250692A (ja) * 2004-03-02 2005-09-15 Softopia Japan Foundation 物体の同定方法、移動体同定方法、物体同定プログラム、移動体同定プログラム、物体同定プログラム記録媒体、移動体同定プログラム記録媒体
JP2009048305A (ja) * 2007-08-15 2009-03-05 Nara Institute Of Science & Technology 形状解析プログラム及び形状解析装置
JP2009074836A (ja) * 2007-09-19 2009-04-09 Advanced Telecommunication Research Institute International 画像処理装置、画像処理方法及び画像処理プログラム
US8885879B2 (en) * 2009-04-28 2014-11-11 Nec Corporation Object position estimation device, object position estimation method and program
JP6736422B2 (ja) * 2016-08-23 2020-08-05 キヤノン株式会社 画像処理装置、画像処理の方法およびプログラム
WO2020060196A1 (ko) * 2018-09-18 2020-03-26 서울대학교산학협력단 3차원 영상 재구성 장치 및 그 방법
CN111369681B (zh) * 2020-03-02 2022-04-15 腾讯科技(深圳)有限公司 三维模型的重构方法、装置、设备及存储介质
CN111652966B (zh) * 2020-05-11 2021-06-04 北京航空航天大学 一种基于无人机多视角的三维重建方法及装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537876A (zh) * 2018-03-05 2018-09-14 清华-伯克利深圳学院筹备办公室 基于深度相机的三维重建方法、装置、设备及存储介质
US20200074747A1 (en) * 2018-08-30 2020-03-05 Qualcomm Incorporated Systems and methods for reconstructing a moving three-dimensional object
CN112017228A (zh) * 2019-05-31 2020-12-01 华为技术有限公司 一种对物体三维重建的方法及相关设备
CN112750201A (zh) * 2021-01-15 2021-05-04 浙江商汤科技开发有限公司 三维重建方法及相关装置、设备

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359192A (zh) * 2022-10-14 2022-11-18 阿里巴巴(中国)有限公司 三维重建与商品信息处理方法、装置、设备及存储介质
WO2024077809A1 (zh) * 2022-10-14 2024-04-18 阿里巴巴(中国)有限公司 三维重建与商品信息处理方法、装置、设备及存储介质

Also Published As

Publication number Publication date
JP2023514107A (ja) 2023-04-05
US20230290099A1 (en) 2023-09-14
KR20220120674A (ko) 2022-08-30
JP7352748B2 (ja) 2023-09-28
CN112750201A (zh) 2021-05-04
CN112750201B (zh) 2024-03-29

Similar Documents

Publication Publication Date Title
WO2022151661A1 (zh) 一种三维重建方法、装置、设备及存储介质
TWI709107B (zh) 影像特徵提取方法及包含其顯著物體預測方法
CN111369681B (zh) 三维模型的重构方法、装置、设备及存储介质
CN110910486B (zh) 室内场景光照估计模型、方法、装置、存储介质以及渲染方法
WO2020001168A1 (zh) 三维重建方法、装置、设备和存储介质
CN112308200B (zh) 神经网络的搜索方法及装置
CN112132023A (zh) 基于多尺度上下文增强网络的人群计数方法
CN111340866B (zh) 深度图像生成方法、装置及存储介质
US11823322B2 (en) Utilizing voxel feature transformations for view synthesis
WO2020186385A1 (zh) 图像处理方法、电子设备及计算机可读存储介质
TWI643137B (zh) 物件辨識方法及物件辨識系統
WO2022052782A1 (zh) 图像的处理方法及相关设备
CN111652054A (zh) 关节点检测方法、姿态识别方法及装置
US11625813B2 (en) Automatically removing moving objects from video streams
US20210150679A1 (en) Using imager with on-purpose controlled distortion for inference or training of an artificial intelligence neural network
CN112991254A (zh) 视差估计系统、方法、电子设备及计算机可读存储介质
CN116051747A (zh) 一种基于缺失点云数据的房屋三维模型重建方法及设备、介质
CN111161138B (zh) 用于二维全景图像的目标检测方法、装置、设备、介质
CN111091117B (zh) 用于二维全景图像的目标检测方法、装置、设备、介质
CN113902802A (zh) 视觉定位方法及相关装置、电子设备和存储介质
CN117576292A (zh) 三维场景渲染方法及装置、电子设备、存储介质
CN114120233B (zh) 用于人群计数的轻量金字塔空洞卷积聚合网络的训练方法
Zhao et al. Stripe sensitive convolution for omnidirectional image dehazing
CN111178300B (zh) 目标检测方法、装置、设备、介质
CN114898120B (zh) 一种基于卷积神经网络的360度图像显著目标检测方法

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 20227026271

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2022546566

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21918857

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21918857

Country of ref document: EP

Kind code of ref document: A1
