WO2021097843A1 - Three-dimensional reconstruction method, device, system and storage medium - Google Patents

Three-dimensional reconstruction method, device, system and storage medium

Info

Publication number
WO2021097843A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
original
supplementary
image
view
Prior art date
Application number
PCT/CN2019/120394
Other languages
English (en)
French (fr)
Inventor
于立冬
Original Assignee
驭势科技(南京)有限公司
Priority date
Filing date
Publication date
Application filed by 驭势科技(南京)有限公司 filed Critical 驭势科技(南京)有限公司
Priority to CN201980002779.1A priority Critical patent/CN110998671B/zh
Priority to PCT/CN2019/120394 priority patent/WO2021097843A1/zh
Publication of WO2021097843A1 publication Critical patent/WO2021097843A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • the present invention relates to the field of computer vision technology, and more specifically to a three-dimensional reconstruction method, device, system and storage medium.
  • Three-dimensional reconstruction is a process of restoring corresponding three-dimensional objects based on known two-dimensional images. Since the two-dimensional image only includes the information of the target object collected under a specific camera angle of view, it can only reflect the visible part of the target object under the specific camera angle of view. The more two-dimensional images based on different camera perspectives, the higher the degree of restoration of the reconstructed three-dimensional object relative to the target object, and the better the reconstruction quality.
  • Three-dimensional reconstruction based on two-dimensional images with limited viewing angles is prone to ambiguity because of inevitable occlusion and similar problems. Using two-dimensional images from more viewing angles is expected to yield better reconstruction results. However, because of the geographic location of the target object, occlusion by the surrounding environment, and other factors, it may not be possible to obtain a two-dimensional image under the desired viewing angle. Therefore, it is difficult to obtain a satisfactory three-dimensional reconstruction result.
  • the present invention was made in consideration of the above-mentioned problems.
  • a three-dimensional reconstruction method includes: extracting original image features from an original two-dimensional image of a target object; determining an original three-dimensional object based on the original image features; determining a camera pose of a supplementary perspective of the target object, wherein the supplementary perspective is different from the first perspective used to generate the original two-dimensional image; generating, based on the camera pose of the supplementary perspective, a supplementary two-dimensional image of the target object under the supplementary perspective; performing three-dimensional reconstruction on the supplementary two-dimensional image to generate a supplementary three-dimensional object corresponding to the supplementary two-dimensional image; and fusing the original three-dimensional object and the supplementary three-dimensional object to obtain a three-dimensional reconstruction result of the target object.
  • the determining the original three-dimensional object based on the original image feature includes:
  • the original three-dimensional object is determined based on the depth map and the voxel cube.
  • the determining the original three-dimensional object based on the depth map and the voxel cube includes:
  • the depth map of the target object includes the depth map of the main view and the depth map of the back view of the target object.
  • the original two-dimensional image includes multiple images with different viewing angles
  • the determining the original three-dimensional object based on the original image features includes:
  • the fusion of all three-dimensional objects from different perspectives to obtain the original three-dimensional object includes:
  • the determining the voxel of the original three-dimensional object according to the voxels of all standard-view three-dimensional objects includes:
  • the determining the camera pose of the supplementary angle of view of the target object includes:
  • the camera pose of the candidate view angle is the camera pose of the supplementary view angle.
  • the determining the original visible ratio of the visible voxels of the candidate perspective three-dimensional object includes:
  • the original visible ratio is determined according to the counted number of pixels and the total number of pixels of the candidate view three-dimensional object in the projection image.
  • the generating a supplementary two-dimensional image of the target object under the supplementary viewpoint based on the camera pose of the supplementary viewpoint includes:
  • the supplementary two-dimensional image is generated based on the supplementary image feature.
  • the generating a supplementary two-dimensional image of the target object under the supplementary viewpoint based on the camera pose of the supplementary viewpoint includes:
  • the supplementary two-dimensional image is generated according to the target feature.
  • the extraction of the target feature based on the projection image of the original three-dimensional object in the supplementary view angle and the feature of the original image includes:
  • the corresponding feature vector in the target feature is determined based on random noise.
  • the original two-dimensional image includes a plurality of images with different perspectives
  • the original image feature includes a plurality of features corresponding to each image with a different perspective
  • the determining the corresponding feature vector in the target feature according to the original image feature includes: averaging the corresponding feature vectors in the multiple original image features, and using the average value as the corresponding feature vector in the target feature.
  • the extracting the target feature according to the projection image of the original three-dimensional object under the supplementary angle of view and the original image feature further includes:
  • the projection image and the determined feature vector are spliced together to generate the target feature.
  • the method further includes:
  • the supplementary two-dimensional image is used as the original two-dimensional image, and the three-dimensional reconstruction is performed again based on the camera pose of a new supplementary angle of view, until the proportion of visible voxels in the three-dimensional reconstruction result is greater than the second ratio.
  • a three-dimensional reconstruction device including:
  • the feature extraction module is used to extract the original image features from the original two-dimensional image of the target object
  • the first reconstruction module is configured to determine the original three-dimensional object based on the original image feature
  • a supplementary perspective module configured to determine a camera pose of a supplementary perspective of the target object, wherein the supplementary perspective is different from the first perspective used to generate the original two-dimensional image
  • a supplementary image module configured to generate a supplementary two-dimensional image of the target object in the supplementary perspective based on the camera pose of the supplementary perspective;
  • the second reconstruction module is configured to perform three-dimensional reconstruction on the supplementary two-dimensional image to generate a supplementary three-dimensional object corresponding to the supplementary two-dimensional image;
  • the fusion module is used for fusing the original three-dimensional object and the supplementary three-dimensional object to obtain a three-dimensional reconstruction result of the target object.
  • a three-dimensional reconstruction system including: a processor and a memory, wherein computer program instructions are stored in the memory, and the computer program instructions are used to execute the above-mentioned three-dimensional reconstruction method when run by the processor.
  • a storage medium on which program instructions are stored, and the program instructions are used to execute the above-mentioned three-dimensional reconstruction method during operation.
  • By adding a two-dimensional image of the target object under a supplementary perspective on the basis of the original two-dimensional image, and performing three-dimensional reconstruction based on both the supplementary two-dimensional image and the original two-dimensional image, more credible information about the target object can be obtained and the reconstruction quality of the three-dimensional object is improved.
  • Fig. 1 shows a schematic flowchart of a three-dimensional reconstruction method according to an embodiment of the present invention
  • Figure 2 shows a conversion relationship between a world coordinate system and a spherical coordinate system according to an embodiment of the present invention
  • Fig. 3 shows a schematic flowchart of determining an original three-dimensional object according to an embodiment of the present invention
  • 4A shows a schematic flowchart of determining an original three-dimensional object through multiple original two-dimensional images according to an embodiment of the present invention
  • FIG. 4B shows a schematic diagram of different original two-dimensional images captured by cameras under different viewing angles
  • FIG. 5A shows a schematic flow chart of fusing multiple perspective three-dimensional objects according to an embodiment of the present invention
  • FIG. 5B shows a schematic block diagram of obtaining an original three-dimensional object through multiple original two-dimensional images according to an embodiment of the present invention
  • FIG. 6 shows a schematic flowchart of determining a camera pose of a supplementary angle of view according to an embodiment of the present invention
  • FIG. 7 shows a schematic flowchart of determining the original visible ratio according to an embodiment of the present invention.
  • FIG. 8 shows a schematic diagram of determining the original visible scale according to an embodiment of the present invention.
  • FIG. 9 shows a schematic flowchart of generating a supplementary two-dimensional image according to an embodiment of the present invention.
  • FIG. 10 shows a schematic flowchart of generating a supplementary two-dimensional image according to another embodiment of the present invention.
  • FIG. 11 shows a schematic block diagram of generating a supplementary two-dimensional image according to an embodiment of the present invention
  • FIG. 12 shows a schematic flowchart of iterative reconstruction according to an embodiment of the present invention
  • Fig. 13 shows a schematic block diagram of a three-dimensional reconstruction device according to an embodiment of the present invention
  • Fig. 14 shows a schematic block diagram of a three-dimensional reconstruction system according to an embodiment of the present invention.
  • Fig. 1 shows a schematic flowchart of a three-dimensional reconstruction method 100 according to an embodiment of the present invention. As shown in FIG. 1, the method 100 includes the following steps.
  • the original two-dimensional image may be an image of the target object directly collected by imaging equipment such as a camera or a video camera.
  • the original two-dimensional image can also be an image subjected to preprocessing operations.
  • preprocessing operations such as filtering may be performed on the collected image to obtain an original two-dimensional image with better quality.
  • the original two-dimensional image can be a single image obtained under a single viewing angle, or multiple images obtained under multiple different viewing angles.
  • an encoder composed of a convolutional neural network is used to extract original image features from the original two-dimensional image of the target object.
  • the original image feature may include multiple feature vectors. Each feature vector corresponds to the corresponding pixel in the original two-dimensional image. Taking a single original two-dimensional image as an example, H ⁇ W feature vectors can be extracted from the original two-dimensional image (H represents the height of the original two-dimensional image, and W represents the width of the original two-dimensional image). The dimension of each feature vector is C.
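As an illustration of this feature layout (not the patent's actual network), the sketch below shows a toy convolutional encoder that maps a 3×H×W image to a C×H×W feature map, i.e. one C-dimensional feature vector per pixel; the layer sizes and the name SimpleEncoder are assumptions made here for illustration only.

```python
import torch
import torch.nn as nn

class SimpleEncoder(nn.Module):
    """Toy CNN encoder: maps a 3xHxW image to a CxHxW feature map,
    i.e. one C-dimensional feature vector per pixel (layer sizes are illustrative)."""
    def __init__(self, feature_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feature_dim, kernel_size=3, padding=1), nn.ReLU(),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) -> features: (B, C, H, W)
        return self.net(image)

features = SimpleEncoder(feature_dim=64)(torch.rand(1, 3, 128, 128))
print(features.shape)  # torch.Size([1, 64, 128, 128])
```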
  • S120 Determine an original three-dimensional object based on the original image feature.
  • a decoder composed of a convolutional neural network is used to generate original three-dimensional objects based on original image features.
  • the original three-dimensional object has a corresponding relationship with the original two-dimensional image.
  • the original three-dimensional object can be represented in the following ways: point cloud (Point Cloud), mesh (Mesh), voxel (Voxel), or depth map (Depth map), etc.
  • the original three-dimensional object is represented by voxels.
  • the voxel representation method is to regard the space where the target object is located as a voxel cube composed of multiple three-dimensional squares, and the value of each three-dimensional square indicates whether the object has a voxel in the spatial position of the square. For example, a value of 0 means that the object does not have a voxel in the spatial position of the corresponding square, and a value of 1 means that there is a voxel.
  • the three-dimensional reconstruction based on the original two-dimensional image of the target object is realized.
  • the encoder and decoder described in the above step S110 and step S120 are only for example, and do not constitute a limitation to the present invention.
  • a person of ordinary skill in the art can use any existing or future-developed algorithm for three-dimensional reconstruction based on known two-dimensional images to implement the above two steps.
  • S130 Determine the camera pose of the supplementary angle of view of the target object, wherein the supplementary angle of view is different from the first angle of view for generating the original two-dimensional image.
  • each two-dimensional image has a corresponding camera angle of view
  • the camera angle of view is the angle of view when the camera collects the two-dimensional image.
  • the camera's angle of view is determined by the camera's pose, which can be used to characterize the camera's angle of view.
  • the camera pose is the position and posture of the camera when it collects a two-dimensional image.
  • the camera pose can be expressed based on various coordinate systems. The following uses a spherical coordinate system as an example to illustrate the camera pose. Exemplarily, the location of the object can be taken as the origin of the spherical coordinate system, and the camera pose can be represented by vectors R and T.
  • R = [θ, φ], where θ represents the azimuth angle of the camera and φ represents the elevation angle of the camera; T = ρ represents the distance between the camera and the object.
  • Given the coordinates (x, y, z) of a camera in the world coordinate system, where x, y, and z are the camera's coordinates on the X, Y, and Z axes respectively, the azimuth angle θ, elevation angle φ, and distance ρ of the camera in the spherical coordinate system can be determined correspondingly.
  • Figure 2 shows the conversion relationship between the world coordinate system and the spherical coordinate system.
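A minimal sketch of this conversion, assuming the object sits at the origin, the elevation angle is measured from the XOY plane, and angles are returned in degrees; the exact angle convention is an assumption here, not taken from the patent.

```python
import math

def world_to_spherical(x: float, y: float, z: float):
    """Convert camera coordinates (x, y, z) in the world frame to
    (azimuth theta, elevation phi, distance rho) with the object at the origin."""
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.degrees(math.atan2(y, x))   # azimuth measured in the XOY plane
    phi = math.degrees(math.asin(z / rho))   # elevation above the XOY plane
    return theta, phi, rho

print(world_to_spherical(1.0, 1.0, 1.0))  # ~ (45.0, 35.26, 1.732)
```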
  • the angle of view of the camera in this standard pose can be referred to as the standard angle of view.
  • the posture of the three-dimensional object corresponding to the standard pose of the camera can be called its standard posture.
  • the original three-dimensional object can be transformed to the standard posture. Therefore, different camera poses can be expressed as different azimuth and elevation angles, that is, different vectors [ ⁇ , ⁇ ].
  • the camera pose when the image is generated can be determined according to the camera parameters corresponding to the original two-dimensional image.
  • the angle of view corresponding to the camera pose of the original two-dimensional image is called the first angle of view.
  • this step is used to determine a new supplementary viewing angle.
  • the supplementary perspective is different from the first perspective.
  • the camera pose of the supplementary angle of view is different from the camera pose of the first angle of view.
  • the camera pose of the supplementary angle of view may be determined from the first angle of view based on a preset rule. For example, starting from the camera pose of the first view angle, the azimuth angle and/or the elevation angle is changed according to a preset rule; specifically, a preset number of degrees may be added to the azimuth angle of the first viewing angle to obtain a supplementary viewing angle.
  • S140 Based on the camera pose of the supplementary angle of view, generate a supplementary two-dimensional image of the target object in the supplementary angle of view.
  • a supplementary two-dimensional image of the target object under the supplementary viewing angle can be generated according to the original image information from the original two-dimensional image.
  • the original image information comes from, for example, original image features or original three-dimensional objects, or even from the original two-dimensional image itself.
  • the supplementary viewing angle for generating the supplementary two-dimensional image is different from the first viewing angle for generating the original two-dimensional image, so that there is a difference between the supplementary two-dimensional image and the original two-dimensional image. Because the surface of the target object generally changes continuously, it is reliable to predict the invisible part of the target object in the first view based on the original image information.
  • the supplementary two-dimensional image contains information that does not exist in the original two-dimensional image, and the information is reliable to a certain extent. Supplementing the two-dimensional image can play a supplementary and rich role in the original image information.
  • S150 Perform three-dimensional reconstruction on the supplementary two-dimensional image to generate a supplementary three-dimensional object corresponding to the supplementary two-dimensional image.
  • step S150 may include: firstly, an encoder composed of a convolutional neural network is used to extract supplementary image features from the supplementary two-dimensional image; then, a decoder composed of a convolutional neural network is used to determine the corresponding supplementary three-dimensional object based on the supplementary image features.
  • the supplementary three-dimensional object is represented in the form of voxels. It can be understood that since the supplementary 2D image contains information that does not exist in the original image information, the voxels that are visible in the supplementary perspective in the generated supplementary 3D object must be different from the voxels that are visible in the first perspective in the original 3D object.
  • the final three-dimensional reconstruction result of the target object may be determined by taking a union of the voxels of the original three-dimensional object and the supplementary three-dimensional object. For any position in the space, as long as any one of the original three-dimensional object or the supplementary three-dimensional object has a voxel at that position, it is determined that the three-dimensional reconstruction result has a voxel at that position.
  • the final three-dimensional reconstruction result of the target object can also be determined by taking the intersection of the voxels of the original three-dimensional object and the supplementary three-dimensional object. For any position in the space, only if both the original three-dimensional object and the supplementary three-dimensional object have voxels at that location, then it is determined that the three-dimensional reconstruction result has a voxel at that location.
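A minimal sketch of the two fusion rules on binary voxel grids; the function name and grid sizes are illustrative assumptions.

```python
import numpy as np

def fuse_voxels(original: np.ndarray, supplementary: np.ndarray, mode: str = "union") -> np.ndarray:
    """Fuse two binary voxel grids of the same shape.
    'union': a voxel is set if either grid has it; 'intersection': only if both do."""
    if mode == "union":
        return np.logical_or(original, supplementary).astype(np.uint8)
    return np.logical_and(original, supplementary).astype(np.uint8)

a = np.zeros((32, 32, 32), dtype=np.uint8); a[10:20, 10:20, 10:20] = 1
b = np.zeros((32, 32, 32), dtype=np.uint8); b[15:25, 15:25, 15:25] = 1
print(fuse_voxels(a, b, "union").sum(), fuse_voxels(a, b, "intersection").sum())
```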
  • Fig. 3 shows a schematic flowchart of determining an original three-dimensional object in step S120 according to an embodiment of the present invention.
  • a decoder composed of a neural network can be used to generate original three-dimensional objects based on original image features.
  • the decoder composed of a convolutional neural network can be implemented using a deep neural network and a voxel neural network.
  • step S120 includes the following steps.
  • S121 Decoding the original image features through the deep neural network to obtain a depth map of the target object.
  • the deep neural network may include multiple 2-dimensional (2D) convolutional layers.
  • Each pixel in the depth map represents the depth of the corresponding position of the target object.
  • the depth may be the distance between the corresponding position of the target object and the camera.
  • the depth d of the pixel corresponding to each feature vector in the original two-dimensional image is calculated from the decoder output, where C represents the maximum depth.
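The patent's exact depth formula is not reproduced above; as a hedged sketch, one common parameterisation squashes the raw decoder output into (0, 1) with a sigmoid and scales it by the maximum depth C. The sigmoid choice below is an assumption made here, not the patent's formula.

```python
import numpy as np

def decode_depth(raw: np.ndarray, max_depth: float) -> np.ndarray:
    """Map raw per-pixel decoder outputs to metric depths in (0, max_depth)
    via a sigmoid; the sigmoid parameterisation is an assumption, not the patent's formula."""
    return max_depth / (1.0 + np.exp(-raw))

raw_output = np.random.randn(128, 128)   # one raw value per pixel / feature vector
depth_map = decode_depth(raw_output, max_depth=10.0)
print(depth_map.min(), depth_map.max())  # values lie strictly inside (0, 10)
```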
  • the angle of view when the camera collects the original two-dimensional image is the main angle of view, that is, the aforementioned first angle of view.
  • the depth map of the main perspective can be generated based on the original image features.
  • the depth map generated based on the original image features may also include the depth map of the back view.
  • the back viewing angle is a viewing angle that is 180 degrees from the main viewing angle.
  • Assuming the target object is symmetrical about a plane perpendicular to the main viewing direction, the depth map of the back view can be obtained from the original image features, even though the part of the target object that is visible from the back view is actually invisible in the main view.
  • S122 Decode the original image features through a voxel neural network to obtain a voxel cube of the target object.
  • the voxel neural network may also include multiple 2D convolutional layers, which are used to output a voxel cube composed of multiple three-dimensional squares according to the original image features.
  • In a voxel cube, if the value of a three-dimensional grid is 1, the target object has a voxel at the spatial position of that grid; if the value of the three-dimensional grid is 0, the target object has no voxel at that position.
  • the depth map may include the depth map of the main view and the depth map of the back view.
  • the depth map of the main view includes the three-dimensional information of the front surface of the target object
  • the depth map of the back view includes the three-dimensional information of the back surface of the target object.
  • the three-dimensional information of the target object can be determined based on the three-dimensional information of the front surface and the three-dimensional information of the back surface. Exemplarily, it can be considered that the part between the front surface and the back surface is the target object reconstructed from the depth map.
  • Each point of the front surface obtained from the depth map of the main view can be connected to the corresponding point of the back surface obtained from the depth map of the back view; the space enclosed by the front surface, the back surface, and all the connecting lines is the space occupied by the target object reconstructed from the depth maps.
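A toy sketch of this "fill between the two surfaces" idea, assuming the viewing direction is the +Z axis and both depth maps are normalised to [0, 1]; all names and resolutions are illustrative.

```python
import numpy as np

def carve_from_depths(front_depth: np.ndarray, back_depth: np.ndarray, resolution: int) -> np.ndarray:
    """Toy voxelisation from a main-view and a back-view depth map: along each viewing ray
    (here the Z axis), every cell lying between the front and back surfaces is marked occupied."""
    voxels = np.zeros((front_depth.shape[0], front_depth.shape[1], resolution), dtype=np.uint8)
    z = np.linspace(0.0, 1.0, resolution)      # normalised depth of each voxel layer
    front = front_depth[..., None]             # (H, W, 1)
    back = back_depth[..., None]
    voxels[(z >= front) & (z <= back)] = 1     # occupied between the two surfaces
    return voxels

front = np.full((8, 8), 0.3); back = np.full((8, 8), 0.7)
print(carve_from_depths(front, back, resolution=16).sum(axis=2)[0, 0])  # 6 occupied layers per ray
```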
  • the target object reconstructed from the depth map and the voxel cube obtained from the original image feature can be fused to determine the original three-dimensional object.
  • Where both of the above representations indicate that the target object occupies a specific location, it is determined that the target object exists at that location.
  • Determining the original three-dimensional object through the depth map and the voxel cube can effectively use the information in the original two-dimensional image, making the generated original three-dimensional object closer to the target object.
  • the above step S123 may include: firstly, determining the voxels visible in the original three-dimensional object according to the depth map; then, determining other voxels in the original three-dimensional object according to the voxel cube.
  • the depth map may include a depth map of the main view. Since the depth map of the main view is obtained directly based on the original two-dimensional image, the voxels determined according to the depth map of the main view can be considered as visible voxels. These voxels are more reliable and can better reflect the actual shape of the target object.
  • the depth map may also include a depth map of a back view angle. In view of the fact that most objects have a front-to-back symmetrical relationship, it can be considered that the voxels determined according to the depth map of the rear view angle are also visible.
  • the voxels that are visible in the main view and the voxels that are visible in the back view of the original three-dimensional object can be determined according to the depth map of the main view and the depth map of the back view. It can be understood that although the voxel cube also contains voxels on the front surface and the back surface, determining these visible voxels based on the depth map is more accurate than determining these voxels based on the voxel cube.
  • the depth map of the main view and the depth map of the back view cannot reflect other spatial characteristics of the target object.
  • Other voxels in the original three-dimensional object are not visible in the original two-dimensional image. These voxels can be determined based on the voxel cubes generated by the voxel neural network.
  • the voxel cube contains voxels on surfaces other than the front surface (visible in the main view) and the back surface (visible in the back view); these voxels can be used to determine the voxels of the original three-dimensional object on those other surfaces.
  • the original three-dimensional object with higher reliability and accuracy can be obtained.
  • the original two-dimensional image may include multiple images obtained from multiple different viewing angles.
  • FIG. 4A shows a schematic flowchart of determining an original three-dimensional object through multiple original two-dimensional images according to an embodiment of the present invention. As shown in FIG. 4A, when the original two-dimensional image contains multiple images with different viewing angles, step S120 determining the original three-dimensional object may include the following steps.
  • the corresponding perspective three-dimensional objects are determined based on the corresponding original image features extracted from the original two-dimensional images of each perspective.
  • FIG. 4B shows a schematic diagram of different original two-dimensional images captured by cameras at different viewing angles according to an embodiment of the present invention.
  • C1, C2, and C3 represent cameras in different poses.
  • the original two-dimensional images I1, I2, and I3 corresponding to the respective viewing angles can be obtained.
  • For the original two-dimensional image of each viewing angle, a three-dimensional object corresponding to that viewing angle can be obtained through three-dimensional reconstruction, referred to herein as a sub-view three-dimensional object. It can be understood that the original two-dimensional image corresponding to each sub-view three-dimensional object is different, and therefore the voxels contained in each may also be different.
  • the original three-dimensional object is determined according to the voxels contained in the multiple perspective three-dimensional objects. Any existing technology or algorithm developed in the future can be used to fuse various perspective three-dimensional objects, which is not limited in this application.
  • the original three-dimensional object is determined based on multiple images with different viewing angles. These images contain more credible target object information. Therefore, the three-dimensional reconstruction result of the present application can be made more accurate.
  • FIG. 5A shows a schematic flow chart of fusing all three-dimensional objects with different perspectives according to an embodiment of the present invention. As shown in FIG. 5A, fusing multiple three-dimensional objects with different perspectives includes the following steps.
  • Each perspective three-dimensional object is generated based on its corresponding original two-dimensional image, which corresponds to its own perspective.
  • each perspective three-dimensional object can be rotated to a unified standard posture.
  • the spatial shape of each sub-view three-dimensional object under the same standard view angle can be obtained, that is, the standard view three-dimensional object.
  • S520 Determine the voxel of the original three-dimensional object according to the voxels of all the three-dimensional objects in the standard viewing angle.
  • the voxel of the original three-dimensional object can be determined based on the union or intersection of the voxels of all standard-view three-dimensional objects.
  • FIG. 5B shows a schematic block diagram of determining an original three-dimensional object through multiple original two-dimensional images according to an embodiment of the present invention.
  • each sub-view three-dimensional object is rotated to a standard posture, and then the rotated standard-view three-dimensional object is fused, which is not only easy to implement, but also ensures the accuracy of the result.
  • the standard-view three-dimensional object is represented in the form of voxels. According to whether the value of a three-dimensional grid is 1 or 0, it can be determined whether there is a voxel at the corresponding position of that grid. When the proportion of standard-view three-dimensional objects that have a voxel at a certain position, among all standard-view three-dimensional objects, exceeds the first ratio, it is determined that the original three-dimensional object has a voxel at that position.
  • the first ratio is 0.5.
  • Using (x, y, z) to denote the coordinates of a position in space, k the number of standard-view three-dimensional objects, Pi(x, y, z) the value of the three-dimensional square of the i-th standard-view three-dimensional object at that position, and O(x, y, z) the value of the three-dimensional square of the original three-dimensional object at that position, the fusion can be written as: O(x, y, z) = 1 if (1/k) · Σ_{i=1..k} Pi(x, y, z) exceeds the first ratio, and O(x, y, z) = 0 otherwise.
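A minimal sketch of this voting rule on k binary standard-view voxel grids; the helper name fuse_standard_views and the grid sizes are assumptions for illustration.

```python
import numpy as np

def fuse_standard_views(view_voxels: list, first_ratio: float = 0.5) -> np.ndarray:
    """Fuse k standard-view voxel grids P_1..P_k: a voxel of the original object O
    is set where the fraction of grids containing it exceeds first_ratio."""
    stacked = np.stack(view_voxels, axis=0).astype(np.float32)   # (k, X, Y, Z)
    return (stacked.mean(axis=0) > first_ratio).astype(np.uint8)

views = [np.random.randint(0, 2, size=(16, 16, 16)) for _ in range(3)]
original = fuse_standard_views(views, first_ratio=0.5)           # majority vote over 3 views
print(original.shape, original.sum())
```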
  • the original three-dimensional object is determined according to the number of voxels at a certain position in all standard-view three-dimensional objects.
  • the original three-dimensional object is closer to the real target object. Therefore, the three-dimensional reconstruction result obtained by this technical solution is more ideal.
  • step S130 is required to further determine the camera pose of the complementary perspective of the target object.
  • Fig. 6 shows a schematic flowchart of determining a camera pose of a supplementary angle of view according to an embodiment of the present invention. As shown in FIG. 6, the step S130 to determine the camera pose of the supplementary angle of view includes the following steps.
  • S131 Acquire a preset camera pose of at least one candidate angle of view.
  • the camera pose of each candidate perspective can be expressed as the azimuth and elevation angles in the spherical coordinate system, which are represented by vectors ( ⁇ , ⁇ ).
  • the azimuth angle θ is selected from the set {0, 45, 90, 135, 180, 225, 270, 315} and the elevation angle φ from the set {-60, -30, 0, 30, 60} (both in degrees)
  • the original three-dimensional object can be rotated from the current perspective to the candidate perspective.
  • the current perspective of the original three-dimensional object can be the first perspective corresponding to the original two-dimensional image.
  • the original three-dimensional object can be determined directly based on the first perspective, and the calculation is simpler.
  • the current view angle of the original three-dimensional object may also be a standard view angle. According to the foregoing example, for the case where there are multiple images with different viewing angles in the original two-dimensional image, the obtained original three-dimensional object may be in a standard viewing angle.
  • Assuming the current view of the original three-dimensional object is (θ1, φ1) and the candidate view is (θ2, φ2), the original three-dimensional object can be rotated by the angle (θ2-θ1, φ2-φ1) to obtain the candidate-perspective three-dimensional object.
  • S133 For the camera pose of each candidate perspective, determine the original visible ratio of the visible voxels of the three-dimensional object in the candidate perspective.
  • the visible voxels of the candidate-perspective three-dimensional object refer to the voxels of that object that are visible under the candidate perspective. Under different viewing angles, the visible voxels of a three-dimensional object are different. Taking a car as an example, assuming that the first perspective (0, 0) corresponding to the original two-dimensional image is the perspective facing the front of the car, then the voxels that make up the front of the car are visible voxels in the first perspective: the voxels that constitute the headlights, the voxels that make up the wipers, the voxels that make up the hood, and so on. When the car is rotated to a candidate viewing angle, such as the left viewing angle (90, 0), the voxels constituting the left door are visible voxels, but the voxels constituting the wipers are not.
  • the original visible ratio is the proportion of voxels that are visible in the first view among the visible voxels of the candidate-perspective three-dimensional object. It can be understood that if the original two-dimensional image includes multiple images with different viewing angles, the first viewing angle includes multiple viewing angles. It can also be understood that the voxels of the three-dimensional object that are visible in the candidate perspective may be visible or invisible in the first perspective. In the example of the aforementioned car, among the visible voxels of the car in the left view, the voxels of the part close to the front of the car are visible from the front view, while the voxels of the part close to the rear of the car are not. Thus, in this example, the original visible ratio of the visible voxels of the car under the left angle of view is the ratio of voxels visible under the first angle of view among the visible voxels under the left angle of view.
  • the original visible ratio can reflect the credibility of the candidate perspective three-dimensional object.
  • the original three-dimensional object is generated based on the original two-dimensional image.
  • the visible pixels in the original two-dimensional image can truly reflect the shape of the target object, so they are credible pixels.
  • the voxels that are visible in the first viewing angle in the original three-dimensional object determined based on the pixels in the original two-dimensional image are also credible.
  • the credibility of the voxels other than the voxels that are visible in the first view of the original three-dimensional object is lower than the credibility of the voxels that are visible in the first view.
  • the purpose of this step is to select a candidate view angle whose original visible ratio is within a suitable range as a supplementary view angle for 3D reconstruction.
  • the credibility of the three-dimensional object in the supplementary perspective should not be too low, otherwise the three-dimensional reconstruction in this perspective is meaningless; at the same time, the credibility of the three-dimensional object in the supplementary perspective should not be too high, otherwise it will be too close to the first perspective. Can not play the role of supplementary information.
  • the first range is 50%-85%
  • the candidate view angles whose original visible ratio is within this range are used as the supplementary view angles for 3D reconstruction
  • the camera pose under the candidate view angle is the camera pose of the supplementary view angle. This range ensures that the credibility of the three-dimensional object under the supplementary viewing angle is sufficiently high, and it also guarantees the effective amount of supplementary information.
  • the camera pose of the supplementary perspective is determined according to the original visible proportion of the visible voxels of the three-dimensional object in the candidate perspective, and the three-dimensional reconstruction result obtained based on the camera pose of the supplementary perspective is more accurate.
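A hedged sketch of the selection loop over the preset candidate poses, assuming a helper that computes the original visible ratio for a given pose (here stubbed with a lambda); the 50%-85% window matches the first range mentioned above, and all function names are illustrative.

```python
def select_supplementary_view(candidate_views, visible_ratio_fn, low=0.50, high=0.85):
    """Return the first candidate camera pose (theta, phi) whose 'original visible ratio'
    falls inside [low, high]; visible_ratio_fn is assumed to compute that ratio for a pose."""
    for pose in candidate_views:
        ratio = visible_ratio_fn(pose)
        if low <= ratio <= high:
            return pose, ratio
    return None, None

# Candidate poses: all combinations of the preset azimuths and elevations (in degrees).
candidates = [(a, e) for a in (0, 45, 90, 135, 180, 225, 270, 315)
              for e in (-60, -30, 0, 30, 60)]
pose, ratio = select_supplementary_view(candidates, lambda p: 0.6 if p == (90, 0) else 0.95)
print(pose, ratio)  # (90, 0) 0.6
```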
  • Fig. 7 shows a schematic flowchart of determining the original visible ratio according to a specific embodiment of the present invention. As shown in FIG. 7, determining the original visible ratio of the visible voxels of the candidate perspective three-dimensional object includes the following steps.
  • Since the candidate-perspective three-dimensional object has been rotated to face the candidate perspective, it is projected along the candidate perspective direction to obtain the projection image of the candidate-perspective three-dimensional object under the candidate perspective.
  • the pixels of the candidate perspective three-dimensional object in the projection image correspond to the voxels that are visible in the candidate perspective.
  • the projection image may be determined based on the voxel of the candidate view three-dimensional object that is closest to the projection plane in the candidate view.
  • the projection plane can be a plane perpendicular to the candidate viewing angle where the camera is located. Assuming that the candidate perspective is the direction of the X axis, the voxel of the candidate-perspective three-dimensional object that is closest to the projection plane along each line of sight is given by argmin(P(:, y, z)), where P(:, y, z) denotes all voxels of the candidate-perspective three-dimensional object on the straight line parallel to the X axis whose Y-axis coordinate is y and Z-axis coordinate is z, and argmin(P(:, y, z)) selects the voxel on that line at the minimum distance from the projection plane.
  • S720 Count the number of pixels visible in the first view of the candidate view three-dimensional object in the projection image.
  • the pixels in the projection image correspond to voxels that are visible in the candidate perspective of the three-dimensional object in the candidate perspective.
  • the voxels that are visible in the candidate perspective of the three-dimensional object in the candidate perspective may be visible or invisible in the first perspective of the original two-dimensional image.
  • This step S720 is used to determine the number of pixels in the projection image corresponding to voxels that are visible in the first viewing angle and also visible in the candidate viewing angle.
  • the voxels visible in the first viewing angle can be marked.
  • the voxels visible in the first perspective may be voxels determined by the main perspective depth map in the original three-dimensional object. On the basis of marking the voxels in the original three-dimensional object, these marks are still retained in the candidate perspective three-dimensional object obtained after the rotation. However, voxels marked as visible in the first view may not be visible in the candidate view.
  • the statistics to be counted in this step S720 are the marked voxels that are still visible in the candidate view angle.
  • voxels that are not visible in the first viewing angle can also be marked.
  • the voxels in the original three-dimensional object that are determined by the back-view depth map and by the voxel cube are marked as voxels that are not visible in the first view.
  • By counting the number of pixels in the projection image that correspond to the marked voxels, the number of pixels of the candidate-perspective three-dimensional object in the projection image that are visible in the first viewing angle can be obtained.
  • S730 Determine the original visible ratio according to the counted number of pixels and the total number of pixels of the candidate view three-dimensional object in the projection image. By calculating the ratio of the number of pixels counted in step S720 to the total number of pixels of the candidate view three-dimensional object in the projection image, the original visible ratio can be determined.
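A toy sketch of steps S710-S730, assuming the candidate viewing direction is the +X axis and that first-view-visible voxels are pre-marked in a boolean mask; function and variable names are illustrative, not the patent's.

```python
import numpy as np

def original_visible_ratio(voxels: np.ndarray, visible_mask: np.ndarray) -> float:
    """Project a binary voxel grid along the +X axis (the assumed candidate viewing direction)
    and compute the fraction of projected pixels whose nearest-surface voxel is marked
    as visible in the first view."""
    occupied = voxels.any(axis=0)                       # (Y, Z): pixels covered by the object
    nearest = np.argmax(voxels, axis=0)                 # index of first occupied voxel along X
    y, z = np.nonzero(occupied)
    visible_first = visible_mask[nearest[y, z], y, z]   # is the surface voxel first-view visible?
    total = len(y)
    return float(visible_first.sum()) / total if total else 0.0

vox = np.zeros((4, 4, 4), dtype=bool); vox[1, :, :] = True   # a flat slab facing the camera
vis = np.zeros_like(vox);              vis[1, :2, :] = True  # half of its surface is "visible"
print(original_visible_ratio(vox, vis))  # 0.5
```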
  • Fig. 8 shows the above-mentioned schematic diagram of determining the original visible scale.
  • V0 is the original three-dimensional object generated based on the three-dimensional reconstruction of step S110 and step S120.
  • the original three-dimensional object mainly includes three parts: voxels determined according to the depth map of the front view, voxels determined according to the depth map of the back view, and voxels determined according to the voxel cube. Among them, the voxels determined according to the depth map of the main view are considered to be visible in the first viewing angle, and the remaining voxels are considered to be invisible in the first viewing angle.
  • V0' is a candidate perspective three-dimensional object obtained after the original three-dimensional object is rotated based on the candidate perspective.
  • P0 is the projection image of the candidate perspective of the three-dimensional object under the candidate perspective.
  • P0 includes pixels corresponding to voxels that are visible in the first view in the candidate view three-dimensional object and pixels corresponding to voxels that are not visible in the first view. The two are marked with squares of different gray levels.
  • the original visible ratio can be determined as the number of the former pixels divided by the sum of the former and the latter.
  • the projection map is used to determine the original visible scale, which is easy to implement, and the final three-dimensional reconstruction result is more accurate.
  • FIG. 9 shows a schematic flowchart of generating a supplementary two-dimensional image in step S140 according to a specific embodiment of the present invention, and this step S140 includes the following steps:
  • S141 Calculate the horizontal rotation angle and the vertical rotation angle between the camera pose of the first angle of view and the camera pose of the supplementary angle of view.
  • the camera poses of different viewing angles can be expressed as a lateral rotation angle in the spherical coordinate system (the rotation angle relative to the X axis in the XOY plane) and a longitudinal rotation angle (the rotation angle relative to the Z axis in a plane perpendicular to the XOY plane), represented as (θ, φ).
  • Assuming the camera pose of the first angle of view is (θ1, φ1) and the camera pose of the supplementary angle of view is (θ2, φ2), the horizontal and vertical rotation angles between them can be expressed as (θ2-θ1, φ2-φ1).
  • S142 Splice a vector composed of the horizontal rotation angle and the vertical rotation angle onto each vector in the original image feature, and use all the spliced vectors as the supplementary image feature.
  • H ⁇ W feature vectors can be extracted from each original two-dimensional image, and these H ⁇ W feature vectors constitute the original image features.
  • the horizontal rotation angle and the vertical rotation angle (θ2-θ1, φ2-φ1) calculated in step S141 are spliced onto each feature vector, so that each spliced feature vector contains n+2 elements (assuming each original feature vector has n elements).
  • the spliced feature vector is represented as (P1, P2,...Pn, ⁇ 2- ⁇ 1, ⁇ 2- ⁇ 1).
  • Each feature vector in the original image features is stitched, and all the feature vectors obtained after stitching are used as supplementary image features.
  • S143 Generate the supplementary two-dimensional image based on the supplementary image feature.
  • a decoder composed of a convolutional neural network can be used to generate a supplementary two-dimensional image corresponding to the supplementary image feature. It can be understood that the decoder can be obtained by training using sample features and corresponding sample images.
  • the supplementary image feature is obtained by splicing the rotation angles between the two camera poses onto the feature vectors of the original image feature, and the supplementary two-dimensional image is generated based on the supplementary image feature, which is simple to operate and easy to implement.
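A minimal sketch of steps S141-S142, splicing the two rotation angles onto every per-pixel feature vector; names and shapes are illustrative assumptions.

```python
import numpy as np

def splice_rotation(features: np.ndarray, first_pose, supp_pose) -> np.ndarray:
    """Append the rotation (theta2-theta1, phi2-phi1) between the first-view and the
    supplementary-view camera poses to every per-pixel feature vector.
    features: (C, H, W) -> returns (C + 2, H, W)."""
    d_theta = supp_pose[0] - first_pose[0]
    d_phi = supp_pose[1] - first_pose[1]
    _, h, w = features.shape
    angles = np.empty((2, h, w), dtype=features.dtype)
    angles[0, :, :] = d_theta
    angles[1, :, :] = d_phi
    return np.concatenate([features, angles], axis=0)

feats = np.random.rand(64, 128, 128).astype(np.float32)
supp_feats = splice_rotation(feats, first_pose=(0, 0), supp_pose=(90, 30))
print(supp_feats.shape)  # (66, 128, 128)
```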
  • Fig. 10 shows a schematic flow chart of generating a supplementary two-dimensional image according to another specific embodiment of the present invention. Specific steps are as follows:
  • the projection image of the original three-dimensional object under the supplementary perspective may be obtained similarly to the acquisition of the projection image of the candidate perspective three-dimensional object under the candidate perspective in the foregoing step S710.
  • the projection image of the original three-dimensional object under the supplementary perspective can be obtained directly based on the result of step S710.
  • the projection image of the original three-dimensional object under the supplementary viewing angle contains pixels corresponding to the voxels that are visible in the first viewing angle of the original three-dimensional object and pixels corresponding to the voxels that are not visible in the first viewing angle.
  • this step S141' may include the following steps: a) For pixels in the projection image corresponding to the voxels of the original three-dimensional object that are visible in the first viewing angle, the corresponding feature vector in the target feature can be determined according to the original image feature; specifically, the corresponding feature vector in the original image feature can be used as the corresponding feature vector in the target feature.
  • b) For pixels in the projection image corresponding to voxels that are not visible in the first viewing angle, the corresponding feature vector in the target feature can be determined based on random noise, for example by using random noise as the corresponding feature vector in the target feature.
  • the random noise can take any value in the range [0,1].
  • the original image features correspondingly contain multiple features corresponding to each image with different viewing angles.
  • the corresponding feature vectors in all the original image features can be summed and then averaged, and the obtained average value is used as the target feature vector for that pixel.
  • S142' Generate the supplementary two-dimensional image according to the target feature.
  • a decoder composed of a convolutional neural network can be used to generate a supplementary two-dimensional image corresponding to the target feature based on the target feature extracted in step S141'.
  • a person of ordinary skill in the art can understand the specific operation, and for the sake of brevity, it will not be repeated here.
  • Fig. 11 shows a schematic block diagram of generating a supplementary two-dimensional image according to a specific embodiment of the present invention.
  • V0 is the original three-dimensional object generated by the three-dimensional reconstruction
  • V0" is the supplementary perspective three-dimensional object obtained after the original three-dimensional object is rotated based on the supplementary perspective
  • P0' is the projection image of the supplementary-perspective three-dimensional object under the supplementary perspective. P0' may include pixels corresponding to voxels that are visible in the first viewing angle of the original three-dimensional object and pixels corresponding to voxels that are not visible in the first viewing angle.
  • The feature vectors of the pixels corresponding to voxels that are visible in the first view of the original three-dimensional object and of the pixels corresponding to voxels that are not visible in the first view are extracted respectively to generate the target feature. For the former, the corresponding feature vector can be derived from the original image features extracted from the original two-dimensional image; for the latter, the corresponding feature vector can be determined based on random noise.
  • step S141' further includes: concatenating P0' with the feature vector determined in step a) and step b) to generate the target feature.
  • P0' is a matrix of 1 ⁇ H ⁇ W (H represents the height of the original two-dimensional image, and W represents the width of the original two-dimensional image).
  • the original image feature is a C ⁇ H ⁇ W tensor as described above, and the feature vector determined in step a) and step b) also constitutes a C ⁇ H ⁇ W feature tensor.
  • Concatenating P0' with this tensor yields a (C+1)×H×W tensor, which is the generated target feature.
  • P0' is used as the mask in the target feature, which will further improve the accuracy of the 3D reconstruction result.
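A hedged sketch of assembling the target feature: per-pixel original features (averaged over views) where the projection mask marks first-view-visible voxels, random noise elsewhere, with the mask prepended as an extra channel to give a (C+1)×H×W tensor. All names are illustrative assumptions.

```python
import numpy as np

def build_target_feature(proj_mask: np.ndarray, original_feats: list) -> np.ndarray:
    """Assemble the (C+1)xHxW target feature: where the projection mask marks a pixel as
    backed by a first-view-visible voxel, average the corresponding original feature vectors;
    elsewhere use random noise in [0, 1]; finally prepend the mask itself as an extra channel."""
    c, h, w = original_feats[0].shape
    avg = np.mean(np.stack(original_feats, axis=0), axis=0)   # (C, H, W) multi-view average
    noise = np.random.rand(c, h, w).astype(avg.dtype)         # fallback for invisible pixels
    visible = proj_mask[None, :, :].astype(bool)              # (1, H, W) broadcast mask
    target = np.where(visible, avg, noise)                    # pick per pixel
    return np.concatenate([proj_mask[None, :, :].astype(avg.dtype), target], axis=0)

mask = (np.random.rand(32, 32) > 0.5).astype(np.float32)
feats = [np.random.rand(8, 32, 32).astype(np.float32) for _ in range(2)]
print(build_target_feature(mask, feats).shape)  # (9, 32, 32)
```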
  • the target feature can be decoded by a decoder composed of, for example, a convolutional neural network, so as to obtain a corresponding supplementary two-dimensional image.
  • the supplementary two-dimensional image generated in the above technical solution retains more of the information in the original two-dimensional image and also contains sufficient supplementary information, so that the three-dimensional reconstruction result obtained from it has a high degree of credibility.
  • step S130 to step S160 can be iterated multiple times, and the final 3D reconstruction result can be determined according to whether the iteration termination condition is satisfied.
  • Fig. 12 shows a schematic flowchart of a three-dimensional reconstruction method according to another embodiment of the present invention. As shown in FIG. 12, the three-dimensional reconstruction method includes the following steps:
  • S1210 Extract original image features from the original two-dimensional image of the target object.
  • S1220 Determine an original three-dimensional object based on the original image feature.
  • S1230 Determine the camera pose of the supplementary angle of view of the target object, wherein the supplementary angle of view is different from the first angle of view for generating the original two-dimensional image.
  • S1240 Based on the camera pose of the supplementary angle of view, generate a supplementary two-dimensional image of the target object in the supplementary angle of view.
  • S1250 Perform three-dimensional reconstruction on the supplementary two-dimensional image to generate a supplementary three-dimensional object corresponding to the supplementary two-dimensional image.
  • S1270 Determine whether the proportion of visible voxels in the three-dimensional reconstruction result is greater than the second proportion.
  • the proportion of visible voxels in the three-dimensional reconstruction result is the proportion of voxels that are visible in the first perspective among the voxels of the three-dimensional reconstruction result that are visible in the supplementary perspective. For example, if there are m voxels in the three-dimensional reconstruction result that are visible in the supplementary viewing angle, and M of them are simultaneously visible in the first viewing angle, the proportion of visible voxels is M/m. It can be understood that the proportion of visible voxels can reflect the credibility of the three-dimensional reconstruction result.
  • the second ratio can be any value between 70% and 90%. In an example, the above-mentioned second ratio is 85%. This value takes into account the consumption of computing resources and the accuracy of the calculation results.
  • If the proportion of visible voxels is not greater than the second proportion, it indicates that there is still a certain gap between the current three-dimensional reconstruction result and the real target object, so three-dimensional reconstruction is performed again based on the camera pose of a new supplementary angle of view.
  • If the proportion of visible voxels is greater than the second proportion, step S1280 is executed: this indicates that the three-dimensional object generated under the current view angle is relatively close to the real three-dimensional object, so the three-dimensional reconstruction result can be used as the final result.
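A schematic sketch of the outer iteration, with the reconstruction and visibility computations stubbed out; reconstruct_fn and visible_fraction_fn are hypothetical placeholders for steps S1230-S1260 and S1270, and the stopping threshold follows the second ratio discussed above.

```python
def iterative_reconstruction(original_image, reconstruct_fn, visible_fraction_fn,
                             second_ratio: float = 0.85, max_iters: int = 5):
    """Hypothetical outer loop: keep reconstructing from a new supplementary view until the
    visible-voxel proportion of the result exceeds second_ratio (or max_iters is reached).
    reconstruct_fn(image) is assumed to return (result, supplementary_image);
    visible_fraction_fn(result) returns the visible-voxel proportion."""
    image = original_image
    for _ in range(max_iters):
        result, supplementary_image = reconstruct_fn(image)
        if visible_fraction_fn(result) > second_ratio:
            return result                   # credible enough, stop iterating
        image = supplementary_image         # reuse the supplementary image as the new input
    return result

# Toy stand-ins: the "reconstruction" just increments a counter and the proportion grows with it.
state = {"n": 0}
def fake_reconstruct(img):
    state["n"] += 1
    return state["n"], img

print(iterative_reconstruction("img", fake_reconstruct, lambda r: 0.7 + 0.1 * r))  # stops at 2
```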
  • a three-dimensional reconstruction device is also provided.
  • Fig. 13 shows a schematic block diagram of a three-dimensional reconstruction device according to an embodiment of the present invention.
  • the apparatus 1300 includes a feature extraction module 1310, a first reconstruction module 1320, a supplementary perspective module 1330, a supplementary image module 1340, a second reconstruction module 1350, and a fusion module 1360.
  • the various modules can respectively execute the various steps/functions of the three-dimensional reconstruction method described above. In the following, only the main functions of the components of the device 1300 are described, and the details that have been described above are omitted.
  • the feature extraction module 1310 is used to extract original image features from the original two-dimensional image of the target object
  • the first reconstruction module 1320 is configured to determine the original three-dimensional object based on the original image feature
  • a supplementary perspective module 1330 configured to determine a camera pose of a supplementary perspective of the target object, wherein the supplementary perspective is different from the first perspective for generating the original two-dimensional image;
  • a supplementary image module 1340 configured to generate a supplementary two-dimensional image of the target object in the supplementary perspective based on the camera pose of the supplementary perspective;
  • the second reconstruction module 1350 is configured to perform three-dimensional reconstruction on the supplementary two-dimensional image to generate a supplementary three-dimensional object corresponding to the supplementary two-dimensional image;
  • the fusion module 1360 is used to fuse the original three-dimensional object and the supplementary three-dimensional object to obtain a three-dimensional reconstruction result of the target object.
  • a three-dimensional reconstruction system including: a processor and a memory, wherein computer program instructions are stored in the memory, and the computer program instructions are used to execute the above-mentioned three-dimensional reconstruction method when run by the processor.
  • Fig. 14 shows a schematic block diagram of a three-dimensional reconstruction system according to an embodiment of the present invention.
  • the system 1400 includes an input device 1410, a storage device 1420, a processor 1430, and an output device 1440.
  • the input device 1410 is used for receiving operation instructions input by the user and collecting data.
  • the input device 1410 may include one or more of a keyboard, a mouse, a microphone, a touch screen, an image capture device, and the like.
  • the storage device 1420 stores computer program instructions for implementing the corresponding steps in the three-dimensional reconstruction method according to the embodiment of the present invention.
  • the processor 1430 is used to run the computer program instructions stored in the storage device 1420 to execute the corresponding steps of the three-dimensional reconstruction method according to the embodiment of the present invention, and to implement the corresponding modules of the three-dimensional reconstruction apparatus according to the embodiment of the present invention.
  • the output device 1440 is used to output various information (such as images and/or sounds) to the outside (such as a user), and may include one or more of a display, a speaker, and the like.
In one embodiment, when the computer program instructions are run by the processor 1430, the system 1400 is caused to perform the following steps: extracting original image features from the original two-dimensional image of the target object; determining the original three-dimensional object based on the original image features; determining a camera pose of a supplementary perspective of the target object, wherein the supplementary perspective is different from the first perspective for generating the original two-dimensional image; generating a supplementary two-dimensional image of the target object under the supplementary perspective based on the camera pose of the supplementary perspective; performing three-dimensional reconstruction on the supplementary two-dimensional image to generate a supplementary three-dimensional object corresponding to the supplementary two-dimensional image; and fusing the original three-dimensional object and the supplementary three-dimensional object to obtain a three-dimensional reconstruction result of the target object.
According to yet another aspect of the present invention, a storage medium is also provided, on which program instructions are stored. When the program instructions are run by a computer or a processor, the computer or the processor is caused to execute the corresponding steps of the above three-dimensional reconstruction method of the embodiment of the present invention, and to implement the corresponding modules in the above three-dimensional reconstruction device or the above three-dimensional reconstruction system according to the embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disk read-only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
In one embodiment, when the computer program instructions are run by a computer or a processor, the computer or the processor is caused to execute the following steps: extracting original image features from the original two-dimensional image of the target object; determining the original three-dimensional object based on the original image features; determining a camera pose of a supplementary perspective of the target object, wherein the supplementary perspective is different from the first perspective for generating the original two-dimensional image; generating a supplementary two-dimensional image of the target object under the supplementary perspective based on the camera pose of the supplementary perspective; performing three-dimensional reconstruction on the supplementary two-dimensional image to generate a supplementary three-dimensional object corresponding to the supplementary two-dimensional image; and fusing the original three-dimensional object and the supplementary three-dimensional object to obtain a three-dimensional reconstruction result of the target object.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are only illustrative; the division of the units is only a logical function division, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not implemented.
The various component embodiments of the present invention may be implemented by hardware, by software modules running on one or more processors, or by a combination of them. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some modules in the three-dimensional reconstruction apparatus according to the embodiments of the present invention.
The present invention can also be implemented as a device program (for example, a computer program and a computer program product) for executing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present invention provide a three-dimensional reconstruction method, device, computer system, and computer-readable storage medium. The method includes: extracting original image features from an original two-dimensional image of a target object; determining an original three-dimensional object based on the original image features; determining a camera pose of a supplementary perspective of the target object, where the supplementary perspective is different from the first perspective used to generate the original two-dimensional image; generating a supplementary two-dimensional image of the target object under the supplementary perspective based on the camera pose of the supplementary perspective; performing three-dimensional reconstruction on the supplementary two-dimensional image to generate a supplementary three-dimensional object corresponding to the supplementary two-dimensional image; and fusing the original three-dimensional object and the supplementary three-dimensional object to obtain a three-dimensional reconstruction result of the target object. The above scheme can obtain more credible information about the target object and improve the reconstruction quality of the three-dimensional object.

Description

三维重建方法、装置、系统和存储介质 技术领域
本发明涉及计算机视觉技术领域,更具体地涉及一种三维重建方法、装置、系统和存储介质。
背景技术
三维重建是基于已知的二维图像还原对应的三维物体的过程。由于二维图像仅包括在特定相机视角下采集的目标物体的信息,因此只能反映出目标物体在该特定相机视角下的可见部分。基于不同相机视角的二维图像越多,重建生成的三维物体相对于目标物体的还原度越高,重建质量就越好。
然而在实际情况下,基于有限视角的二维图像进行三维重建会由于不可避免的遮挡等问题使得重建具有多异性。期望使用更多视角的二维图像,以得到更好的重建效果。但由于目标物体所处的地理位置、周边环境遮挡等原因可能无法获取到期望视角下的二维图像。因此,难以获得满意的三维重建结果。
发明内容
考虑到上述问题而提出了本发明。
根据本发明一个方面,提供了一种三维重建方法。所述方法包括:
从目标物体的原始二维图像中提取原始图像特征;
基于所述原始图像特征确定原始三维物体;
确定所述目标物体的补充视角的相机位姿,其中所述补充视角与生成所述原始二维图像的第一视角不同;
基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像;
对所述补充二维图像进行三维重建,以生成与所述补充二维图像相对应的补充三维物体;以及
对所述原始三维物体和所述补充三维物体进行融合,以获得所述目标物体 的三维重建结果。
示例性地,所述基于所述原始图像特征确定原始三维物体包括:
对所述原始图像特征通过深度神经网络进行解码,以获得所述目标物体的深度图;
对所述原始图像特征通过体素神经网络进行解码,以获得所述目标物体的体素立方体;
基于所述深度图和所述体素立方体确定所述原始三维物体。
示例性地,所述基于所述深度图和所述体素立方体确定所述原始三维物体包括:
根据所述深度图确定所述原始三维物体中可见的体素;以及
根据所述体素立方体确定所述原始三维物体中的其他体素。
示例性地,所述目标物体的深度图包括所述目标物体的主视角的深度图和后视角的深度图。
示例性地,所述原始二维图像包含多张不同视角的图像,所述基于所述原始图像特征确定原始三维物体包括:
分别基于从每个视角的原始二维图像提取的对应的原始图像特征确定对应的分视角三维物体;以及
对所有的分视角三维物体进行融合,以获得所述原始三维物体。
示例性地,所述对所有的分视角三维物体进行融合以获得所述原始三维物体包括:
将每个分视角三维物体旋转到标准姿态,以获得对应的标准视角三维物体;以及
根据所有标准视角三维物体的体素,确定所述原始三维物体的体素。
示例性地,所述根据所有标准视角三维物体的体素,确定所述原始三维物体的体素包括:
对于所有标准视角三维物体所涉及的每个位置,当所有标准视角三维物体中在对应位置上存在体素的标准视角三维物体超过第一比例时,确定所述原始三维物体在该位置上存在体素。
示例性地,所述确定所述目标物体的补充视角的相机位姿包括:
获取预设的至少一个候选视角的相机位姿;
对于每个候选视角,
将所述原始三维物体旋转到该候选视角下,以获得对应的候选视角三维物体;
确定所述候选视角三维物体的可见体素的原始可见比例;
当所述原始可见比例在第一范围内时,确定该候选视角的相机位姿为所述补充视角的相机位姿。
示例性地,所述确定所述候选视角三维物体的可见体素的原始可见比例包括:
基于该候选视角,将所述候选视角三维物体进行投影,以获得投影图;
统计所述投影图中的所述候选视角三维物体的、在所述第一视角下可见的像素数;以及
根据所统计的像素数和所述投影图中的所述候选视角三维物体的总像素数,确定所述原始可见比例。
示例性地,所述基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像包括:
计算所述第一视角的相机位姿与所述补充视角的相机位姿之间的横向转角和纵向转角;
将所述横向转角和所述纵向转角组成的向量与所述原始图像特征中的每个向量拼接,以由拼接后的所有向量为补充图像特征;
基于所述补充图像特征生成所述补充二维图像。
示例性地,所述基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像包括:
根据所述原始三维物体在所述补充视角下的投影图以及所述原始图像特征,提取目标特征;以及
根据所述目标特征生成所述补充二维图像。
示例性地,所述根据所述原始三维物体在所述补充视角下的投影图以及所述原始图像特征提取目标特征包括:
对于所述投影图中的、与所述原始三维物体在所述第一视角下可见的体素对应的像素,根据所述原始图像特征确定所述目标特征中对应特征向量;
对于所述投影图中其他像素,基于随机噪声确定所述目标特征中对应特征向量。
示例性地,所述原始二维图像包含多张不同视角的图像,所述原始图像特 征包含与每张不同视角的图像相对应的多个特征,所述根据所述原始图像特征确定所述目标特征中对应特征向量包括:
对于所述投影图中的、与所述原始三维物体在所述第一视角下可见的体素对应的像素,将多个原始图像特征中的对应特征向量进行平均,以将平均值作为目标特征中的对应特征向量。
示例性地,所述根据所述原始三维物体在所述补充视角下的投影图以及所述原始图像特征提取目标特征还包括:
将所述投影图与所确定的特征向量进行拼接,以生成所述目标特征。
示例性地,在所述对所述原始三维物体和所述补充三维物体进行融合,以获得所述目标物体的三维重建结果后,还包括:
判断所述三维重建结果中可见的体素占比是否大于第二比例;
对于不大于第二比例的情况,将所述补充二维图像作为原始二维图像,并再次基于新的补充视角的相机位姿进行三维重建,直至所述三维重建结果中可见的体素占比大于第二比例。
根据本发明的另一方面,还提供了一种三维重建装置,包括:
特征提取模块,用于从目标物体的原始二维图像中提取原始图像特征;
第一重建模块,用于基于所述原始图像特征确定原始三维物体;
补充视角模块,用于确定所述目标物体的补充视角的相机位姿,其中所述补充视角与生成所述原始二维图像的第一视角不同;
补充图像模块,用于基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像;
第二重建模块,用于对所述补充二维图像进行三维重建,以生成与所述补充二维图像相对应的补充三维物体;以及
融合模块,用于对所述原始三维物体和所述补充三维物体进行融合,以获得所述目标物体的三维重建结果。
根据本发明再一方面,还提供了一种三维重建系统,包括:处理器和存储器,其中,所述存储器中存储有计算机程序指令,所述计算机程序指令被所述处理器运行时用于执行上述的三维重建方法。
根据本发明又一方面,还提供了一种存储介质,在所述存储介质上存储了程序指令,所述程序指令在运行时用于执行上述的三维重建方法。
根据本发明实施例的技术方案,通过基于原始二维图像增加目标物体在补 充视角下的二维图像,并基于该补充视角下的二维图像和原始二维图像进行三维重建,能够获取目标物体的更多可信信息,提高三维物体的重建质量。
上述说明仅是本发明技术方案的概述,为了能够更清楚了解本发明的技术手段,而可依照说明书的内容予以实施,并且为了让本发明的上述和其它目的、特征和优点能够更明显易懂,以下特举本发明的具体实施方式。
附图说明
通过结合附图对本发明实施例进行更详细的描述,本发明的上述以及其它目的、特征和优势将变得更加明显。附图用来提供对本发明实施例的进一步理解,并且构成说明书的一部分,与本发明实施例一起用于解释本发明,并不构成对本发明的限制。在附图中,相同的参考标号通常代表相同部件或步骤。
图1示出了根据本发明一个实施例的三维重建方法的示意性流程图;
图2示出了根据本发明一个实施例的世界坐标系和球面坐标系的转换关系;
图3示出了根据本发明一个实施例确定原始三维物体的示意性流程图;
图4A示出了根据本发明一个实施例通过多张原始二维图像确定原始三维物体的示意性流程图;
图4B示出了不同视角下的相机拍摄得到不同的原始二维图像的示意图;
图5A示出了根据本发明一个实施例对多个分视角三维物体进行融合的示意性流程图;
图5B示出了根据本发明一个实施例通过多张原始二维图像得到原始三维物体的示意性框图;
图6示出了根据本发明一个实施例确定补充视角的相机位姿的示意性流程图;
图7示出了根据本发明一个实施例确定原始可见比例的示意性流程图;
图8示出了根据本发明一个实施例确定原始可见比例的示意图;
图9示出了根据本发明一个实施例生成补充二维图像的示意性流程图;
图10示出了根据本发明另一个实施例生成补充二维图像的示意性流程图;
图11示出了根据本发明一个实施例生成补充二维图像的示意性框图;
图12示出了根据本发明一个实施例进行迭代重建的示意性流程图;
图13示出了根据本发明一个实施例的三维重建装置的示意性框图;
图14示出了根据本发明一个实施例的用于三维重建系统的示意性框图。
具体实施方式
为了使得本发明的目的、技术方案和优点更为明显,下面将参照附图详细描述根据本发明的示例实施例。显然,所描述的实施例仅仅是本发明的一部分实施例,而不是本发明的全部实施例,应理解,本发明不受这里描述的示例实施例的限制。基于本发明中描述的本发明实施例,本领域技术人员在没有付出创造性劳动的情况下所得到的所有其它实施例都应落入本发明的保护范围之内。
在本文描述的三维重建方案中,在原始二维图像的基础上,生成与原有视角不同的补充视角下的二维图像,从而基于原有视角的二维图像和补充视角的二维图像共同进行三维重建,以得到还原度更高、重建质量更好的三维重建结果。
图1示出了根据本发明一个实施例的三维重建方法100的示意性流程图。如图1所示,所述方法100包括以下步骤。
S110:从目标物体的原始二维图像中提取原始图像特征。
原始二维图像可以是利用照相机或摄像机等成像设备直接采集的目标物体的图像。原始二维图像还可以是经预处理操作的图像。示例性地,可以对所采集的图像执行滤波等预处理操作,以获取质量更佳的原始二维图像。原始二维图像可以是在单一视角下得到的单张图像,也可以是在多个不同的视角下得到的多张图像。
示例性地,利用卷积神经网络(CNN)组成的编码器从目标物体的原始二维图像中提取原始图像特征。本领域普通技术人员可以理解,可以基于任何现有的或未来研发的提取图像特征的方法完成步骤S110,例如Harris角点检测算法、SIFT算法等。本申请对此不做限制。
原始图像特征可以包括多个特征向量。其中每个特征向量对应于原始二维图像中的相应像素点。以单张原始二维图像为例,可以自该原始二维图像中提取H×W个特征向量(H代表原始二维图像的高度,W代表原始二维图像的宽度)。每个特征向量的维度都为C。
S120:基于所述原始图像特征确定原始三维物体。
示例性地,利用卷积神经网络组成的解码器,基于原始图像特征生成原始三维物体。
可以理解,该原始三维物体与原始二维图像呈对应关系。该原始三维物体可以用以下方式来表示:点云(Point Cloud)、网格(Mesh)、体素(Voxel)、或深度图(Depth map)等。
在本发明一个具体示例中,通过体素表示原始三维物体。体素的表示方式是将目标物体所在空间看作是由多个立体方格组成的体素立方体,每个立体方格的取值表示物体在该方格所在的空间位置是否存在体素。例如取值为0代表物体在对应方格所在的空间位置上不存在体素,取值为1代表存在体素。
通过上述步骤S110和步骤S120,实现了基于目标物体的原始二维图像的三维重建。本领域普通技术人员可以理解,上述步骤S110和步骤S120中所述编码器和解码器仅用于示例,而不构成对本发明的限制。本领域普通技术人员可以利用任何现有的或未来研发的、基于已知二维图像进行三维重建的算法实现上述两个步骤。
S130:确定所述目标物体的补充视角的相机位姿,其中所述补充视角与生成所述原始二维图像的第一视角不同。
可以理解,每个二维图像都存在一个对应的相机视角,该相机视角是相机采集该二维图像时的视角。相机视角由相机位姿来决定,可以通过相机位姿来表征相机视角。相机位姿是相机采集二维图像时的位置和姿态。可以基于各种坐标系来表示相机位姿。下面以球面坐标系为例来说明相机位姿。示例性地,可以将物体所在位置作为球面坐标系的原点,相机位姿可以用向量R和T表示。R=[α,β],其中α表示相机的方位角、β表示相机的仰角;T表示相机与物体之间的距离ρ。
本领域普通技术人员理解,世界坐标系和上述球面坐标系之间存在对应的转换关系。如已知某相机在世界坐标系中的坐标(x,y,z),其中x表示相机在X轴的坐标,y表示相机在Y轴的坐标,z表示相机在Z轴的坐标,可以对应地确定该相机在球面坐标系中的方位角α,仰角β和距离ρ。图2示出了世界坐标系和球面坐标系的转换关系。
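For illustration only (not part of the original text), the conversion from world coordinates to the spherical quantities used here could be written as follows; the exact angle conventions are an assumption:

import numpy as np

def camera_pose_spherical(x, y, z):
    # Object centre is assumed to lie at the origin of the world coordinate system.
    rho = np.sqrt(x * x + y * y + z * z)      # distance ρ between camera and object
    alpha = np.degrees(np.arctan2(y, x))      # azimuth α, measured in the XOY plane
    beta = np.degrees(np.arcsin(z / rho))     # elevation β above the XOY plane
    return alpha, beta, rho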
可以将α=0,β=0,且ρ=1的相机位姿称为相机的标准位姿。可以将相机在该标准位姿下的视角称为标准视角。可以将三维物体对应于相机的标准位姿的姿态称为其标准姿态。可以在步骤S120确定原始三维物体时,将原始三 维物体变换到该标准姿态下。由此,不同的相机位姿可以表示为不同的方位角和仰角,即不同的向量[α,β]。
本领域普通技术人员理解,对于给定的原始二维图像,可以根据原始二维图像对应的相机参数确定生成该图像时的相机位姿。为描述简单,将原始二维图像的相机位姿对应的视角称为第一视角。在已知原始二维图像的第一视角的基础上,本步骤用于确定一个新的补充视角。该补充视角与第一视角不同。换言之,补充视角的相机位姿与第一视角的相机位姿不同。
示例性地,可以基于预设规则根据第一视角确定补充视角的相机位姿。例如,在第一视角的相机位姿基础上,以预设规则改变方位角和/或仰角。具体地,将第一视角的方位角加上预设度数,以获得补充视角。
S140:基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像。
在确定了补充视角的相机位姿的基础上,可以根据来自原始二维图像的原始图像信息生成目标物体在补充视角下的补充二维图像。该原始图像信息例如来自原始图像特征或者原始三维物体,甚至还可以来自原始二维图像本身。
生成补充二维图像的补充视角与生成原始二维图像的第一视角不同,使得补充二维图像与原始二维图像之间存在区别。因为目标物体表面一般是连续变化的,所以基于原始图像信息预测目标物体在第一视角下不可见的部分是存在可信度的。补充二维图像中包含了原始二维图像中不存在的信息,且该信息在一定程度上是可靠的。补充二维图像可以对原始图像信息起到补充丰富的作用。
S150:对所述补充二维图像进行三维重建,以生成与所述补充二维图像相对应的补充三维物体。
本步骤与步骤S110和步骤S120执行的操作相似,只是步骤S110与步骤S120针对原始二维图像进行操作,而本步骤S150针对补充二维图像进行操作。示例性地,步骤S150可以包括:首先,利用卷积神经网络组成的编码器从补充二维图像中提取补充图像特征;然后,再利用卷积神经网络组成的解码器基于补充图像特征确定对应的补充三维物体。
在一个示例中,该补充三维物体通过体素的形式进行表示。可以理解,由于补充二维图像中包含了原始图像信息中不存在的信息,因此生成的补充三维物体中在补充视角下可见体素必然与原始三维物体中在第一视角下可见体素不 同。
S160:对所述原始三维物体和所述补充三维物体进行融合,以获得所述目标物体的三维重建结果。
示例性地,可以通过对原始三维物体和补充三维物体的体素取并集的方式确定目标物体的最终的三维重建结果。对于空间中任意位置,只要原始三维物体或者补充三维物体中的任一个在该位置存在体素,那么就确定三维重建结果在该位置存在体素。
替代地,还可以通过对原始三维物体和补充三维物体的体素取交集的方式确定目标物体的最终的三维重建结果。对于空间中任意位置,只有原始三维物体和补充三维物体二者在该位置都存在体素,那么才确定三维重建结果在该位置存在体素。
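A minimal sketch of the two fusion rules described above (union and intersection of the voxel sets); the volumes are assumed to be boolean occupancy grids of the same shape:

import numpy as np

def fuse_union(original_volume, supplementary_volume):
    # A voxel is kept if it is present in either volume.
    return np.logical_or(original_volume, supplementary_volume)

def fuse_intersection(original_volume, supplementary_volume):
    # A voxel is kept only if it is present in both volumes.
    return np.logical_and(original_volume, supplementary_volume)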
通过生成与原始二维图像的相机视角不同的补充二维图像,可以增加更多信息用于三维重建。从而,能够获得更理想的三维重建结果。
图3示出了根据本发明一个实施例的步骤S120确定原始三维物体的示意性流程图。如前所述,可以利用神经网络组成的解码器,基于原始图像特征生成原始三维物体。在本实施例中,卷积神经网络组成的解码器可以利用深度神经网络和体素神经网络实现。如图3所示,步骤S120包括以下步骤。
S121:对原始图像特征通过深度神经网络进行解码,以获得目标物体的深度图。
在一个示例中,深度神经网络可以包括多个2维(2D)卷积层。深度图中每个像素表示目标物体对应位置的深度。该深度可以是目标物体对应位置与相机之间的距离。
对于原始图像特征中的每一个特征向量,可以通过以下公式计算原始二维图像中的、与该特征向量对应的像素的深度d:
d = Σ_{i=1}^{C} σ(F, dim=1)_i × i
其中i表示该特征向量中的元素,σ(F,dim=1)表示对该特征向量F沿着深度的方向进行softmax函数运算获得的i的概率值,C表示最大深度。例如假设特征向量F为8维向量[0,0,0,1,1,0,0,0],其中C=8,i=4和5。此时,d=σ(F,dim=1)×4+σ(F,dim=1)×5,若σ(F,dim=1)=0.5,那么d=0.5×4+0.5×5=4.5,即原始二维图像中与该特征向量对应的像素的深度为4.5。
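A small numpy sketch of the depth computation above (not the network code from the patent): the depth of a pixel is the expectation of the softmax distribution of its C-dimensional feature vector along the depth direction. Note that the worked example in the text assumes the softmax assigns 0.5 to each of the two active bins.

import numpy as np

def depth_from_feature(F):
    F = np.asarray(F, dtype=np.float64)
    p = np.exp(F - F.max())
    p /= p.sum()                          # softmax over the C depth bins
    depths = np.arange(1, F.size + 1)     # bin index i = 1..C
    return float((p * depths).sum())      # d = sum_i softmax(F)_i * i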
可以称相机采集原始二维图像时的视角为主视角,即前述第一视角。可以基于原始图像特征生成主视角的深度图。
另外,基于原始图像特征生成的深度图还可以包含后视角的深度图。后视角是与主视角成180度的视角。示例性地,可以认为目标物体关于垂直于主视角方向的平面对称。据此,虽然实际上目标物体自后视角可见的部分在主视角下是不可见的,但是可以根据原始图像特征获得后视角的深度图。
S122:对所述原始图像特征通过体素神经网络进行解码,以获得所述目标物体的体素立方体。
体素神经网络也可以包括多个2D卷积层,其用于根据原始图像特征输出由多个立体方格组成的体素立方体。在体素立方体中,如果立体方格的取值为1,则目标物体在该方格所在的空间位置存在体素。如果立体方格的取值为0,则目标物体在该方格所在的空间位置不存在体素。
S123:基于所述深度图和所述体素立方体确定所述原始三维物体。
根据前文所述,深度图可以包含主视角的深度图和后视角的深度图。其中,主视角的深度图包括目标物体的前表面的三维信息,后视角的深度图包括目标物体的后表面的三维信息。可以根据前表面的三维信息和后表面的三维信息确定目标物体的三维信息。示例性地,可以认为前表面和后表面之间的部分为根据深度图重建的目标物体。可以将根据主视角的深度图获得的前表面的各个点与其根据后视角的深度图获得的后表面的对应点相连接,则前表面、后表面以及所有连接线所封闭的空间即为根据深度图重建的目标物体所占用的空间。
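By way of illustration (a sketch under assumed conventions, not the patent's code), the space enclosed between the front-view surface and the back-view surface can be turned into an occupancy volume as follows; depths are assumed to be expressed in voxel units along the viewing axis:

import numpy as np

def occupancy_from_depth_maps(front_depth, back_depth, num_bins):
    # front_depth, back_depth: (H, W) depth maps of the front and back surfaces.
    # A voxel is filled when its depth lies between the two surfaces.
    z = np.arange(num_bins).reshape(-1, 1, 1)
    return (z >= front_depth[None]) & (z <= back_depth[None])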
可以融合基于根据深度图重建的目标物体与根据原始图像特征获得的体素立方体,以确定原始三维物体。在一个示例中,对于上述二者都认为某特定位置是目标物体的情况,将确定该位置存在目标物体。
通过深度图和体素立方体确定原始三维物体,可以有效地利用原始二维图像中的信息,使得生成的原始三维物体更加接近目标物体。
在一个具体示例中,上述步骤S123可以包括:首先,根据深度图确定原始三维物体中可见的体素;然后,根据体素立方体确定原始三维物体中的其他体素。
如前所述,深度图可以包括主视角的深度图。由于主视角的深度图是直接基于原始二维图像获取的,因此根据主视角的深度图确定的体素可以认为是可 见体素。这些体素的可信度较高,更能反映目标物体的实际形状。示例性地,深度图还可以包括后视角的深度图。鉴于大部分物体都是前后对称关系,可以认为根据后视角的深度图确定的体素也是可见的。可以根据主视角的深度图和后视角的深度图确定所述原始三维物体在主视角下可见的体素和在后视角下可见的体素。可以理解,虽然体素立方体中也包含前表面和后表面上的体素,但是根据深度图确定这些可见体素比根据体素立方体确定这些体素准确性更高。
然而，主视角的深度图和后视角的深度图无法体现出目标物体的其它空间特征。原始三维物体中的其它体素是原始二维图像中不可见的。可以基于体素神经网络生成的体素立方体来确定这些体素。体素立方体中包含了除前表面（主视角下可见）和后表面（后视角下可见）之外的其他表面上的体素，这些体素可以用于确定原始三维物体除前表面和后表面之外的其它表面的体素。
根据深度图确定原始三维物体中可见的体素并根据体素立方体确定其他体素,可以得到可信度更高、准确性更强的原始三维物体。如前文所述,原始二维图像可以包括多个不同的视角下得到的多张图像。图4A示出了根据本发明一个实施例通过多张原始二维图像确定原始三维物体的示意性流程图。如图4A所示,当原始二维图像包含多张不同视角的图像时,步骤S120确定原始三维物体可以包括以下步骤。
首先,分别基于从每个视角的原始二维图像提取的对应的原始图像特征确定对应的分视角三维物体。
如前文所述,目标物体的每张原始二维图像都分别与相机在拍摄该原始二维图像时的视角相对应。图4B示出了根据本发明一个实施例的不同视角下的相机拍摄得到不同的原始二维图像的示意图。如图4B所示,C1、C2、C3表示处于不同位姿的相机。分别利用C1、C2、C3对处于标准姿态的目标物体进行拍摄,可以得到各自视角对应的原始二维图像I1、I2和I3。
对于每个视角下的原始二维图像I1、I2和I3,都可以通过三维重建获得与原始二维图像的视角对应的三维物体,在此称之为分视角三维物体。可以理解,每个分视角三维物体对应的原始二维图像不同,因此其包含的体素也可能不同。
然后,对所有的分视角三维物体进行融合,以获得所述原始三维物体。示例性地,本步骤中根据多个分视角三维物体中包含的体素确定原始三维物体。可以利用任何现有技术或未来研发的算法对各个分视角三维物体进行融合,本 申请对此不做限制。
上述实施例中,基于多张不同视角的图像确定原始三维物体。这些图像含有更多可信的目标物体的信息。由此,能够使得本申请的三维重建结果更准确。
图5A示出了根据本发明一个实施例对所有的分视角三维物体进行融合的示意性流程图,如图5A所示,对多个分视角三维物体进行融合包括以下步骤。
S510:将每个分视角三维物体旋转到标准姿态,以获得对应的标准视角三维物体。
每个分视角三维物体都是基于各自对应的原始二维图像生成的,其分别对应于各自的视角。为了方便对多个分视角三维物体进行融合,可以先将每个分视角三维物体旋转到统一的标准姿态下。由此,可以得到每个分视角三维物体在同样的标准视角下的空间形状,即标准视角三维物体。
S520:根据所有标准视角三维物体的体素,确定所述原始三维物体的体素。
对于所有标准视角三维物体所涉及的每个位置,根据所有标准视角三维物体在对应位置上是否存在体素,确定所述原始三维物体在该位置上是否存在体素。示例性地,可以根据所有标准视角三维物体的体素的并集或交集确定原始三维物体的体素。
图5B示出了根据本发明一个实施例通过多张原始二维图像确定原始三维物体的示意性框图。通过分别对不同视角下的原始二维图像I1、I2和I3进行三维重建,得到各自对应的分视角三维物体V1、V2和V3。分别将V1、V2和V3旋转到标准姿态,以得到各自对应的标准视角三维物体V1’、V2’和V3’。最后,对标准视角三维物体V1’、V2’和V3’进行融合,得到原始三维物体V0。可以理解,在图5所示的确定原始三维物体的过程中,忽略了从原始二维图像提取原始图像特征的过程,但本领域普通技术人员通过上面描述,能够理解该过程。
在上述技术方案中,首先将每个分视角三维物体旋转到标准姿态,然后对旋转后的标准视角三维物体进行融合,不仅实现容易,而且还保证了结果准确性。
在一个具体实施例中,标准视角三维物体是以体素的方式表示的。根据立 体方格的取值是1或0,可以确定在该立体方格的对应位置是否存在体素。当所有标准视角三维物体中在某位置上存在体素的标准视角三维物体超过第一比例时,确定所述原始三维物体在该位置上存在体素。
例如,假设有k个标准视角三维物体,对于空间的某位置,其中有m个标准视角三维物体存在体素(该位置的立体方格的取值为1),那么当m/k超过第一比例时,确定原始三维物体在该位置处存在体素。在一个示例中,该第一比例为0.5。
上述过程可以用投票函数实现,具体公式如下。
如果 (1/k)·Σ_{i=1}^{k} P_i(x,y,z) 超过第一比例，则 O(x,y,z)=1；否则 O(x,y,z)=0。
其中,(x,y,z)表示空间中的某位置的坐标,k表示标准视角三维物体的个数,Pi(x,y,z)表示第i个标准视角三维物体在该位置处的立体方格的取值,O(x,y,z)表示原始三维物体在该位置处的立体方格的取值。
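An illustrative numpy sketch of the voting-based fusion above (names are ours, not from the patent): a voxel is kept when the fraction of standard-view volumes containing a voxel at that position exceeds the first proportion.

import numpy as np

def fuse_by_voting(volumes, first_ratio=0.5):
    # volumes: list of k boolean occupancy grids of identical shape.
    stack = np.stack([v.astype(np.float32) for v in volumes], axis=0)
    votes = stack.mean(axis=0)        # (1/k) * sum_i P_i(x, y, z)
    return votes > first_ratio        # O(x, y, z) = 1 where the first proportion is exceeded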
上述技术方案中,根据所有标准视角三维物体中在某位置处存在体素的个数来确定原始三维物体。该原始三维物体更接近于真实的目标物体。由此,该技术方案获得的三维重建结果更理想。
根据前文所述,在步骤S120确定原始三维物体之后,需要步骤S130进一步确定目标物体的补充视角的相机位姿。图6示出了根据本发明一个实施例确定补充视角的相机位姿的示意性流程图。如图6所示,步骤S130确定补充视角的相机位姿包括以下步骤。
S131:获取预设的至少一个候选视角的相机位姿。
每个候选视角的相机位姿可以表示为球面坐标系中的方位角和仰角,用向量(α,β)表示。示例性地,在将目标物体的中心点作为坐标系原点的基础上,选取方位角α为集合[0,45,90,135,180,225,270,315]中的元素,仰角β为集合[-60,-30,0,30,60]中的元素,距离为1的相机位姿。可以理解,在该示例中,共选取40个相机位姿。
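For illustration, the preset candidate camera poses described above can be enumerated as follows (a small Python sketch; variable names are ours):

azimuths = [0, 45, 90, 135, 180, 225, 270, 315]
elevations = [-60, -30, 0, 30, 60]
candidate_poses = [(alpha, beta) for alpha in azimuths for beta in elevations]
assert len(candidate_poses) == 40  # 8 azimuths x 5 elevations, each at unit distance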
S132:对于每个候选视角的相机位姿,将所述原始三维物体旋转到该候选视角下,以获得对应的候选视角三维物体。
具体地,可以将原始三维物体从当前视角下旋转到候选视角下。可以理 解,原始三维物体的当前视角可以是原始二维图像所对应的第一视角。特别是对于原始二维图像只有单张的情况,可以直接基于第一视角确定原始三维物体,运算更简单。替代地,原始三维物体的当前视角还可以是标准视角。根据前述示例,对于原始二维图像存在多张不同视角的图像的情况,所获得的原始三维物体可能是处于标准视角的。
例如,假设当前视角的相机位姿为(α1,β1),候选视角的相机位姿为(α2,β2),那么可以将原始三维物体旋转(α2-α1,β2-β1)的角度,以得到候选视角三维物体。
S133:对于每个候选视角的相机位姿,确定候选视角三维物体的可见体素的原始可见比例。
候选视角三维物体的可见体素是指在候选视角三维物体在候选视角下的可见体素。在不同的视角下,三维物体的可见体素是不同的。以汽车为例,假设原始二维图像对应的第一视角(0,0)是正对车头的视角,那么构成车头部分的体素在第一视角下是可见体素,例如构成前车灯的体素、构成雨刷器的体素、构成引擎盖的体素等是可见体素。当将该汽车旋转到候选视角下时,例如左视角(90,0)下,那么构成左车门的体素是可见体素,而构成雨刷器的体素则不是可见体素。
原始可见比例是候选视角三维物体的可见体素中在第一视角下可见的体素的个数占比。可以理解,如果原始二维图像包括多张不同视角的图像,那么第一视角包括多个视角。可以理解,三维物体在候选视角下可见的体素,在第一视角下可能是可见的,也可能是不可见的。在前述汽车的示例中,汽车在左视角下的可见体素中,靠近车头的部分的体素在车头视角下是可见的,而靠近车尾的部分的体素在车头视角下是不可见的。由此,该示例中,左视角下汽车的可见体素的原始可见比例是左视角下可见体素中在第一视角下可见的像素的比例。
S134:对于每个候选视角的相机位姿,当原始可见比例在第一范围内时,确定该候选视角的相机位姿为补充视角的相机位姿。
原始可见比例可以反映出候选视角三维物体的可信程度。原始三维物体是基于原始二维图像生成的。原始二维图像中可见的像素能够真实反映目标物体的形状,因此是可信的像素。基于原始二维图像中的像素确定的原始三维物体中第一视角下可见的体素也是可信的。原始三维物体中除了第一视角下可见的 体素之外的其余体素的可信程度比第一视角下可见的体素的可信程度低。基于上述原因,本领域普通技术人员可以理解,候选视角三维物体的可见体素的原始可见比例越高,说明候选视角三维物体的可信程度越高;否则,说明候选视角三维物体的可信程度越低。
本步骤的目的在于,选择原始可见比例在合适范围内的候选视角作为三维重建时的补充视角。补充视角下的三维物体的可信程度不宜过低,否则在该视角下进行三维重建没有意义;同时补充视角下的三维物体的可信程度也不宜过高,否则会与第一视角太接近而起不到补充信息的作用。在本发明一个示例中,第一范围是50%-85%,原始可见比例在该范围内的候选视角作为三维重建的补充视角,该候选视角下的相机位姿为补充视角的相机位姿。该范围即保证了补充视角下的三维物体的可信度足够高,而且还保证了补充信息的有效量。
在上述实施例中,根据候选视角三维物体的可见体素的原始可见比例来确定补充视角的相机位姿,基于该补充视角的相机位姿获得的三维重建结果更准确。
根据前文所述,原始可见比例是确定补充视角的重点考虑因素。图7示出了根据本发明一个具体实施例的确定原始可见比例的示意性流程图。如图7所示,确定候选视角三维物体的可见体素的原始可见比例包括以下步骤。
S710:基于该候选视角,将候选视角三维物体进行投影,以获得投影图。
由于候选视角三维物体已经旋转到正对候选视角的位置,因此对候选视角三维物体进行候选视角方向的投影即可获得候选视角三维物体在候选视角下可见的体素。投影图中的候选视角三维物体的像素分别对应于其在候选视角下可见的体素。
在一个示例中,可以基于候选视角三维物体在候选视角下距离投影平面最近的体素来确定投影图。其中投影平面可以是相机所在的垂直于候选视角的平面。假设候选视角是X轴的方向,可以通过以下公式确定候选视角三维物体在候选视角下距离投影平面最近的体素:
d(y,z)=argmin(P(:,y,z)),其中P(:,y,z)>0
其中P(:,y,z)表示候选视角三维物体的Y轴坐标为y,Z轴坐标为z的平行于X轴的直线上的所有体素。当候选视角三维物体在某位置(x,y,z)存在体素时,P(x,y,z)=1;否则,P(x,y,z)=0。在限制了 P(:,y,z)>0的情况下,argmin(P(:,y,z))表示候选视角三维物体的、在前述直线上的体素与投影平面距离的最小值。根据上式,假设存在P(:,y,z)>0的m个体素,且中m个体素的X轴坐标分别为{x1,x2,…,xm},则d(y,z)取这些X轴坐标的最小值,即等于min{x1,x2,…,xm}。由此,该直线上存在候选视角三维物体的投影。否则,假设不存在P(:,y,z)>0的体素,则d(y,z)=0。由此,该直线上不存在候选视角三维物体的投影。综上,可以获得候选视角三维物体在候选视角下的投影图。
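A small numpy sketch of the projection described above (viewing direction assumed along the X axis; not the patent's implementation): for every (y, z) ray, take the smallest x containing a voxel, and 0 where the ray misses the object, mirroring d(y, z) above.

import numpy as np

def project_along_x(P):
    # P: boolean occupancy grid indexed as P[x, y, z].
    X = P.shape[0]
    has_voxel = P.any(axis=0)                                           # rays that hit the object
    first_x = np.where(P, np.arange(X)[:, None, None], X).min(axis=0)   # nearest voxel per ray
    return np.where(has_voxel, first_x, 0)                              # d(y, z); 0 means no projection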
S720:统计投影图中的候选视角三维物体的、在第一视角下可见的像素数。
可以理解,投影图中的像素对应于候选视角三维物体中在候选视角下可见的体素。如前所述,候选视角三维物体在候选视角下可见的体素在原始二维图像的第一视角下可能是可见的,也可能是不可见的。本步骤S720用于确定投影图中的、与在第一视角下可见,同时也在候选视角下可见的体素对应的像素的个数。
具体地,可以对第一视角下可见的体素进行标记。在一个示例中,第一视角下可见的体素可以是原始三维物体中由主视角深度图确定的体素。在对原始三维物体中的体素进行标记的基础上,经过旋转后得到的候选视角三维物体中仍然保留这些标记。然而在第一视角下被标记为可见的体素,在候选视角下未必可见。本步骤S720中要统计的就是在候选视角下仍然可见的、被标记过的体素。
在另一个示例中,还可以对第一视角下不可见的体素进行标记。例如,将原始三维物体中由后视角的深度图和体素立方体确定的体素作为第一视角下不可见的体素进行标记。
根据投影图中的、与所标记的体素对应的像素数即可获得投影图中的候选视角三维物体的在第一视角下可见的像素数。
S730:根据所统计的像素数和投影图中的候选视角三维物体的总像素数,确定所述原始可见比例。计算步骤S720中所统计的像素数占投影图中的候选视角三维物体的总像素数的比例,即可确定原始可见比例。
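Continuing the previous sketch (again an illustration with assumed conventions), the original visible ratio can be computed by checking, for every projected pixel, whether its nearest voxel was marked as visible under the first perspective:

import numpy as np

def original_visible_ratio(P, visible_mask):
    # P: boolean occupancy grid P[x, y, z]; visible_mask marks voxels visible in the first view.
    X = P.shape[0]
    has_voxel = P.any(axis=0)
    first_x = np.where(P, np.arange(X)[:, None, None], X).min(axis=0)
    ys, zs = np.nonzero(has_voxel)
    hits = visible_mask[first_x[ys, zs], ys, zs]       # projected pixels whose voxel is visible
    return hits.sum() / max(len(ys), 1)                # counted pixels / total projected pixels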
图8示出了上述确定原始可见比例的示意图。V0为基于步骤S110和步骤S120的三维重建生成的原始三维物体。原始三维物体主要包括三部分:根据主视图的深度图确定的体素,根据后视角的深度图确定的体素和根据体素立方 体确定的体素。其中,认为根据主视图的深度图确定的体素是在第一视角下可见的,认为其余体素是在第一视角下不可见的。V0’是原始三维物体基于候选视角旋转后得到的候选视角三维物体。P0是候选视角三维物体在候选视角下的投影图。P0中包含与候选视角三维物体中在第一视角下可见的体素相对应的像素和与其中在第一视角下不可见的体素相对应的像素。这二者分别用不同灰度的方格进行标识。可以根据前者与前者加后者的和之间的比值,确定原始可见比例。
上述技术方案中,利用投影图来确定原始可见比例,易于实现,而且最终的三维重建结果更准确。
根据前文所述,在确定补充视角的相机位姿之后,生成目标物体在所述补充视角下的补充二维图像。图9示出了根据本发明一个具体实施例步骤S140生成补充二维图像的示意性流程图,该步骤S140包括以下步骤:
S141:计算第一视角的相机位姿与补充视角的相机位姿之间的横向转角和纵向转角。
如前所述,在将目标物体的中心点作为世界坐标系的原点的基础上,不同视角的相机位姿可以等效为球面坐标系中的横向转角(在XOY平面上相对于X轴的转角)和纵向转角(在垂直于XOY的平面上相对于Z轴的转角),用(α,β)表示。假设第一视角的相机位姿为(α1,β1),补充视角的相机位姿为(α2,β2),那么第一视角的相机位姿与补充视角的相机位姿之间的横向转角和纵向转角可以表示为(α2-α1,β2-β1)。
S142:将所述横向转角和所述纵向转角组成的向量与所述原始图像特征中的每个向量拼接,将拼接后的所有向量作为补充图像特征。
如前所述,可以从每张原始二维图像中提取到H×W个特征向量,这H×W个特征向量构成了原始图像特征。
假设特征向量的维度为n。可以将步骤S610计算得到横向转角和纵向转角(α2-α1,β2-β1),拼接到每个特征向量后,使得每个拼接后的特征向量包含n+2个向量。例如,原始图像特征中的其中一个特征向量表示为(P1,P2,……Pn),那么拼接后的特征向量则表示为(P1,P2,……Pn,α2-α1,β2-β1)。将原始图像特征中的每个特征向量都进行拼接,将拼接后得到的所有特征向量作为补充图像特征。
S143:基于所述补充图像特征生成所述补充二维图像。
可以基于该补充图像特征,利用卷积神经网络组成的解码器生成与补充图像特征对应的补充二维图像。可以理解该解码器可以通过利用样本特征和对应的样本图像训练获得。
通过原始图像特征中的特征向量与相机位姿之间的转角进行拼接的方式得到补充图像特征,并基于补充图像特征生成补充二维图像,操作简便,易于实现。
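An illustrative sketch of the concatenation above (shapes and names are assumptions): the horizontal and vertical rotation angles between the first view and the supplementary view are appended to every feature vector, turning a (C, H, W) feature map into a (C+2, H, W) supplementary image feature.

import numpy as np

def build_supplementary_feature(original_feature, first_pose, supp_pose):
    # original_feature: (C, H, W); poses are (azimuth, elevation) pairs in degrees.
    C, H, W = original_feature.shape
    d_azimuth = supp_pose[0] - first_pose[0]
    d_elevation = supp_pose[1] - first_pose[1]
    angles = np.array([d_azimuth, d_elevation], dtype=original_feature.dtype).reshape(2, 1, 1)
    angles = np.broadcast_to(angles, (2, H, W))
    return np.concatenate([original_feature, angles], axis=0)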
图10示出了根据本发明另一个具体实施例生成补充二维图像的示意性流程图。具体步骤如下:
S141’:根据原始三维物体在补充视角下的投影图以及原始图像特征,提取目标特征。
可以与前文步骤S710所述获取候选视角三维物体在候选视角下的投影图类似地,获取原始三维物体在补充视角下的投影图。
可以理解,在前述基于候选视角选择补充视角的示例中,此处可以直接基于步骤S710的结果获得原始三维物体在补充视角下的投影图。
原始三维物体在补充视角下的投影图中包含与原始三维物体在第一视角下可见的体素对应的像素和与其在第一视角下不可见的体素对应的像素。可以理解,前者的信息来自于原始二维图像,因此在自原始二维图像提取的原始图像特征中存在与其对应的特征向量。因此,此步骤S141’可以包括以下步骤:a)对于投影图中的、与原始三维物体在第一视角下可见的体素对应的像素,可以根据原始图像特征确定目标特征中的对应特征向量。具体地,可以将原始图像特征中对应的特征向量作为前者的目标特征中的特征向量。b)对于投影图中的、与原始三维物体在第一视角下不可见的体素对应的像素,可以基于随机噪声确定目标特征中对应特征向量。例如,将随机噪声作为目标特征中对应特征向量。可选地,该随机噪声可以取区间[0,1]范围内的任意值。
进一步,在原始二维图像中包含多张不同视角的图像的情况中,原始图像特征对应地包含与每张不同视角的图像对应的多个特征。对于投影图中的、与原始三维物体在第一视角下可见的体素对应的像素,可以将所有原始图像特征中的对应的特征向量求和后再平均,以将得到的平均值作为该像素的目标特征。
S142’:根据所述目标特征生成所述补充二维图像。
示例性地,可以利用卷积神经网络组成的解码器,基于步骤S141’提取 的目标特征生成与目标特征相对应的补充二维图像。本领域普通技术人员可以理解该具体操作,为了简洁,在此不再赘述。
图11示出了根据本发明一个具体实施例生成补充二维图像的示意性框图。如图11所示,V0为三维重建生成的原始三维物体,V0”是原始三维物体基于补充视角旋转后得到的补充视角三维物体,P0’是补充视角三维物体在所述补充视角下的投影图。P0’中可以包含与原始三维物体在第一视角下可见的体素对应的像素和与在第一视角下不可见的体素对应的像素。
在一个示例中,在P0’的基础上,分别提取与原始三维物体在第一视角下可见的体素对应的像素的特征向量和与原始三维物体在第一视角下不可见的体素对应的像素的特征向量,以生成目标特征。其中对于前者,其对应的特征向量可以来自从原始二维图像中提取到的原始图像特征;对于后者,其对应的特征向量可以是基于随机噪声确定的。
在另一个示例中,步骤S141’还包括:将P0’与步骤a)和步骤b)确定的特征向量进行拼接,以生成目标特征。具体的,P0’为1×H×W(H代表原始二维图像的高度,W代表原始二维图像的宽度)的矩阵。原始图像特征如前所述为C×H×W的张量,则步骤a)和步骤b)确定的特征向量也构成一个C×H×W的特征张量。将P0’与特征张量合并以生成(C+1)×H×W的张量。该(C+1)×H×W的张量即为所生成的目标特征。
在此示例中,P0’作为目标特征中的掩码,将进一步提高三维重建结果的准确性。
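A minimal numpy sketch (assumed shapes, not the patent's code) of assembling the target feature described in this example: feature vectors come from the original image feature where the projected voxel is visible in the first view, from random noise in [0, 1] elsewhere, and the projection map P0' is concatenated as an extra mask channel to give a (C+1, H, W) tensor.

import numpy as np

def build_target_feature(projection_map, visible_pixel_mask, original_feature, rng=None):
    # projection_map: (H, W); visible_pixel_mask: (H, W) bool; original_feature: (C, H, W).
    if rng is None:
        rng = np.random.default_rng()
    C, H, W = original_feature.shape
    noise = rng.random((C, H, W)).astype(original_feature.dtype)       # random noise in [0, 1]
    feat = np.where(visible_pixel_mask[None], original_feature, noise)
    mask_channel = projection_map[None].astype(original_feature.dtype)
    return np.concatenate([mask_channel, feat], axis=0)                # (C+1, H, W) target feature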
在获得目标特征的基础上,可以通过例如卷积神经网络组成的解码器对目标特征解码,从而得到对应的补充二维图像。
在上述技术方案中生成的补充二维图像即包含原始二维图像中的较多信息,又包含足够的补充信息,从而基于其获得的三维重建结果具有较高的可信度。
可以理解,选择的补充视角越多,生成的补充三维物体就越多,从而三维重建结果越接近目标物体的真实形状。因此,可以对步骤S130至步骤S160的过程进行多次迭代,并根据是否满足迭代终止条件来确定最终三维重建结果。
图12示出了根据本发明另一个实施例三维重建方法的示意性流程图。如图12所示,该三维重建方法包括以下步骤:
S1210:从目标物体的原始二维图像中提取原始图像特征。
S1220:基于所述原始图像特征确定原始三维物体。
S1230:确定所述目标物体的补充视角的相机位姿,其中所述补充视角与生成所述原始二维图像的第一视角不同。
S1240:基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像。
S1250:对所述补充二维图像进行三维重建,以生成与所述补充二维图像相对应的补充三维物体。
S1260:对所述原始三维物体和所述补充三维物体进行融合,以获得所述目标物体的三维重建结果。以上步骤与步骤S110-S160类似,本文不再赘述。
S1270:判断所述三维重建结果中可见的体素占比是否大于第二比例。
三维重建结果中可见的体素占比是三维重建结果在补充视角下的可见体素中在第一视角下可见的体素的个数占比。例如三维重建结果在补充视角下可见的体素共有m个,其中这些体素中同时在第一视角下可见的个数为M个,则可见的体素占比为M/m。可以理解,可见的体素占比能够反映三维重建结果的可信程度。第二比例可以是70%至90%之间的任意值。在一个示例中,上述第二比例为85%。该数值兼顾了计算资源的消耗和计算结果的准确性。
对于不大于第二比例的情况,将所述补充二维图像作为原始二维图像,并转步骤S1230。由此,再次基于新的补充视角的相机位姿进行三维重建。若可见的体素占比不大于第二比例,说明当前的三维重建结果与真实的目标物体还存在一定差距,因此需要再次基于新的补充视角的相机位姿进行三维重建。
对于大于所述第二比例的情况,执行步骤S1280。
S1280:将所述三维重建结果作为最终结果。三维重建方法结束。
若可见的体素占比大于第二比例,说明当前视角下生成的三维物体与真实的三维物体已经比较接近,因此可以将三维重建结果作为最终结果。
通过上述步骤,可以保证通过有限次的迭代之后,得到的三维重建结果是符合预期的结果,保证重建三维物体的质量。
根据本发明另一方面,还提供了一种三维重建装置。图13示出了根据本发明一个实施例的三维重建装置的示意性框图。
如图13所示,装置1300包括特征提取模块1310、第一重建模块1320、补充视角模块1330、补充图像模块1340、第二重建模块1350和融合模块 1360。
所述各个模块可分别执行上文中所述的三维重建方法的各个步骤/功能。以下仅对该装置1300的各部件的主要功能进行描述,而省略以上已经描述过的细节内容。
特征提取模块1310,用于从目标物体的原始二维图像中提取原始图像特征;
第一重建模块1320,用于基于所述原始图像特征确定原始三维物体;
补充视角模块1330,用于确定所述目标物体的补充视角的相机位姿,其中所述补充视角与生成所述原始二维图像的第一视角不同;
补充图像模块1340,用于基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像;
第二重建模块1350,用于对所述补充二维图像进行三维重建,以生成与所述补充二维图像相对应的补充三维物体;以及
融合模块1360,用于对所述原始三维物体和所述补充三维物体进行融合,以获得所述目标物体的三维重建结果。
根据本发明再一方面,还提供了一种三维重建系统,包括:处理器和存储器,其中,所述存储器中存储有计算机程序指令,所述计算机程序指令被所述处理器运行时用于执行上述的三维重建方法。
图14示出了根据本发明一个实施例的用于三维重建系统的示意性框图。如图14所示,系统1400包括输入装置1410、存储装置1420、处理器1430以及输出装置1440。
所述输入装置1410用于接收用户所输入的操作指令以及采集数据。输入装置1410可以包括键盘、鼠标、麦克风、触摸屏和图像采集装置等中的一个或多个。
所述存储装置1420存储用于实现根据本发明实施例的三维重建方法中的相应步骤的计算机程序指令。
所述处理器1430用于运行所述存储装置1420中存储的计算机程序指令,以执行根据本发明实施例的三维重建方法的相应步骤,并且用于实现根据本发明实施例的用于三维重建装置中的特征提取模块1310、第一重建模块1320、补充视角模块1330、补充图像模块1340、第二重建模块1350和融合模块1360。
所述输出装置1440用于向外部(例如用户)输出各种信息(例如图像和/或声音),并且可以包括显示器、扬声器等中的一个或多个。
在一个实施例中,在所述计算机程序指令被所述处理器1430运行时使所述系统1400执行以下步骤:
从目标物体的原始二维图像中提取原始图像特征;
基于所述原始图像特征确定原始三维物体;
确定所述目标物体的补充视角的相机位姿,其中所述补充视角与生成所述原始二维图像的第一视角不同;
基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像;
对所述补充二维图像进行三维重建,以生成与所述补充二维图像相对应的补充三维物体;以及
对所述原始三维物体和所述补充三维物体进行融合,以获得所述目标物体的三维重建结果。
此外,根据本发明又一方面,还提供了一种存储介质,在所述存储介质上存储了程序指令,在所述程序指令被计算机或处理器运行时使得所述计算机或处理器执行本发明实施例的上述三维重建方法的相应步骤,并且用于实现根据本发明实施例的上述三维重建装置中的相应模块或上述用于三维重建系统中的相应模块。所述存储介质例如可以包括智能电话的存储卡、平板电脑的存储部件、个人计算机的硬盘、只读存储器(ROM)、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器、或者上述存储介质的任意组合。所述计算机可读存储介质可以是一个或多个计算机可读存储介质的任意组合。
在一个实施例中,所述计算机程序指令被计算机或处理器运行时,使得所述计算机或处理器执行以下步骤:
从目标物体的原始二维图像中提取原始图像特征;
基于所述原始图像特征确定原始三维物体;
确定所述目标物体的补充视角的相机位姿,其中所述补充视角与生成所述原始二维图像的第一视角不同;
基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像;
对所述补充二维图像进行三维重建,以生成与所述补充二维图像相对应的补充三维物体;以及
对所述原始三维物体和所述补充三维物体进行融合,以获得所述目标物体的三维重建结果。
尽管这里已经参考附图描述了示例实施例,应理解上述示例实施例仅仅是示例性的,并且不意图将本发明的范围限制于此。本领域普通技术人员可以在其中进行各种改变和修改,而不偏离本发明的范围和精神。所有这些改变和修改意在被包括在所附权利要求所要求的本发明的范围之内。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。例如,以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个设备,或一些特征可以忽略,或不执行。
在此处所提供的说明书中,说明了大量具体细节。然而,能够理解,本发明的实施例可以在没有这些具体细节的情况下实践。在一些实例中,并未详细示出公知的方法、结构和技术,以便不模糊对本说明书的理解。
类似地,应当理解,为了精简本发明并帮助理解各个发明方面中的一个或多个,在对本发明的示例性实施例的描述中,本发明的各个特征有时被一起分组到单个实施例、图、或者对其的描述中。然而,并不应将该本发明的方法解释成反映如下意图:即所要求保护的本发明要求比在每个权利要求中所明确记载的特征更多的特征。更确切地说,如相应的权利要求书所反映的那样,其发明点在于可以用少于某个公开的单个实施例的所有特征的特征来解决相应的技术问题。因此,遵循具体实施方式的权利要求书由此明确地并入该具体实施方式,其中每个权利要求本身都作为本发明的单独实施例。
本领域的技术人员可以理解,除了特征之间相互排斥之外,可以采用任何组合对本说明书(包括伴随的权利要求、摘要和附图)中公开的所有特征以及 如此公开的任何方法或者设备的所有过程或单元进行组合。除非另外明确陈述,本说明书(包括伴随的权利要求、摘要和附图)中公开的每个特征可以由提供相同、等同或相似目的的替代特征来代替。
此外,本领域的技术人员能够理解,尽管在此所述的一些实施例包括其它实施例中所包括的某些特征而不是其它特征,但是不同实施例的特征的组合意味着处于本发明的范围之内并且形成不同的实施例。例如,在权利要求书中,所要求保护的实施例的任意之一都可以以任意的组合方式来使用。
本发明的各个部件实施例可以以硬件实现,或者以在一个或者多个处理器上运行的软件模块实现,或者以它们的组合实现。本领域的技术人员应当理解,可以在实践中使用微处理器或者数字信号处理器(DSP)来实现根据本发明实施例的三维重建装置中的一些模块的一些或者全部功能。本发明还可以实现为用于执行这里所描述的方法的一部分或者全部的装置程序(例如,计算机程序和计算机程序产品)。这样的实现本发明的程序可以存储在计算机可读介质上,或者可以具有一个或者多个信号的形式。这样的信号可以从因特网网站上下载得到,或者在载体信号上提供,或者以任何其他形式提供。
应该注意的是上述实施例对本发明进行说明而不是对本发明进行限制,并且本领域技术人员在不脱离所附权利要求的范围的情况下可设计出替换实施例。在权利要求中,不应将位于括号之间的任何参考符号构造成对权利要求的限制。单词“包含”不排除存在未列在权利要求中的元件或步骤。位于元件之前的单词“一”或“一个”不排除存在多个这样的元件。本发明可以借助于包括有若干不同元件的硬件以及借助于适当编程的计算机来实现。在列举了若干装置的单元权利要求中,这些装置中的若干个可以是通过同一个硬件项来具体体现。单词第一、第二、以及第三等的使用不表示任何顺序。可将这些单词解释为名称。
以上所述,仅为本发明的具体实施方式或对具体实施方式的说明,本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。本发明的保护范围应以权利要求的保护范围为准。

Claims (18)

  1. 一种三维重建方法,其特征在于,包括:
    从目标物体的原始二维图像中提取原始图像特征;
    基于所述原始图像特征确定原始三维物体;
    确定所述目标物体的补充视角的相机位姿,其中所述补充视角与生成所述原始二维图像的第一视角不同;
    基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像;
    对所述补充二维图像进行三维重建,以生成与所述补充二维图像相对应的补充三维物体;以及
    对所述原始三维物体和所述补充三维物体进行融合,以获得所述目标物体的三维重建结果。
  2. 根据权利要求1所述的三维重建方法,其特征在于,所述基于所述原始图像特征确定原始三维物体包括:
    对所述原始图像特征通过深度神经网络进行解码,以获得所述目标物体的深度图;
    对所述原始图像特征通过体素神经网络进行解码,以获得所述目标物体的体素立方体;
    基于所述深度图和所述体素立方体确定所述原始三维物体。
  3. 根据权利要求2所述的三维重建方法,其特征在于,所述基于所述深度图和所述体素立方体确定所述原始三维物体包括:
    根据所述深度图确定所述原始三维物体中可见的体素;以及
    根据所述体素立方体确定所述原始三维物体中的其他体素。
  4. 根据权利要求3所述的三维重建方法,其特征在于,所述目标物体的深度图包括所述目标物体的主视角的深度图和后视角的深度图。
  5. 根据权利要求1所述的三维重建方法,其特征在于,所述原始二维图像包含多张不同视角的图像,所述基于所述原始图像特征确定原始三维物体包括:
    分别基于从每个视角的原始二维图像提取的对应的原始图像特征确定对应的分视角三维物体;以及
    对所有的分视角三维物体进行融合,以获得所述原始三维物体。
  6. 根据权利要求5所述的三维重建方法,其特征在于,所述对所有的分视角三维物体进行融合以获得所述原始三维物体包括:
    将每个分视角三维物体旋转到标准姿态,以获得对应的标准视角三维物体;以及
    根据所有标准视角三维物体的体素,确定所述原始三维物体的体素。
  7. 根据权利要求6所述的三维重建方法,其特征在于,所述根据所有标准视角三维物体的体素,确定所述原始三维物体的体素包括:
    对于所有标准视角三维物体所涉及的每个位置,当所有标准视角三维物体中在对应位置上存在体素的标准视角三维物体超过第一比例时,确定所述原始三维物体在该位置上存在体素。
  8. 根据权利要求1所述的三维重建方法,其特征在于,所述确定所述目标物体的补充视角的相机位姿包括:
    获取预设的至少一个候选视角的相机位姿;
    对于每个候选视角的相机位姿,
    将所述原始三维物体旋转到该候选视角下,以获得对应的候选视角三维物体;
    确定所述候选视角三维物体的可见体素的原始可见比例;
    当所述原始可见比例在第一范围内时,确定该候选视角的相机位姿为所述补充视角的相机位姿。
  9. 根据权利要求8所述的三维重建方法,其特征在于,所述确定所述候选视角三维物体的可见体素的原始可见比例包括:
    基于该候选视角,将所述候选视角三维物体进行投影,以获得投影图;
    统计所述投影图中的所述候选视角三维物体的、在所述第一视角下可见的像素数;以及
    根据所统计的像素数和所述投影图中的所述候选视角三维物体的总像素数,确定所述原始可见比例。
  10. 根据权利要求1所述的三维重建方法,其特征在于,所述基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像包括:
    计算所述第一视角的相机位姿与所述补充视角的相机位姿之间的横向转角 和纵向转角;
    将所述横向转角和所述纵向转角组成的向量与所述原始图像特征中的每个向量拼接,以由拼接后的所有向量构成补充图像特征;
    基于所述补充图像特征生成所述补充二维图像。
  11. 根据权利要求1所述的三维重建方法,其特征在于,所述基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像包括:
    根据所述原始三维物体在所述补充视角下的投影图以及所述原始图像特征,提取目标特征;以及
    根据所述目标特征生成所述补充二维图像。
  12. 根据权利要求11所述的三维重建方法,其特征在于,所述根据所述原始三维物体在所述补充视角下的投影图以及所述原始图像特征提取目标特征包括:
    对于所述投影图中的、与所述原始三维物体在所述第一视角下可见的体素对应的像素,根据所述原始图像特征确定所述目标特征中对应特征向量;
    对于所述投影图中其他像素,基于随机噪声确定所述目标特征中对应特征向量。
  13. 根据权利要求12所述的三维重建方法,其特征在于,
    所述原始二维图像包含多张不同视角的图像,
    所述原始图像特征包含与每张不同视角的图像相对应的多个特征,
    所述根据所述原始图像特征确定所述目标特征中对应特征向量包括:
    对于所述投影图中的、与所述原始三维物体在所述第一视角下可见的体素对应的像素,将多个原始图像特征中的对应特征向量进行平均,以将平均值作为目标特征中的对应特征向量。
  14. 根据权利要求12所述的三维重建方法,其特征在于,所述根据所述原始三维物体在所述补充视角下的投影图以及所述原始图像特征提取目标特征还包括:
    将所述投影图与所确定的特征向量进行拼接,以生成所述目标特征。
  15. 根据权利要求1所述的三维重建方法,其特征在于,在所述对所述原始三维物体和所述补充三维物体进行融合,以获得所述目标物体的三维重建结果后,还包括:
    判断所述三维重建结果中可见的体素占比是否大于第二比例;
    对于不大于第二比例的情况,将所述补充二维图像作为原始二维图像,并再次基于新的补充视角的相机位姿进行三维重建,直至三维重建结果中可见的体素占比大于第二比例。
  16. 一种三维重建装置,其特征在于,包括:
    特征提取模块,用于从目标物体的原始二维图像中提取原始图像特征;
    第一重建模块,用于基于所述原始图像特征确定原始三维物体;
    补充视角模块,用于确定所述目标物体的补充视角的相机位姿,其中所述补充视角与生成所述原始二维图像的第一视角不同;
    补充图像模块,用于基于所述补充视角的相机位姿,生成所述目标物体在所述补充视角下的补充二维图像;
    第二重建模块,用于对所述补充二维图像进行三维重建,以生成与所述补充二维图像相对应的补充三维物体;以及
    融合模块,用于对所述原始三维物体和所述补充三维物体进行融合,以获得所述目标物体的三维重建结果。
  17. 一种三维重建系统,包括:处理器和存储器,其中,所述存储器中存储有计算机程序指令,其特征在于,所述计算机程序指令被所述处理器运行时用于执行如权利要求1至15任一项所述的三维重建方法。
  18. 一种存储介质,在所述存储介质上存储了程序指令,其特征在于,所述程序指令在运行时用于执行如权利要求1至15任一项所述的三维重建方法。
PCT/CN2019/120394 2019-11-22 2019-11-22 三维重建方法、装置、系统和存储介质 WO2021097843A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980002779.1A CN110998671B (zh) 2019-11-22 2019-11-22 三维重建方法、装置、系统和存储介质
PCT/CN2019/120394 WO2021097843A1 (zh) 2019-11-22 2019-11-22 三维重建方法、装置、系统和存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/120394 WO2021097843A1 (zh) 2019-11-22 2019-11-22 三维重建方法、装置、系统和存储介质

Publications (1)

Publication Number Publication Date
WO2021097843A1 true WO2021097843A1 (zh) 2021-05-27

Family

ID=70080495

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/120394 WO2021097843A1 (zh) 2019-11-22 2019-11-22 三维重建方法、装置、系统和存储介质

Country Status (2)

Country Link
CN (1) CN110998671B (zh)
WO (1) WO2021097843A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022077190A1 (zh) * 2020-10-12 2022-04-21 深圳市大疆创新科技有限公司 数据处理方法、控制设备及存储介质
CN114697516B (zh) * 2020-12-25 2023-11-10 花瓣云科技有限公司 三维模型重建方法、设备和存储介质
CN113628348B (zh) * 2021-08-02 2024-03-15 聚好看科技股份有限公司 一种确定三维场景中视点路径的方法及设备
CN114119839B (zh) * 2022-01-24 2022-07-01 阿里巴巴(中国)有限公司 三维模型重建与图像生成方法、设备以及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408762A (zh) * 2014-10-30 2015-03-11 福州大学 利用单目和二维平台获取物体图像信息及三维模型的方法
CN106210700A (zh) * 2016-07-14 2016-12-07 上海玮舟微电子科技有限公司 三维图像的获取系统、显示系统及所适用的智能终端
CN108269300A (zh) * 2017-10-31 2018-07-10 杭州先临三维科技股份有限公司 牙齿三维数据重建方法、装置和系统
CN110148084A (zh) * 2019-05-21 2019-08-20 智慧芽信息科技(苏州)有限公司 由2d图像重建3d模型的方法、装置、设备及存储介质
US20190333269A1 (en) * 2017-01-19 2019-10-31 Panasonic Intellectual Property Corporation Of America Three-dimensional reconstruction method, three-dimensional reconstruction apparatus, and generation method for generating three-dimensional model
WO2019211970A1 (ja) * 2018-05-02 2019-11-07 パナソニックIpマネジメント株式会社 三次元再構成方法及び三次元再構成装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109003325B (zh) * 2018-06-01 2023-08-04 杭州易现先进科技有限公司 一种三维重建的方法、介质、装置和计算设备

Also Published As

Publication number Publication date
CN110998671A (zh) 2020-04-10
CN110998671B (zh) 2024-04-02

Similar Documents

Publication Publication Date Title
WO2021097843A1 (zh) 三维重建方法、装置、系统和存储介质
US10360718B2 (en) Method and apparatus for constructing three dimensional model of object
Yang et al. Mobile3DRecon: Real-time monocular 3D reconstruction on a mobile phone
CN106600686B (zh) 一种基于多幅未标定图像的三维点云重建方法
CN102804231B (zh) 三维场景的分段平面重建
Waechter et al. Virtual rephotography: Novel view prediction error for 3D reconstruction
JP7448566B2 (ja) クロスリアリティシステムにおけるスケーラブル3次元オブジェクト認識
CN111133477B (zh) 三维重建方法、装置、系统和存储介质
Wei Converting 2d to 3d: A survey
CN104350525A (zh) 组合用于三维建模的窄基线和宽基线立体
Yin et al. Towards accurate reconstruction of 3d scene shape from a single monocular image
WO2023024441A1 (zh) 模型重建方法及相关装置、电子设备和存储介质
WO2018133119A1 (zh) 基于深度相机进行室内完整场景三维重建的方法及系统
CN111382618B (zh) 一种人脸图像的光照检测方法、装置、设备和存储介质
EP3309750B1 (en) Image processing apparatus and image processing method
CN114202632A (zh) 网格线性结构恢复方法、装置、电子设备及存储介质
Xu et al. Hybrid mesh-neural representation for 3d transparent object reconstruction
WO2020151078A1 (zh) 一种三维重建的方法和装置
US20080111814A1 (en) Geometric tagging
Lin et al. Multiview textured mesh recovery by differentiable rendering
Lhuillier Toward flexible 3d modeling using a catadioptric camera
JP2021026759A (ja) オブジェクトの3dイメージングを実施するためのシステムおよび方法
Han et al. Ro-map: Real-time multi-object mapping with neural radiance fields
Price et al. Augmenting crowd-sourced 3d reconstructions using semantic detections
Park et al. A tensor voting approach for multi-view 3D scene flow estimation and refinement

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19953425

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 17915487

Country of ref document: US

122 Ep: pct application non-entry in european phase

Ref document number: 19953425

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.01.2023)
