CN117994482A - Method, device, medium and equipment for reconstructing three-dimensional model based on image


Info

Publication number
CN117994482A
Authority
CN
China
Prior art keywords
dimensional model
sphere
images
target object
point
Prior art date
Legal status
Pending
Application number
CN202410171666.7A
Other languages
Chinese (zh)
Inventor
彭以平 (Peng Yiping)
周文 (Zhou Wen)
焦少慧 (Jiao Shaohui)
Current Assignee
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202410171666.7A
Publication of CN117994482A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/50 Lighting effects
    • G06T15/55 Radiosity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10 Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 Indexing scheme for editing of 3D models
    • G06T2219/2008 Assembling, disassembling

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present disclosure provide a method, a device, a medium and equipment for reconstructing a three-dimensional model based on images. One embodiment of the method comprises the following steps: generating a first three-dimensional model of a first model precision corresponding to a target object based on a plurality of images of the target object photographed at multiple angles and the camera poses corresponding to the images; generating a point cloud corresponding to the target object based on the first three-dimensional model; performing ray sampling based on the images, the camera pose corresponding to each image and the spheres corresponding to the points in the point cloud, and determining a plurality of sampling points in each sphere; and performing neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second three-dimensional model with second model precision. Thus, a three-dimensional model can be generated efficiently and accurately from the plurality of images.

Description

Method, device, medium and equipment for reconstructing three-dimensional model based on image
Technical Field
The embodiment of the disclosure relates to the technical field of three-dimensional reconstruction, in particular to a method, a device, a medium and equipment for reconstructing a three-dimensional model based on an image.
Background
Three-dimensional reconstruction of objects has been an important research area in computer graphics and computer vision in recent years. With the rapid development of internet technology, three-dimensional reconstruction technology has gradually matured. The goal of three-dimensional reconstruction is to recover the three-dimensional geometry of an object from a single view or multiple views. Compared with a two-dimensional image, a three-dimensional model better reflects the reality and information completeness of an object, describes local details and geometric structure more finely, and reconstructing a complete three-dimensional model can resolve problems such as self-occlusion and missing partial structures that arise in two-dimensional images. In three-dimensional vision applications, some objects may have complex structures and a large number of tiny components, and three-dimensional reconstruction of such objects is a challenging task. How to reconstruct an object in three dimensions efficiently and accurately is a problem to be solved.
Disclosure of Invention
Embodiments of the present disclosure describe a method and apparatus for image-based three-dimensional model reconstruction, which may reconstruct a target object in three dimensions in two stages: a first three-dimensional model is generated in a first stage; a point cloud of the target object is then generated based on the first three-dimensional model, and the spheres generated in space from the points of the point cloud guide the ray sampling of the second stage, making the sampling more efficient and accurate, so that the second three-dimensional model generated in the second stage is obtained more efficiently and accurately.
According to a first aspect, there is provided a method of image-based three-dimensional model reconstruction, comprising: generating a first three-dimensional model of a first model precision corresponding to a target object based on a plurality of images of the target object photographed at multiple angles and the camera poses corresponding to the images; generating a point cloud corresponding to the target object based on the first three-dimensional model; performing ray sampling based on the plurality of images, the camera pose corresponding to each image and the spheres corresponding to the points in the point cloud, and determining a plurality of sampling points in each sphere; and performing neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second three-dimensional model with second model precision. Thus, the target object may be reconstructed in three dimensions in two stages: a first three-dimensional model is generated in the first stage; a point cloud of the target object is then generated based on the first three-dimensional model, and the spheres generated in space from the points of the point cloud guide the ray sampling of the second stage, making the sampling more efficient and accurate, so that the second three-dimensional model generated in the second stage is obtained more efficiently and accurately.
In one embodiment, the method further comprises: performing texture mapping on the second three-dimensional model according to the plurality of images and the camera pose corresponding to each image to obtain a textured three-dimensional model. With this embodiment, the three-dimensional model can display more realistic and richer details.
In one embodiment, the generating a first three-dimensional model of the first model precision corresponding to the target object based on the multiple images of the target object photographed at multiple angles and the camera pose corresponding to each image includes: obtaining a first signed distance field based on the plurality of images, the camera pose corresponding to each image and a neural surface reconstruction method, wherein the first signed distance field is accelerated by adopting an acceleration training algorithm in the generation process; and generating a first three-dimensional model of a first model precision corresponding to the target object based on the first signed distance field. With the present embodiment, the first three-dimensional model may be generated based on the neural surface reconstruction method.
In one embodiment, the first three-dimensional model is a mesh model comprising a plurality of vertices; and generating a point cloud corresponding to the target object based on the first three-dimensional model includes: sampling the vertices of the first three-dimensional model using a farthest point sampling algorithm to generate the point cloud corresponding to the target object. In this way, the farthest point sampling algorithm can reduce the number of points in the point cloud corresponding to the target object while preserving the accuracy of the generated point cloud.
In one embodiment, the points in the above-described point cloud form spheres as follows: for each point in the point cloud, a sphere corresponding to the point is generated with the point as the sphere center and a preset length as the radius. Thus, spheres may be generated based on the points in the point cloud.
In one embodiment, the performing ray sampling based on the plurality of images, the camera pose corresponding to each image, and the spheres corresponding to the points in the point cloud, and determining a plurality of sampling points in each sphere, includes: projecting each sphere onto an image, and selecting pixel points from the projected region as candidate pixel points; determining candidate rays based on the camera pose and the candidate pixel points; and determining a plurality of sampling points based on the segments of the candidate rays that lie inside the spheres. In this way, the determined sampling points can be more accurate.
In one embodiment, the candidate rays include a first candidate ray that intersects at least one sphere, the at least one sphere including a first sphere; and determining a plurality of sampling points based on the segments of the candidate rays that lie inside the spheres includes: determining a target length of the target line segment where the first candidate ray intersects the first sphere, and a total length of the line segments where the first candidate ray intersects the at least one sphere; determining, based on the target length and the total length, the number of sampling points to place in the target line segment; and selecting that number of sampling points from the target line segment. In this way, the obtained sampling points can be more accurate.
In one embodiment, the performing neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second three-dimensional model with second model precision includes: performing neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second signed distance field; and generating a second three-dimensional model corresponding to the target object based on the second signed distance field. With this embodiment, the second three-dimensional model can be generated based on the neural surface reconstruction method.
According to a second aspect, there is provided an apparatus for image-based three-dimensional model reconstruction, comprising: a first generation unit configured to generate a first three-dimensional model of a first model precision corresponding to a target object based on a plurality of images of the target object photographed at multiple angles and the camera poses corresponding to the respective images; a second generation unit configured to generate a point cloud corresponding to the target object based on the first three-dimensional model; a sampling unit configured to perform ray sampling based on the plurality of images, the camera poses corresponding to the respective images, and the spheres corresponding to the points in the point cloud, and determine a plurality of sampling points in each sphere; and a reconstruction unit configured to perform neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second three-dimensional model with second model precision.
According to a third aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any of the first aspects.
According to a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of the first aspects.
According to a fifth aspect, there is provided an electronic device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method of any one of the first aspects.
According to the method and the device for reconstructing a three-dimensional model based on images provided by the embodiments of the present disclosure, first, a first three-dimensional model with first model precision corresponding to a target object is generated based on a plurality of images of the target object photographed at multiple angles and the camera poses corresponding to the images. Then, a point cloud corresponding to the target object is generated based on the first three-dimensional model. Next, ray sampling is performed based on the plurality of images, the camera pose corresponding to each image, and the spheres corresponding to the points in the point cloud, and a plurality of sampling points are determined in each sphere. Finally, neural surface reconstruction is performed based on the plurality of images and the position information of the plurality of sampling points to obtain a second three-dimensional model with second model precision. Thus, the target object may be reconstructed in three dimensions in two stages: a first three-dimensional model is generated in the first stage; a point cloud of the target object is then generated based on the first three-dimensional model, and the spheres generated in space from the points of the point cloud guide the ray sampling of the second stage, making the sampling more efficient and accurate, so that the second three-dimensional model generated in the second stage is obtained more efficiently and accurately.
Drawings
FIG. 1 illustrates a schematic diagram of one application scenario in which embodiments of the present disclosure may be applied;
FIG. 2 illustrates a flow diagram of a method for image-based three-dimensional model reconstruction, according to one embodiment;
FIG. 3 shows a schematic diagram of ray sampling based on spheres formed by points in a point cloud;
FIG. 4 shows a schematic block diagram of an apparatus for image-based three-dimensional model reconstruction according to one embodiment;
fig. 5 shows a schematic diagram of an electronic device suitable for implementing an embodiment of the application.
Detailed Description
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should, in an appropriate manner and in accordance with the relevant laws and regulations, be informed of the type, scope of use, usage scenarios, etc. of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly indicate that the operation the user requests to perform will require obtaining and using the user's personal information. Thus, the user can, according to the prompt information, autonomously choose whether to provide personal information to the software or hardware, such as an electronic device, application, server or storage medium, that executes the operations of the technical solution of the present disclosure.
As an alternative but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user, for example, by way of a popup window, in which the prompt information may be presented as text. In addition, the popup window may carry a selection control for the user to choose to 'consent' or 'decline' to provide personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
The technical scheme provided by the present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
As previously mentioned, in three-dimensional vision applications some objects may have complex structures and a large number of tiny components, and accurately generating a three-dimensional model of such an object often takes a long time and incurs a high cost. Therefore, how to reconstruct an object in three dimensions efficiently and accurately is a problem to be solved. To this end, the embodiments of the present disclosure provide a method for reconstructing a three-dimensional model based on images, which can efficiently and accurately generate a three-dimensional model of a target object based on multiple images of the target object photographed at multiple angles.
Fig. 1 shows a schematic diagram of one application scenario in which embodiments of the present disclosure may be applied. As shown in fig. 1, a plurality of images taken from multiple angles around a target object may first be acquired, each image corresponding to a camera pose. In the first stage, a first three-dimensional model 101 corresponding to the target object may be generated based on the plurality of images and the camera pose corresponding to each image. It will be appreciated that the higher the accuracy of the generated three-dimensional model, the more computing resources and the more time are required. In order to save computational resources and time, a three-dimensional model with lower model accuracy may be generated in the first stage. Thereafter, a point cloud 102 corresponding to the target object may be generated based on the first three-dimensional model 101, and spheres may be generated based on the points in the point cloud 102. In the second stage, ray sampling may be performed based on the plurality of images, the camera pose corresponding to each image, and the spheres, and a plurality of sampling points may be determined in each sphere. Finally, neural surface reconstruction is performed based on the plurality of images and the position information of the plurality of sampling points, and a second three-dimensional model 103 is obtained. Because the second stage uses the spheres generated in space by the points in the point cloud of the target object to guide the ray sampling, the sampling is more efficient and accurate, and the generated second three-dimensional model 103 is more accurate. Therefore, the model accuracy of the second three-dimensional model 103 generated in the second stage is higher than that of the first three-dimensional model 101.
With continued reference to fig. 2, fig. 2 shows a flow diagram of a method of image-based three-dimensional model reconstruction according to one embodiment. The method may be performed by any apparatus, device, platform, or device cluster having computing and processing capabilities. As shown in fig. 2, the method for reconstructing a three-dimensional model based on images may include the following steps 201 to 204, specifically:
Step 201, generating a first three-dimensional model of a first model precision corresponding to a target object based on a plurality of images of the target object photographed at multiple angles and camera poses corresponding to the images.
In the present embodiment, the target object may be any of various objects, for example, a table, a chair, an automobile, a house, a motorcycle, or the like. The structure of objects such as tables and chairs is relatively simple, while objects such as motorcycles generally have a complex structure and a large number of small components. In order to generate a three-dimensional model corresponding to the target object, images of the target object may first be acquired; for example, a plurality of images of the target object may be taken from multiple angles around it, each image corresponding to a camera pose. Here, the camera pose may include the position of the camera in space and the orientation of the camera. The camera pose comprises the parameters used to relate an image to a coordinate system in three-dimensional space, and corresponding camera pose information needs to be provided for each input image. This information can be obtained through a camera calibration process and includes the intrinsic and extrinsic parameters of the camera. Then, according to the collected images of the target object at multiple angles and the camera pose corresponding to each image, a first three-dimensional model with first model precision corresponding to the target object is generated. For example, an image-based three-dimensional reconstruction algorithm, a surface reconstruction algorithm, or a deep-learning-based three-dimensional reconstruction algorithm may be employed to generate the first three-dimensional model corresponding to the target object. It will be appreciated that the higher the accuracy of the generated first three-dimensional model, the more computing resources and the more time are required. To save computational resources and time, a first three-dimensional model with lower model accuracy may be generated.
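As an illustrative, non-limiting sketch of how intrinsic and extrinsic parameters relate a 3D point to a pixel, consider the following Python example. It is not taken from the present embodiment; the names K, R and t follow the usual pinhole-camera convention and are assumptions for illustration.

    import numpy as np

    def project_point(X, K, R, t):
        """Project a 3D world point X to pixel coordinates with the pinhole
        model: x ~ K (R X + t), where K is the intrinsic matrix and (R, t)
        are the extrinsic world-to-camera rotation and translation."""
        Xc = R @ X + t            # world coordinates -> camera coordinates
        u, v, w = K @ Xc          # camera coordinates -> homogeneous pixels
        return np.array([u / w, v / w])

    # Example: focal length 500 px, principal point (320, 240), identity pose.
    K = np.array([[500.0,   0.0, 320.0],
                  [  0.0, 500.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    R, t = np.eye(3), np.zeros(3)
    print(project_point(np.array([0.1, -0.2, 2.0]), K, R, t))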
In some implementations, the step 201 may include the following steps S1 to S2, specifically:
Step S1, obtaining a first signed distance field based on the plurality of images, the camera pose corresponding to each image, and a neural surface reconstruction method.
In this implementation, based on a plurality of images of the target object photographed at multiple angles and the camera poses corresponding to the respective images, neural surface reconstruction (Neural Surface Reconstruction) is performed using any of various neural surface reconstruction methods to obtain a first signed distance field. For example, the NeuS algorithm may be used for three-dimensional reconstruction; the NeuS algorithm evolved from the NeRF (Neural Radiance Fields) algorithm. The neural network of the NeuS algorithm may represent the three-dimensional scene as an SDF (Signed Distance Field). The generation of the first signed distance field may be accelerated with an acceleration algorithm; for example, the Instant-NGP method may be used to accelerate the network training. Here, Instant-NGP (Instant Neural Graphics Primitives) is a fast training method that proposes a technique called Multiresolution Hash Encoding, which can be used to accelerate the training and inference of neural networks. This technique maps input data to a high-dimensional space and uses hash tables for fast indexing and retrieval, thereby achieving efficient feature extraction and data processing. In this implementation, Instant-NGP can be used to rapidly train the neural network of the NeuS algorithm. Specifically, the coordinates (x, y, z) of a three-dimensional space point obtained by ray sampling can be input into Instant-NGP, which outputs a feature vector representing the attributes and features of that point. The coordinates (x, y, z) of the three-dimensional space point and the feature vector together form the input to the SDF neural network of the NeuS algorithm.
In this example, when three-dimensional reconstruction is performed with the NeuS algorithm, random ray sampling may be performed: for example, pixel points are randomly selected from an image, and each selected pixel point together with the camera optical center given by the camera pose defines a ray; the rays are then sampled randomly, for example by randomly selecting a plurality of points along each ray, to obtain a plurality of sampling points, where each sampling point is a three-dimensional point with X-, Y- and Z-axis coordinates. The neural network of the NeuS algorithm may then be trained using the data from the random ray sampling to obtain the SDF field. The SDF field is a mathematical representation describing a three-dimensional shape, in which the value at each point represents the distance of that point from the surface of the shape. Positive values indicate that the point is outside the shape, negative values indicate that the point is inside the shape, and zero indicates that the point lies exactly on the surface of the shape.
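As a minimal illustration of this sign convention (not part of the present embodiment), the signed distance of an analytic sphere can be written in a few lines of Python:

    import numpy as np

    def sphere_sdf(p, center, radius):
        """Signed distance to a sphere: positive outside, negative inside,
        zero exactly on the surface -- the convention described above."""
        return np.linalg.norm(p - center) - radius

    c = np.zeros(3)
    print(sphere_sdf(np.array([2.0, 0.0, 0.0]), c, 1.0))   #  1.0 -> outside
    print(sphere_sdf(np.array([0.5, 0.0, 0.0]), c, 1.0))   # -0.5 -> inside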
To aid understanding, the following briefly describes the training process of the NeuS algorithm and the process of generating a three-dimensional model using the trained network, comprising the following steps 1-6 (a schematic code sketch of the volume rendering in step 2 is given after step 6):
Step 1, randomly selecting a plurality of pixels on the images of the object photographed at multiple angles, for example randomly selecting batch_size = 512 pixels; for each pixel, emitting a ray with the camera optical center as the origin and the direction from the optical center through the pixel, and taking M+N sampling points on the ray. In this example, 512 rays are obtained, each having M+N sampling points.
Step 2, volume rendering: a color is computed for each ray. This may include the following steps 2.1)-2.5), specifically:
Step 2.1), after positional encoding, inputting the M+N sampling points on a given ray into the SDF neural network to obtain the SDF value of each sampling point;
Step 2.2), converting the SDF values into opacities for volume rendering through a preset formula, so that each point has an opacity value;
Step 2.3), inputting the M+N sampling points on the ray into the color neural network to obtain the color value of each point;
Step 2.4), computing the accumulated transmittance to obtain a weight for each point;
Step 2.5), computing the output color of the current ray from the opacity, the accumulated transmittance and the color value of each point on the ray.
Step 3, computing the color loss: the ground truth is the color of the pixel corresponding to each ray, the predicted value is the color output by the volume rendering in step 2, and the loss between the ground truth and the predicted value is computed.
Step 4, back-propagating gradients and updating the weights of the SDF neural network and the color neural network.
Step 5, repeating steps 1-4 a number of times to complete training.
Step 6, uniformly dividing each side of a 1x1x1 cube into 512 parts to obtain 512x512x512 points, inputting these points into the SDF network to obtain their SDF values, and extracting a mesh (three-dimensional grid model) with the Marching Cubes algorithm.
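The following Python sketch illustrates only the compositing of steps 2.2)-2.5). The present embodiment does not give its "preset formula", so the sdf_to_alpha function below is a placeholder assumption, not the formula of the embodiment:

    import numpy as np

    def sdf_to_alpha(sdf, s=10.0):
        # Placeholder for the embodiment's "preset formula" (assumption):
        # a logistic squash giving higher opacity near the zero level set.
        return np.clip(1.0 / (1.0 + np.exp(s * sdf)), 0.0, 1.0)

    def composite_ray(sdf_vals, colors):
        """Steps 2.2)-2.5): per-point opacity, accumulated transmittance,
        and the alpha-composited color of the ray."""
        alpha = sdf_to_alpha(sdf_vals)                                 # step 2.2
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha]))[:-1]  # step 2.4
        weights = alpha * trans
        return (weights[:, None] * colors).sum(axis=0)                 # step 2.5

    sdf_vals = np.linspace(0.5, -0.5, 8)          # a ray crossing the surface
    colors = np.tile([[0.8, 0.2, 0.1]], (8, 1))   # dummy per-point colors
    print(composite_ray(sdf_vals, colors))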
Step S2, generating a first three-dimensional model with first model precision corresponding to the target object based on the first signed distance field.
In this implementation, a first three-dimensional model corresponding to the target object may be generated based on the first signed distance field generated in step S1. For example, a mesh may be extracted from the first signed distance field by the Marching Cubes algorithm to obtain the first three-dimensional model.
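As an illustrative sketch of this extraction step, assuming scikit-image is available (the present embodiment does not name a specific library), an SDF sampled on a regular grid can be converted to a mesh as follows; the analytic sphere stands in for the learned field:

    import numpy as np
    from skimage import measure

    # Sample an SDF on a regular grid (here an analytic sphere as a
    # stand-in for the learned first signed distance field).
    n = 64
    xs = np.linspace(-1.0, 1.0, n)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    sdf = np.linalg.norm(grid, axis=-1) - 0.5

    # Extract the zero level set as a triangle mesh (Marching Cubes).
    verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0)
    print(verts.shape, faces.shape)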
Step 202, generating a point cloud corresponding to the target object based on the first three-dimensional model.
In this embodiment, the first three-dimensional model may be a mesh model, which may include a plurality of vertices. In particular, the mesh model may be a three-dimensional model consisting of a series of interconnected triangular or quadrilateral patches, defined by geometric elements such as vertices, edges, and faces. A point cloud corresponding to the target object can therefore be generated based on the first three-dimensional model. For example, the vertices of the mesh model may be used directly to form a point cloud. If there are too many vertices in the mesh model, a point cloud may also be formed from a subset of the vertices (e.g., a randomly selected subset).
In some implementations, the step 202 may include the following: sampling the vertices of the first three-dimensional model using a farthest point sampling algorithm to generate the point cloud corresponding to the target object.
In this implementation, a point cloud may be formed from a portion of the vertices of the first three-dimensional model, and in order to make the generated point cloud more accurate, farthest point sampling (Farthest Point Sampling) may be used. Farthest point sampling is an algorithm for selecting representative points from a point cloud. In this example, all vertices of the first three-dimensional model may be used as the initial point cloud; farthest point sampling can quickly select representative points from the initial point cloud while avoiding redundant points. Furthermore, since the point farthest from the current selection is chosen each time, the sampled points are distributed relatively uniformly in space. The farthest point sampling algorithm can thus reduce the number of points in the point cloud corresponding to the target object while preserving the accuracy of the generated point cloud.
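A minimal sketch of farthest point sampling in Python (an illustrative implementation, not code from the present embodiment):

    import numpy as np

    def farthest_point_sampling(points, k):
        """Iteratively pick the point farthest from the already-chosen set,
        yielding k roughly uniformly spread samples of the input cloud."""
        chosen = [0]                                  # arbitrary seed point
        dists = np.linalg.norm(points - points[0], axis=1)
        for _ in range(k - 1):
            idx = int(np.argmax(dists))               # farthest remaining point
            chosen.append(idx)
            dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
        return points[chosen]

    verts = np.random.rand(10000, 3)                  # stand-in for mesh vertices
    cloud = farthest_point_sampling(verts, 1024)
    print(cloud.shape)                                # (1024, 3)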
Step 203, performing ray sampling based on the multiple images, the camera pose corresponding to each image, and the spheres corresponding to the points in the point cloud, and determining multiple sampling points in each sphere.
In this embodiment, for the points in the point cloud corresponding to the target object, spheres may be generated in space in various ways; for example, a sphere may be generated with each point as the sphere center. As another example, a sphere may be generated with the midpoint between two points as the sphere center. Ray sampling is performed on the basis of the images, the camera pose corresponding to each image, and the spheres, and a plurality of sampling points are determined in each sphere. For example, for a pixel point P in an image, a ray is formed from the camera optical center given by the pose through the pixel point P; the ray may intersect a sphere, and the line segment intersecting the sphere is selected for sampling to obtain a plurality of sampling points.
In some implementations, the points in the point cloud corresponding to the target object may form spheres as follows: for each point in the point cloud, a sphere corresponding to the point is generated with the point as the sphere center and a preset length as the radius. Here, the radius may be determined in various ways, for example set manually.
In some implementations, the step 203 may include the following steps 1) to 3), specifically:
Step 1), projecting each sphere onto an image, and selecting pixel points from the projected region as candidate pixel points.
In this implementation, each sphere may be projected according to an image and its corresponding camera pose. As shown in fig. 3, fig. 3 shows a schematic diagram of ray sampling based on spheres formed by points in a point cloud. In the example shown in fig. 3, assume there is an image A photographed at a certain angle and that the camera corresponding to image A is camera B; the spheres formed by the points in the point cloud are projected onto image A according to the camera pose of camera B. If a sphere C formed by a certain point in the point cloud projects onto image A as a circle R, one or more points can be selected within the circle R as candidate pixel points. In this example, the point Q is selected as a candidate pixel point.
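A sketch of such a projection, using a small-sphere approximation for the projected radius (an assumption for illustration; the present embodiment does not specify how the projected circle is computed):

    import numpy as np

    def project_sphere(center, radius, K, R, t):
        """Project a sphere's center into the image and approximate the
        radius of its projected circle as f * r / depth (small-sphere
        approximation); candidate pixels are then chosen inside the circle."""
        Xc = R @ center + t
        u, v, w = K @ Xc
        pix = np.array([u / w, v / w])
        pix_radius = K[0, 0] * radius / Xc[2]
        return pix, pix_radius

    K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]])
    pix, pr = project_sphere(np.array([0.0, 0.0, 2.0]), 0.05, K, np.eye(3), np.zeros(3))
    print(pix, pr)   # circle center (320, 240) and radius 12.5 px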
Step 2), determining candidate rays based on the camera pose and the candidate pixel points.
In this implementation, the candidate ray may be determined based on the camera pose and the candidate pixel point. Continuing the example of fig. 3, a candidate ray may be formed from the optical center of camera B through the candidate pixel point Q. The candidate ray passes through at least one sphere.
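A minimal sketch of back-projecting a pixel into a world-space ray, assuming the world-to-camera convention x_cam = R x_world + t (an illustrative assumption, not taken from the present embodiment):

    import numpy as np

    def pixel_ray(pix, K, R, t):
        """With x_cam = R x_world + t, the camera center is -R^T t and the
        world-space direction through pixel (u, v) is R^T K^{-1} [u, v, 1]."""
        origin = -R.T @ t
        d = R.T @ np.linalg.inv(K) @ np.array([pix[0], pix[1], 1.0])
        return origin, d / np.linalg.norm(d)

    K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]])
    o, d = pixel_ray((320.0, 240.0), K, np.eye(3), np.zeros(3))
    print(o, d)   # principal-point ray: origin at camera center, direction +Z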
Step 3), determining a plurality of sampling points based on the segments of the candidate rays that lie inside the spheres.
In this implementation, a plurality of sampling points may be determined from the segments of the candidate rays that lie inside the spheres. For example, assume a candidate ray passes through two spheres G1 and G2. The segment of the candidate ray inside sphere G1 is L1L2, and the segment inside sphere G2 is L3L4. Sampling points may then be determined from segment L1L2 and from segment L3L4; for example, one or more sampling points may be chosen at random from the segments L1L2 and L3L4.
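Finding such in-sphere segments is a standard ray-sphere intersection test; a sketch in Python (illustrative, not code from the present embodiment):

    import numpy as np

    def ray_sphere_segment(o, d, center, radius):
        """Return the (t_near, t_far) parameters of the segment where the
        ray o + t*d (d unit-length) lies inside the sphere, or None if the
        ray misses it -- the standard quadratic-discriminant test."""
        oc = o - center
        b = np.dot(oc, d)
        disc = b * b - (np.dot(oc, oc) - radius * radius)
        if disc < 0.0:
            return None                      # ray misses the sphere
        root = np.sqrt(disc)
        t0, t1 = -b - root, -b + root
        return (t0, t1) if t1 > 0.0 else None

    o, d = np.zeros(3), np.array([0.0, 0.0, 1.0])
    print(ray_sphere_segment(o, d, np.array([0.0, 0.0, 2.0]), 0.5))  # (1.5, 2.5)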
Alternatively, the segments of a candidate ray that lie inside the spheres can be sampled uniformly. For example, assume the candidate rays include a first candidate ray that intersects at least one sphere, the at least one sphere including a first sphere. In this case, the above step 3) may specifically include the following steps 31) and 32):
Step 31), determining the target length of the target line segment where the first candidate ray intersects the first sphere, and the total length of the line segments where the first candidate ray intersects the at least one sphere.
Step 32), determining, based on the target length and the total length, the number of sampling points for the target line segment, and selecting that number of sampling points from the target line segment.
For example, the sampling may be performed using a linspace function, which generates an equally spaced sequence of numbers (a linearly spaced vector) from a specified start value, end value, and count. In this example, linspace(a_i, b_i, n_i) may be used, where a_i represents the position where the ray enters the i-th sphere, b_i the position where it exits, and n_i the number of uniform sampling points in the i-th sphere. The size of n_i may depend on the weight of the intersection length b_i - a_i of the i-th sphere relative to the sum of the intersection lengths of all n spheres intersected by the current ray, specifically:
n_i = N * (b_i - a_i) / sum_{j=1..n} (b_j - a_j)
where N represents the total number of sampling points on the ray and may, as an example, be a manually set value. Through this implementation, the obtained sampling points can be more accurate.
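A sketch of this proportional allocation with numpy.linspace (illustrative; the rounding rule below is an assumption, since the embodiment does not specify one):

    import numpy as np

    def allocate_samples(segments, N):
        """Distribute N samples over a ray's in-sphere segments [(a_i, b_i)]
        proportionally to each segment's length, then sample uniformly
        (np.linspace) inside each segment -- matching the formula above."""
        lengths = np.array([b - a for a, b in segments])
        counts = np.floor(N * lengths / lengths.sum()).astype(int)
        return [np.linspace(a, b, n) for (a, b), n in zip(segments, counts) if n > 0]

    segments = [(1.5, 2.5), (4.0, 4.5)]      # e.g. outputs of ray_sphere_segment
    for ts in allocate_samples(segments, 12):
        print(ts)                            # 8 samples in the first, 4 in the second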
Step 204, performing neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second three-dimensional model with second model precision.
In this embodiment, neural surface reconstruction (Neural Surface Reconstruction) may be performed in various ways based on the plurality of images and the position information of the plurality of sampling points to obtain the second three-dimensional model with second model precision.
Alternatively, first, neural surface reconstruction may be performed based on the plurality of images of the target object photographed at multiple angles and the position information of the plurality of sampling points to obtain the second signed distance field. For example, the NeuS algorithm may be used for three-dimensional reconstruction; the NeuS algorithm evolved from the NeRF (Neural Radiance Fields) algorithm, and its neural network may represent the three-dimensional scene as an SDF (Signed Distance Field). Then, a second three-dimensional model corresponding to the target object is generated based on the second signed distance field; for example, the second three-dimensional model may be obtained by extracting a mesh from the second signed distance field with the Marching Cubes algorithm. Because the spheres generated in space by the points in the point cloud of the target object guide the ray sampling, the determined sampling points are more accurate, and the generated second three-dimensional model is more accurate. Therefore, the model accuracy of the second three-dimensional model is higher than that of the first three-dimensional model, i.e., the second model precision is higher than the first model precision.
It will be appreciated that, by training the neural network, the NeuS algorithm can infer the color information of the three-dimensional model from the two-dimensional images. However, to increase the realism of the three-dimensional model, the second three-dimensional model may also be texture-mapped to obtain a more realistic three-dimensional model. Thus, in some implementations, the method for reconstructing a three-dimensional model based on images may further include texture mapping the second three-dimensional model, specifically: performing texture mapping on the second three-dimensional model according to the plurality of images of the target object photographed from multiple angles and the camera poses corresponding to the images, to obtain a textured three-dimensional model.
In this implementation, texture mapping of the three-dimensional model may be performed in a variety of ways. Texture mapping is a technique for applying a texture to an object's surface. In computer graphics, texture mapping allows a texture map (Texture Map), such as an image or photograph, to be mapped onto the surface of a three-dimensional model, thereby rendering the model more realistic and rich in detail.
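As a much-simplified stand-in for full texture mapping (an illustrative assumption, not the mapping procedure of the present embodiment), each mesh vertex can be colored from the image of the nearest camera that sees it, ignoring occlusion; a real pipeline would bake a UV texture atlas instead:

    import numpy as np

    def vertex_colors(verts, images, poses, K):
        """Color each vertex by sampling the image of the nearest camera in
        which it projects inside the frame; occlusion is ignored here."""
        colors = np.zeros((len(verts), 3))
        for i, X in enumerate(verts):
            best_depth, best_rgb = np.inf, np.zeros(3)
            for img, (R, t) in zip(images, poses):
                Xc = R @ X + t
                if Xc[2] <= 0:                         # behind the camera
                    continue
                u, v, w = K @ Xc
                px, py = int(u / w), int(v / w)
                if 0 <= py < img.shape[0] and 0 <= px < img.shape[1] and Xc[2] < best_depth:
                    best_depth, best_rgb = Xc[2], img[py, px]
            colors[i] = best_rgb
        return colors

    imgs = [np.full((480, 640, 3), 0.5)]
    poses = [(np.eye(3), np.zeros(3))]
    K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]])
    print(vertex_colors(np.array([[0.0, 0.0, 2.0]]), imgs, poses, K))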
Reviewing the above process: in the above-described embodiment of the present disclosure, first, a first three-dimensional model of a first model precision corresponding to a target object is generated based on a plurality of images of the target object photographed at multiple angles and the camera poses corresponding to the images. Then, a point cloud corresponding to the target object is generated based on the first three-dimensional model. Next, ray sampling is performed based on the plurality of images, the camera pose corresponding to each image, and the spheres corresponding to the points in the point cloud, and a plurality of sampling points are determined in each sphere. Finally, neural surface reconstruction is performed based on the plurality of images and the position information of the plurality of sampling points to obtain a second three-dimensional model with second model precision. Thus, the target object may be reconstructed in three dimensions in two stages: a first three-dimensional model is generated in the first stage; a point cloud of the target object is then generated based on the first three-dimensional model, and the spheres generated in space from the points of the point cloud guide the ray sampling of the second stage, making the sampling more efficient and accurate, so that the second three-dimensional model generated in the second stage is obtained more efficiently and accurately.
According to an embodiment of another aspect, an apparatus for three-dimensional model reconstruction based on an image is provided. The apparatus for reconstructing a three-dimensional model based on an image may be deployed in any apparatus, device, platform, device cluster, etc. having computing and processing capabilities.
FIG. 4 shows a schematic block diagram of an apparatus for image-based three-dimensional model reconstruction according to one embodiment. The apparatus shown in fig. 4 is used to perform the method shown in fig. 2. As shown in fig. 4, the apparatus 400 for reconstructing a three-dimensional model based on images includes: a first generation unit 401 configured to generate a first three-dimensional model of a first model precision corresponding to a target object based on a plurality of images of the target object photographed at multiple angles and the camera poses corresponding to the respective images; a second generation unit 402 configured to generate a point cloud corresponding to the target object based on the first three-dimensional model; a sampling unit 403 configured to perform ray sampling based on the plurality of images, the camera pose corresponding to each image, and the spheres corresponding to the points in the point cloud, and determine a plurality of sampling points in each sphere; and a reconstruction unit 404 configured to perform neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second three-dimensional model with second model precision.
In some optional implementations of this embodiment, the apparatus 400 further includes: and a mapping unit (not shown in the figure) configured to perform texture mapping on the second three-dimensional model according to the plurality of images and the camera pose corresponding to each image, so as to obtain a three-dimensional model with textures.
In some optional implementations of the present embodiment, the first generating unit 401 is further configured to: obtaining a first signed distance field based on the plurality of images, the camera pose corresponding to each image and a neural surface reconstruction method, wherein the first signed distance field is accelerated by adopting an acceleration training algorithm in the generation process; and generating a first three-dimensional model of a first model precision corresponding to the target object based on the first signed distance field.
In some optional implementations of this embodiment, the first three-dimensional model is a mesh model including a plurality of vertices; and the second generating unit 402 is further configured to: sample the vertices of the first three-dimensional model using a farthest point sampling algorithm to generate the point cloud corresponding to the target object.
In some optional implementations of this embodiment, the points in the point cloud described above form spheres by: and for each point in the point cloud, generating a sphere corresponding to the point by taking the point as a sphere center and taking the preset length as a radius.
In some optional implementations of the present embodiment, the sampling unit 403 includes: a projection unit (not shown) configured to project each sphere onto an image and select pixel points from the projected region as candidate pixel points; a first determining unit (not shown) configured to determine candidate rays based on the camera pose and the candidate pixel points; and a second determining unit (not shown) configured to determine a plurality of sampling points based on the segments of the candidate rays that lie inside the spheres.
In some optional implementations of this embodiment, the candidate rays include a first candidate ray that intersects at least one sphere, the at least one sphere including a first sphere; and the second determining unit is further configured to: determine the target length of the target line segment where the first candidate ray intersects the first sphere, and the total length of the line segments where the first candidate ray intersects the at least one sphere; determine, based on the target length and the total length, the number of sampling points for the target line segment; and select that number of sampling points from the target line segment.
In some optional implementations of the present embodiment, the reconstruction unit 404 is further configured to: performing neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second signed distance field; and generating a second three-dimensional model corresponding to the target object based on the second signed distance field.
The foregoing apparatus embodiments correspond to the method embodiments, and specific descriptions may be referred to descriptions of method embodiment portions, which are not repeated herein. The device embodiments are obtained based on corresponding method embodiments, and have the same technical effects as the corresponding method embodiments, and specific description can be found in the corresponding method embodiments.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in fig. 2.
According to an embodiment of still another aspect, there is provided an electronic device including a memory and a processor, wherein executable code is stored in the memory, and the processor implements the method described in fig. 2 when executing the executable code.
The foregoing describes certain embodiments of the present disclosure, other embodiments being within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. Furthermore, the processes depicted in the accompanying figures are not necessarily required to achieve the desired result in the particular order shown, or in a sequential order. In some embodiments, multitasking and parallel processing are also possible, or may be advantageous.
Referring now to fig. 5, a schematic diagram of an electronic device 500 suitable for use in implementing embodiments of the present application is shown. The electronic device shown in fig. 5 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present application.
As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, etc.; output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device 500 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 5 may represent one device or a plurality of devices as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or from the storage means 508, or from the ROM 502. The above-described functions defined in the method of the embodiment of the present application are performed when the computer program is executed by the processing means 501.
The disclosed embodiments also provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method provided by the present disclosure.
It should be noted that, the computer readable medium according to the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In an embodiment of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Whereas in embodiments of the present disclosure, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (Radio Frequency), and the like, or any suitable combination thereof.
The computer readable medium may be contained in the electronic device, or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: generate a first three-dimensional model of a first model precision corresponding to a target object based on a plurality of images of the target object photographed at multiple angles and the camera poses corresponding to the images; generate a point cloud corresponding to the target object based on the first three-dimensional model; perform ray sampling based on the images, the camera pose corresponding to each image, and the spheres corresponding to the points in the point cloud, and determine a plurality of sampling points in each sphere; and perform neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second three-dimensional model with second model precision.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The various embodiments in this disclosure are described in a progressive manner, and identical and similar parts of the various embodiments are all referred to each other, and each embodiment is mainly described as different from other embodiments. In particular, for storage media and computing device embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The foregoing detailed description of the embodiments of the present invention further details the objects, technical solutions and advantageous effects of the embodiments of the present invention. It should be understood that the foregoing description is only specific to the embodiments of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements, etc. made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (11)

1. A method for three-dimensional model reconstruction based on images, comprising:
generating a first three-dimensional model of a first model precision corresponding to a target object based on a plurality of images of the target object photographed at multiple angles and the camera poses corresponding to the images;
generating a point cloud corresponding to the target object based on the first three-dimensional model;
performing ray sampling based on the images, the camera pose corresponding to each image and the spheres corresponding to the points in the point cloud, and determining a plurality of sampling points in each sphere;
and performing neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second three-dimensional model with second model precision.
2. The method of claim 1, wherein the method further comprises:
performing texture mapping on the second three-dimensional model according to the plurality of images and the camera pose corresponding to each image to obtain a textured three-dimensional model.
3. The method of claim 1, wherein the generating a first three-dimensional model of a first model precision corresponding to the target object based on the plurality of images of the target object captured at multiple angles and the camera poses corresponding to the respective images comprises:
obtaining a first signed distance field based on the plurality of images, the camera poses corresponding to the respective images, and a neural surface reconstruction method, wherein an accelerated training algorithm is adopted to speed up generation of the first signed distance field; and
generating the first three-dimensional model of the first model precision corresponding to the target object based on the first signed distance field.
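For illustration only (not part of the claim language): the signed distance field of claim 3 is typically represented by a network that maps a 3-D point to its signed distance from the surface. Below is a minimal PyTorch sketch of such a network; the architecture, activations, and any acceleration scheme (for example, a multiresolution hash-grid encoding) are assumptions, since the claim only requires that an accelerated training algorithm be used.

```python
# Hedged sketch of an SDF network for neural surface reconstruction.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class SDFNetwork(nn.Module):
    """Maps a 3-D point to a signed distance from the surface."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),  # one signed-distance value per point
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return self.net(xyz)

sdf = SDFNetwork()
distances = sdf(torch.rand(1024, 3))  # signed distances at 1024 query points
```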
4. The method of claim 1, wherein the first three-dimensional model is a mesh model comprising a plurality of vertices; and the generating a point cloud corresponding to the target object based on the first three-dimensional model comprises:
sampling the vertices of the first three-dimensional model by a farthest point sampling algorithm to generate the point cloud corresponding to the target object.
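For illustration only: a minimal NumPy sketch of the farthest point sampling recited in claim 4. The vertex array and sample count are stand-ins; the claim does not fix an implementation.

```python
# Greedy farthest point sampling (FPS) over mesh vertices.
import numpy as np

def farthest_point_sampling(vertices: np.ndarray, n_samples: int) -> np.ndarray:
    """Select n_samples vertices that are mutually far apart."""
    n = vertices.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    dist = np.full(n, np.inf)  # distance to the nearest selected point so far
    selected[0] = 0            # arbitrary seed vertex
    for i in range(1, n_samples):
        gap = np.linalg.norm(vertices - vertices[selected[i - 1]], axis=1)
        dist = np.minimum(dist, gap)
        selected[i] = int(np.argmax(dist))  # farthest from all selected so far
    return vertices[selected]

# Example: downsample 10,000 mesh vertices to a 512-point cloud.
verts = np.random.rand(10_000, 3)
point_cloud = farthest_point_sampling(verts, 512)
```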
5. The method of claim 1, wherein the spheres corresponding to the points in the point cloud are formed by:
for each point in the point cloud, generating a sphere corresponding to the point by taking the point as the sphere center and a preset length as the radius.
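For illustration only: claim 5 reduces to pairing each cloud point with a preset radius. The radius value below is an assumed hyperparameter, and the random cloud is a stand-in for the FPS output.

```python
import numpy as np

point_cloud = np.random.rand(512, 3)   # stand-in for the sampled point cloud
PRESET_RADIUS = 0.05                   # assumed value of the preset length
spheres = [(center, PRESET_RADIUS) for center in point_cloud]
```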
6. The method of claim 1, wherein the performing ray sampling based on the plurality of images, the camera poses corresponding to the respective images, and the spheres corresponding to the points in the point cloud, and determining a plurality of sampling points in each sphere comprises:
projecting each sphere onto an image, and selecting a pixel point from the projection as a candidate pixel point;
determining candidate rays based on the camera pose and the candidate pixel points; and
determining the plurality of sampling points based on line segments of the candidate rays that lie within the spheres.
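For illustration only: the geometric core of claim 6 is intersecting a candidate ray with a sphere and sampling along the in-sphere segment. The sketch below assumes a ray built elsewhere from the camera pose and candidate pixel; the even spacing of samples is an assumption, as the claim does not specify how points are placed within the segment.

```python
import numpy as np

def ray_sphere_segment(origin, direction, center, radius):
    """Return (t_near, t_far) where origin + t*direction lies inside the
    sphere, or None if the ray misses it."""
    d = direction / np.linalg.norm(direction)
    oc = origin - center
    b = np.dot(d, oc)
    disc = b * b - (np.dot(oc, oc) - radius * radius)
    if disc <= 0.0:
        return None  # no (or only a grazing) intersection
    root = np.sqrt(disc)
    return -b - root, -b + root

def sample_in_segment(origin, direction, t_near, t_far, n_points):
    """Place n_points sampling points evenly along the in-sphere segment."""
    d = direction / np.linalg.norm(direction)
    ts = np.linspace(t_near, t_far, n_points)
    return origin[None, :] + ts[:, None] * d[None, :]

# Example: a ray from the origin along +z, against a sphere at (0, 0, 2).
seg = ray_sphere_segment(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                         np.array([0.0, 0.0, 2.0]), 0.5)
if seg is not None:  # here seg == (1.5, 2.5)
    pts = sample_in_segment(np.zeros(3), np.array([0.0, 0.0, 1.0]), *seg, 8)
```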
7. The method of claim 6, wherein the candidate rays comprise a first candidate ray that intersects at least one sphere, the at least one sphere comprising a first sphere; and the determining the plurality of sampling points based on the line segments of the candidate rays that lie within the spheres comprises:
determining a target length of a target line segment where the first candidate ray intersects the first sphere, and a total length of the line segments where the first candidate ray intersects the at least one sphere; and
determining, based on the target length and the total length, a number of sampling points for the target line segment, and selecting that number of sampling points from the target line segment.
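For illustration only: a minimal sketch of the allocation rule in claim 7, splitting a ray's sampling budget across the spheres it crosses in proportion to in-sphere segment length. The total budget is an assumed hyperparameter, and rounding means the counts may not sum exactly to it.

```python
def allocate_sample_counts(segment_lengths, total_budget=64):
    """Give each in-sphere segment a share of the budget proportional
    to its length (claim 7's target_length / total_length ratio)."""
    total = sum(segment_lengths)
    return [round(total_budget * length / total) for length in segment_lengths]

# A ray crossing three spheres with in-sphere lengths 0.02, 0.06 and 0.12:
print(allocate_sample_counts([0.02, 0.06, 0.12]))  # -> [6, 19, 38]
```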
8. The method of claim 1, wherein the performing neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second three-dimensional model of a second model precision comprises:
performing neural surface reconstruction based on the plurality of images and the position information of the plurality of sampling points to obtain a second signed distance field; and
generating the second three-dimensional model corresponding to the target object based on the second signed distance field.
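For illustration only: claims 3 and 8 both extract a three-dimensional model from a signed distance field. A common choice (an assumption, as the claims name no extraction algorithm) is marching cubes over a sampled grid, shown here via scikit-image with an analytic sphere standing in for the learned field.

```python
import numpy as np
from skimage import measure

# Sample an illustrative SDF (a sphere of radius 0.5) on a 64^3 grid.
n = 64
axis = np.linspace(-1.0, 1.0, n)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
sdf_values = np.linalg.norm(grid, axis=-1) - 0.5

# Extract the zero level set, i.e. the reconstructed surface mesh.
verts, faces, normals, _ = measure.marching_cubes(sdf_values, level=0.0)
```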
9. An apparatus for three-dimensional model reconstruction based on images, comprising:
a first generation unit configured to generate a first three-dimensional model of a first model precision corresponding to a target object based on a plurality of images of the target object captured at multiple angles and camera poses corresponding to the respective images;
a second generation unit configured to generate a point cloud corresponding to the target object based on the first three-dimensional model;
a sampling unit configured to perform ray sampling based on the plurality of images, the camera poses corresponding to the respective images, and spheres corresponding to points in the point cloud, and to determine a plurality of sampling points in each sphere; and
a reconstruction unit configured to perform neural surface reconstruction based on the plurality of images and position information of the plurality of sampling points to obtain a second three-dimensional model of a second model precision.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a computer, causes the computer to perform the method of any of claims 1-8.
11. An electronic device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method of any of claims 1-8.
CN202410171666.7A 2024-02-06 2024-02-06 Method, device, medium and equipment for reconstructing three-dimensional model based on image Pending CN117994482A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410171666.7A CN117994482A (en) 2024-02-06 2024-02-06 Method, device, medium and equipment for reconstructing three-dimensional model based on image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410171666.7A CN117994482A (en) 2024-02-06 2024-02-06 Method, device, medium and equipment for reconstructing three-dimensional model based on image

Publications (1)

Publication Number Publication Date
CN117994482A true CN117994482A (en) 2024-05-07

Family

ID=90892239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410171666.7A Pending CN117994482A (en) 2024-02-06 2024-02-06 Method, device, medium and equipment for reconstructing three-dimensional model based on image

Country Status (1)

Country Link
CN (1) CN117994482A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination