WO2023138477A1 - Three-dimensional model reconstruction and image generation method, device, and storage medium

Three-dimensional model reconstruction and image generation method, device, and storage medium

Info

Publication number
WO2023138477A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
sight
line
surface point
model
Application number
PCT/CN2023/071960
Other languages
English (en)
French (fr)
Inventor
章坚
付欢
黄锦池
罗鸿城
李玉洁
王家明
赵斌强
蔡博文
贾荣飞
汤兴
Original Assignee
阿里巴巴(中国)有限公司
Application filed by 阿里巴巴(中国)有限公司
Publication of WO2023138477A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G06T 15/005 - General purpose rendering architectures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 - Manipulating 3D models or images for computer graphics
    • G06T 19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Definitions

  • the present application relates to the technical field of image processing, and in particular to a three-dimensional model reconstruction and image generation method, device and storage medium.
  • New view synthesis refers to a technology that, for a given 3D scene, generates highly realistic images at arbitrary viewing angles from existing images of that scene.
  • New view synthesis relies on the precise geometric structure of the 3D scene. However, due to the complexity of the 3D scene in the real world, it is difficult to obtain the precise geometric structure of the 3D scene. This makes the new view synthesis technology difficult from theory to implementation.
  • To this end, the industry proposed the Neural Radiance Field (NERF) algorithm, which uses a fully connected network to represent a 3D scene. Its input is a continuous 5-dimensional coordinate, namely the spatial position (x, y, z) and the viewing angle information (θ, φ), and its output is the volume density at the spatial position and the color information related to the viewing angle; further combined with volume rendering technology, the output color information and volume density can be projected onto a 2D image to achieve new view synthesis. Due to its simple structure and good rendering effect, the NERF algorithm has attracted a lot of attention. However, its viewing angle robustness is poor and the image synthesis effect at some viewing angles is not good, so it is difficult to apply to actual scenes.
  • NERF: Neural Radiance Field
  • Aspects of the present application provide a 3D model reconstruction and image generation method, device, and storage medium for improving viewing angle robustness when performing model inference, such as viewing angle image synthesis, based on an implicit 3D representation model.
  • An embodiment of the present application provides a three-dimensional model reconstruction method, including: performing neural-network-based three-dimensional reconstruction based on multiple original images containing a target object to obtain an initial implicit 3D representation model, where each surface point on the target object corresponds to a pixel in the corresponding original image and to the first line of sight along which that pixel was photographed; constructing an explicit three-dimensional model based on the initial implicit 3D representation model and the multiple original images, where the explicit three-dimensional model includes the color information of the surface points on the target object and the color information of each surface point is determined according to the average viewing angle information of the first lines of sight corresponding to that surface point; randomly generating second lines of sight corresponding to the surface points on the explicit three-dimensional model, and generating the average viewing angle information corresponding to the second line of sight of each surface point according to the color information of that surface point; and, according to the average viewing angle information corresponding to the second lines of sight and the spatial coordinates of the spatial points on the second lines of sight, continuing the neural-network-based three-dimensional reconstruction on the basis of the initial implicit 3D representation model to obtain a target implicit 3D representation model that implicitly expresses the target object in three dimensions.
  • The embodiment of the present application also provides an image generation method, including: according to the target camera pose to be rendered and the explicit 3D model corresponding to the target object, determining the target line of sight to be rendered and the average viewing angle information corresponding to the target line of sight; and, according to the spatial coordinates of the spatial points on the target line of sight and the average viewing angle information corresponding to the target line of sight, combined with the target implicit 3D representation model corresponding to the target object, generating a target image of the target object under the target camera pose; wherein the explicit 3D model and the target implicit 3D representation model are obtained through neural-network-based 3D reconstruction into which the line of sight prior information and the average viewing angle information are integrated.
  • the embodiment of the present application also provides a computer device, including: a memory and a processor; the memory is used to store a computer program; the processor is coupled to the memory, and is used to execute the computer program to perform the steps in the three-dimensional model reconstruction method or the image generation method provided by the embodiment of the present application.
  • The embodiment of the present application also provides a computer storage medium storing a computer program; when the computer program is executed by a processor, the processor can implement the steps in the three-dimensional model reconstruction method or the image generation method provided in the embodiment of the present application.
  • the 3D model reconstruction method provided in this embodiment is used to generate a neural network model capable of performing an implicit 3D representation of a target object, comprising the following operations: performing neural network-based 3D reconstruction and traditional 3D reconstruction on the basis of multiple original images containing the target object to obtain an initial implicit 3D representation model and an explicit 3D model; generating a random line of sight and an average viewing angle based on the explicit 3D model, and continuing to perform neural network-based 3D reconstruction based on the random sight line and average viewing angle on the basis of the initial implicit 3D representation model to obtain the target implicit 3D representation model.
  • In the 3D reconstruction process, by generating a random line of sight and substituting the average viewing angle information corresponding to the random line of sight for its real viewing angle information, the random line of sight and its corresponding average viewing angle information are used to enhance the line of sight data. Based on the enhanced line of sight data, the neural-network-based 3D reconstruction can be continued to obtain an implicit 3D representation model with strong robustness to the line of sight, which greatly improves the viewing angle robustness when synthesizing images from different viewing angles based on the implicit 3D representation model.
  • FIG. 1 is a schematic flow chart of a three-dimensional model reconstruction method provided in an embodiment of the present application
  • FIG. 2 is a schematic diagram of an exemplary line of sight emitted from the optical center of the camera to the object space;
  • FIG. 3 is a schematic diagram of an exemplary line of sight passing through a surface point of a target object
  • FIG. 4 is a diagram of an exemplary three-dimensional model reconstruction method applicable to an application scene
  • FIG. 5 is a schematic diagram of an exemplary random line of sight generation
  • FIG. 6a is a schematic flowchart of a method for generating a three-dimensional model provided in an embodiment of the present application
  • FIG. 6b is a diagram of an application scene applicable to a 3D model generation method provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a three-dimensional model reconstruction device provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image generation device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • In the embodiments of the present application, the line of sight prior information and the average viewing angle information are integrated to provide a new neural-network-based 3D model reconstruction method.
  • The 3D model reconstruction method provided in this embodiment can be applied in the model training stage to reconstruct a target implicit 3D representation model that expresses the target object in implicit three-dimensional (3D) form.
  • The target implicit 3D representation model can be used for model inference in a later stage.
  • One scenario for performing model inference based on the target implicit 3D representation model is to synthesize new viewing angle images based on the target implicit 3D representation model, but the application is not limited thereto.
  • Alternatively, the 3D model reconstruction method of this embodiment may also be a process of directly performing 3D reconstruction of the target object in an actual application scene, rather than being applied to the model training stage of pre-generating a target implicit 3D representation model that expresses the target object in implicit 3D form.
  • the three-dimensional reconstruction process provided by the embodiment of the present application is used to generate a neural network model capable of implicit three-dimensional (3D) representation of the target object, that is, an implicit 3D representation model of the target.
  • The process mainly includes the following operations: a video containing the target object, or multiple original images, is input (the video contains multiple original images); neural-network-based 3D reconstruction and traditional 3D reconstruction are performed on the basis of the multiple original images to obtain the initial implicit 3D representation model and the explicit 3D model, respectively; a random line of sight and an average viewing angle are generated based on the explicit 3D model, and neural-network-based 3D reconstruction is continued based on the random line of sight and the average viewing angle, combined with the initial implicit 3D representation model, to obtain the target implicit 3D representation model.
  • both the initial implicit 3D representation model and the target implicit 3D representation model are neural network models that perform an implicit three-dimensional representation of the target object.
  • the random line of sight and its corresponding average viewing angle information are used to enhance the line of sight data required for 3D reconstruction.
  • Based on the enhanced line of sight data, the neural-network-based 3D reconstruction can be continued to obtain an implicit 3D representation model with strong robustness to the line of sight, which greatly improves the robustness of synthesizing images from different viewing angles based on the implicit 3D representation model.
  • FIG. 1 is a schematic flow chart of a method for reconstructing a 3D model provided in an embodiment of the present application. As shown in Figure 1, the method may include the following steps:
  • the explicit 3D model includes the color information of the surface points on the target object, and the color information of each surface point is determined according to the average viewing angle information of the first line of sight corresponding to the surface point.
  • According to the average viewing angle information corresponding to the second line of sight and the spatial coordinates of the spatial points on the second line of sight, neural-network-based 3D reconstruction is continued on the basis of the initial implicit 3D representation model to obtain a target implicit 3D representation model that implicitly expresses the target object in three dimensions.
  • the target object may be any object, such as shoes, tables, chairs, hats, wardrobes, apples and so on.
  • In scenarios such as new view synthesis, the 3D model of the target object is required.
  • Therefore, a 3D reconstruction of the target object is required.
  • In this way, the model content seen under a new viewing angle can be determined based on the 3D model of the target object, and then the image under the new viewing angle can be rendered based on that model content.
  • a 3D reconstruction method based on a neural network is adopted, and the target object is expressed in 3D using the finally obtained target implicit 3D representation model.
  • the traditional 3D reconstruction process is further integrated. That is to say, in the embodiment of the present application, the 3D reconstruction based on the neural network is mainly used, and the traditional 3D reconstruction is integrated, which is referred to as 3D reconstruction of the target object for short.
  • multiple original images containing the target object are obtained, so as to perform neural network-based three-dimensional reconstruction based on the original images containing the target object.
  • the target object in the real world can be photographed from different shooting angles to obtain multiple original images containing the target object or obtain a video corresponding to the target object, and extract multiple original images containing the target object from the video.
  • For example, the photographing device may circle 360 degrees around the target object to obtain multiple original images of the target object.
  • different original images correspond to different camera poses, and the camera pose includes the position and posture of the shooting device when the image is captured.
  • this embodiment does not limit the photographing device, and the photographing device may be, for example but not limited to: a camera, a mobile phone with a photographing function, a tablet computer, a wearable device, and the like.
  • the line of sight emitted from the optical center of the camera of the real shooting device through the object space is called the first line of sight.
  • the first line of sight can be considered as the actual line of sight emitted by the real shooting device, and a first line of sight is emitted from the optical center of the camera of the shooting device and passes through the object space corresponding to each pixel point of the captured image.
  • As shown in FIG. 2, the camera 1 that captures the chair image I1 and the camera 2 that captures the chair image I2 are real cameras, and the lines of sight emitted from the optical centers of the real cameras (the solid lines with arrows in FIG. 2) are real lines of sight, that is, the lines of sight r1 and r2 are real lines of sight.
  • the camera 3 that shoots the chair image I3 is a hypothetical virtual camera (the camera in the dotted frame in Fig. 2), and the line of sight emitted from the optical center of the virtual camera (the dotted line with an arrow in Fig. 2) is a virtual line of sight, that is, the line of sight r3 is a virtual line of sight.
  • each pixel on an original image corresponds to a first line of sight.
  • A pixel in an original image is obtained by the first line of sight hitting a surface point on the target object and being imaged, and the first line of sight is also the line of sight along which that pixel is photographed. It can be seen that there is a corresponding relationship among a surface point on the target object, a pixel, and the first line of sight that captures the pixel.
  • Different pixels in each original image correspond to different surface points on the target object, and different surface points correspond to different first lines of sight; that is to say, each pixel in each original image corresponds to a first line of sight passing through the corresponding surface point on the target object, and different pixels correspond to first lines of sight passing through different surface points.
  • The pixels in different original images may correspond to different surface points on the target object; for any two original images, some of their pixels may correspond to the same surface points, or all of their pixels may correspond to different surface points.
  • a neural network-based three-dimensional reconstruction is performed using multiple original images to obtain an initial implicit 3D representation model.
  • the initial implicit 3D representation model can implicitly express the target object in three dimensions, for example, it can express the object information of multiple dimensions such as the shape, texture, and material of the target object.
  • In some embodiments, the initial implicit 3D representation model is a fully connected neural network, which is also called a multi-layer perceptron (Multi-Layer Perceptron, MLP).
  • MLP: Multi-Layer Perceptron
  • The initial implicit 3D representation model predicts the volume density and color information of a spatial point based on the spatial coordinates and viewing angle information of the input spatial point.
  • The initial implicit 3D representation model can be expressed as F: (x, d) → (c, σ), where x is recorded as the spatial coordinate (x, y, z) of a spatial point; d is recorded as the viewing angle information (θ, φ) of the spatial point, in which θ is the azimuth angle and φ is the elevation angle; c is recorded as the color information (r, g, b) of the spatial point, in which r refers to red (Red, R), g refers to green (Green, G), and b refers to blue (Blue, B); and σ is recorded as the volume density of the spatial point.
  • In some embodiments, the initial implicit 3D representation model includes an F_σ network for predicting the volume density σ and an F_c network for predicting the color information c. Therefore, the initial implicit 3D representation model can be further expressed as (σ, f) = F_σ(x) and c = F_c(f, d), where the input of the F_σ network is the spatial coordinate x of a spatial point and its output is the volume density σ and an intermediate feature f of the spatial point, and the input of the F_c network is the intermediate feature f and the viewing angle information d of the spatial point and its output is the RGB value of the color information of the spatial point. That is to say, the volume density is only related to the spatial coordinate x, while the RGB value of the color information is related to both the spatial coordinate and the viewing angle information.
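  • The two-sub-network structure described above can be sketched as follows; this is a minimal PyTorch sketch in which the layer widths and depths, and the representation of the viewing angle (θ, φ) as a 3D unit direction vector, are illustrative assumptions rather than values specified in this application.

```python
# Minimal sketch of the F_sigma / F_c decomposition described above (assumed sizes).
import torch
import torch.nn as nn

class ImplicitRepresentationModel(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # F_sigma: spatial coordinate x -> (volume density sigma, intermediate feature f)
        self.f_sigma = nn.Sequential(
            nn.Linear(3, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(feat_dim, 1)
        # F_c: (intermediate feature f, viewing direction d) -> RGB color c
        self.f_c = nn.Sequential(
            nn.Linear(feat_dim + 3, feat_dim // 2), nn.ReLU(),
            nn.Linear(feat_dim // 2, 3), nn.Sigmoid(),
        )

    def forward(self, x, d):
        f = self.f_sigma(x)                      # intermediate feature of the spatial point
        sigma = torch.relu(self.sigma_head(f))   # volume density depends only on x
        c = self.f_c(torch.cat([f, d], dim=-1))  # color depends on f and the viewing angle d
        return sigma, c
```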
  • the camera pose corresponding to each original image is calculated respectively, and the multiple first sight lines emitted by the camera and the angle of view information of each first sight line are determined according to the camera pose corresponding to each original image and camera internal parameters and other data.
  • Sampling is performed on each first line of sight to obtain multiple spatial points. It should be understood that the angle of view information of the spatial point sampled from the same first line of sight is the angle of view information of the first line of sight.
  • For example, the four dots on the line of sight r1 in FIG. 3 are the four spatial points sampled on the line of sight r1, and the direction pointed to by the arrow of the line of sight r1 is the viewing angle information of the line of sight r1, which is also the viewing angle information of the four spatial points sampled on it.
  • After obtaining multiple spatial points, the spatial coordinates and viewing angle information of the multiple spatial points are used to perform neural-network-based 3D reconstruction. This process can be performed multiple times in batches to finally obtain the initial implicit 3D representation model. It should be noted that the 3D reconstruction process performed multiple times in batches may be a model training process, but is not limited thereto. Specifically, the neural-network-based 3D reconstruction can be carried out in an iterative manner.
  • k original images can be randomly selected each time, and an image block with a size of m*n is randomly selected from the k original images, and the spatial coordinates and viewing angle information of the spatial points on the first line of sight corresponding to each pixel in the k image blocks are used to perform 3D reconstruction (or model training) based on the neural network.
  • the 3D reconstruction process is terminated when the loss function of the 3D reconstruction process meets the set requirements.
  • k is a natural number greater than or equal to 1, and k is less than or equal to the total number of original images; m and n are natural numbers greater than or equal to 1, m and n respectively represent the number of pixels of the image block in the horizontal and vertical dimensions, m is less than or equal to the width of the original image (the width dimension corresponds to the horizontal direction), n is less than or equal to the length of the original image (the length dimension corresponds to the vertical direction), and m and n can be the same or different.
  • a plurality of spatial points may be sampled on each first line of sight in an equidistant manner, that is, the sampling interval between any two adjacent spatial points is the same. It is also possible to use different sampling intervals to sample multiple spatial points on each first line of sight, and the size of the sampling interval is not limited.
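  • As a minimal sketch of the equidistant sampling described above, the following function samples spatial points along one first line of sight; the near/far bounds and the number of samples are assumptions not specified in this application.

```python
import numpy as np

def sample_points_on_ray(origin, direction, near=0.1, far=4.0, n_samples=64):
    """Equidistantly sample spatial points along one line of sight.

    origin: camera optical center (3,); direction: unit viewing direction (3,).
    Returns the (n_samples, 3) spatial coordinates and the depths used.
    """
    t = np.linspace(near, far, n_samples)                      # equal sampling interval
    points = origin[None, :] + t[:, None] * direction[None, :]
    return points, t
```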
  • In some embodiments, the SLAM (Simultaneous Localization and Mapping) algorithm can be used to calculate the camera pose corresponding to each original image more accurately. Specifically, when calculating the camera poses, the SLAM algorithm first extracts the feature points of each original image, then establishes the matching relationship between the feature points of every two adjacent original images, and calculates the relative camera pose between the two adjacent original images according to that matching relationship. The camera pose corresponding to each original image is then calculated according to the relative camera poses between adjacent original images.
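  • As a rough illustration of the feature-matching step described above, the following sketch estimates the relative camera pose between two adjacent original images with OpenCV; a full SLAM pipeline (tracking, loop closure, global optimization) is omitted, and the function name, feature count, and matching strategy are assumptions.

```python
# Two-view relative pose from matched ORB features (sketch; K is the camera intrinsic matrix).
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # relative camera pose between the two adjacent original images
```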
  • an explicit three-dimensional model corresponding to the target object may be constructed according to the initial implicit 3D representation model and multiple original images.
  • the explicit three-dimensional model can be a Mesh (grid) model that can reflect the surface characteristics of the target object and can perform an explicit three-dimensional representation of the target object.
  • the explicit three-dimensional model includes surface points on the target object and the spatial coordinates and color information of each surface point. These surface points can form triangles and vertices in the explicit 3D model.
  • the explicit 3D model specifically includes multiple triangles and vertices.
  • the attribute information of the vertices includes the spatial coordinates, color information, material information and other texture information of the vertices.
  • Vertices are surface points, and each triangular surface also includes multiple surface points, where the spatial coordinates and color information of other surface points on the triangular surface except for the surface point as a vertex can be interpolated from the spatial coordinates and color information of the three vertices on the triangular surface to which they belong.
  • the color information of each surface point on the explicit 3D model is determined according to the average viewing angle information of the first line of sight corresponding to the surface point, and represents the average viewing angle information corresponding to any line of sight corresponding to the surface point.
  • the color information of each surface point on the explicit 3D model is not the real color information of the target object under light irradiation, but the color information that has a mapping relationship with the average viewing angle information of each first line of sight corresponding to the surface point.
  • constructing an explicit 3D model corresponding to the target object according to the initial implicit 3D representation model and multiple original images including: determining the corresponding spatial range of the target object according to the image features of the multiple original images; generating an initial 3D model corresponding to the target object based on the spatial range and the initial implicit 3D representation model,
  • the initial three-dimensional model includes surface points on the target object; for any surface point, the average value of viewing angle information of at least one first line of sight corresponding to the surface point is converted into color information of the surface point to obtain an explicit three-dimensional model.
  • an algorithm such as Structure from Motion (SfM) can be used to process the image features of multiple original images to estimate the sparse 3D point position corresponding to the target object, and the sparse 3D point position corresponding to the target object can help determine the spatial range of the target object in the world coordinate system.
  • the spatial range may be a spatial range having length, width and height, such as a cube space or a cuboid space, but is not limited thereto.
  • In some embodiments, an implementation of the above-mentioned generating of the initial three-dimensional model of the target object based on the spatial range and the initial implicit 3D representation model is as follows: scalar field data corresponding to the target object are generated based on the spatial range and the initial implicit 3D representation model;
  • triangular face analysis is performed on the scalar field data to obtain the multiple triangular faces contained in the initial three-dimensional model, the multiple vertices on the triangular faces, and their spatial coordinates, where the multiple triangular faces and multiple vertices are used to define the surface points contained in the initial three-dimensional model.
  • If the above-mentioned spatial range is a cuboid space with length, width and height, an implementation manner of generating the scalar field data corresponding to the target object based on the spatial range and the initial implicit 3D representation model is as follows: the cuboid space is sampled at equal intervals in the three dimensions of length, width and height to obtain multiple target space points, wherein every eight adjacent target space points form a volume element; the spatial coordinates of the multiple target space points are input into the initial implicit 3D representation model to obtain the volume densities of the multiple target space points; the volume elements and the volume densities of the target space points contained in the volume elements form the scalar field data.
  • the spatial points are sampled at equal intervals in the three dimensions of length, width and height to obtain multiple target space points; multiple target space points can form multiple small cubes, one of which is a volume element; for each small cube, the spatial coordinates of the space points on the small cube are input into the initial implicit 3D representation model, and the volume density of these target space points is obtained.
  • In some embodiments, the Marching Cubes algorithm is used to perform triangular face analysis on the volume elements to obtain the triangular faces contained in the initial 3D model, the vertices on the triangular faces, and their spatial coordinates.
  • the triangular faces include multiple surface points, and the vertices are also surface points.
  • Each surface point contained in the initial three-dimensional model can be determined according to the triangular faces and vertices.
  • Specifically, the Marching Cubes algorithm processes the voxels (that is, volume elements) in the three-dimensional scalar field one by one, separates out the voxels intersecting the isosurface, and calculates the intersection points between the isosurface and the edges of each cube by interpolation; according to the relative position of each vertex of the cube and the isosurface, the intersection points between the isosurface and the cube edges are connected in a certain way to generate triangular faces, which are an approximate representation of the isosurface inside the cube; then, after all the triangular faces are obtained, these triangular faces can be connected to form the initial 3D model corresponding to the target object.
  • the above equal interval sampling refers to equal interval sampling in the same dimension, that is, the sampling interval used for spatial point sampling in any dimension of length, width, and height is the same, but the sampling intervals in different dimensions can be different, of course, can also be the same.
  • For example, the sampling interval in the length dimension may be 1, the sampling interval in the width dimension may be 0.5, and the sampling interval in the height dimension may be 0.8.
  • Alternatively, the sampling intervals in the three dimensions of length, width and height may all be 1, so as to ensure that the same number of target space points are sampled in the three dimensions.
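  • The scalar-field sampling and Marching Cubes analysis described above can be sketched as follows using scikit-image; the grid resolution, the density threshold `level`, and the `query_density` wrapper around the initial implicit 3D representation model are assumptions of this sketch.

```python
import numpy as np
from skimage import measure

def extract_initial_mesh(query_density, bounds_min, bounds_max, resolution=128, level=10.0):
    bounds_min = np.asarray(bounds_min, dtype=float)
    bounds_max = np.asarray(bounds_max, dtype=float)
    # Sample the cuboid space at equal intervals in the length/width/height dimensions.
    axes = [np.linspace(bounds_min[k], bounds_max[k], resolution) for k in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)          # (R, R, R, 3)
    # query_density wraps the initial implicit 3D representation model and returns sigma.
    sigma = query_density(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    spacing = (bounds_max - bounds_min) / (resolution - 1)
    verts, faces, normals, _ = measure.marching_cubes(sigma, level=level, spacing=tuple(spacing))
    return verts + bounds_min, faces  # triangular faces and vertices of the initial 3D model
```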
  • the color information of the surface point is determined according to the viewing angle information of at least one first line of sight corresponding to the surface point.
  • the initial 3D model for which the color information of each surface point has been determined is called an explicit 3D model.
  • the color information of the surface point can be determined in the following way:
  • For any surface point, at least one first line of sight corresponding to the surface point is determined from the first lines of sight corresponding to the different camera poses. It should be noted that, under the same camera pose, there is only one first line of sight corresponding to the same surface point; however, in the process of shooting multiple original images under different camera poses, the same surface point is usually photographed under two or more camera poses, that is, there may be more than one first line of sight corresponding to the surface point. Further, the average value of the viewing angle information of the at least one first line of sight corresponding to the surface point is calculated, and the average value is converted into the color information of the surface point for storage.
  • a pre-stored angle of view map corresponding to each original image may also be generated, and the angle of view pre-stored map stores the angle of view information of the first line of sight corresponding to each pixel in the original image. It is worth noting that based on the camera pose and camera internal parameters of the original image, it is not difficult to determine the line equation information of the first line of sight that exits from the optical center position when the original image is taken and passes through the surface point corresponding to the pixel point of the original image. Based on the line equation information of the first line of sight, the viewing angle information of the first line of sight can be quickly obtained according to geometric principles.
  • The two images shown in FIG. 4 are only exemplary illustrations. The i-th image among the multiple images is denoted as I_i, and the viewing angle pre-stored map corresponding to image I_i is denoted as R(I_i); R(I_i) records the viewing angle information of the first line of sight corresponding to each pixel in image I_i. Similarly, the j-th image among the multiple images is denoted as I_j, and the viewing angle pre-stored map corresponding to image I_j is denoted as R(I_j); R(I_j) records the viewing angle information of the first line of sight corresponding to each pixel in image I_j, where i and j are positive integers.
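  • As an illustration of how a viewing angle pre-stored map such as R(I_i) could be built, the following sketch computes, for every pixel of one original image, the direction of the first line of sight from the camera optical center through that pixel; the intrinsic matrix K and the 4x4 camera-to-world pose c2w are assumed inputs.

```python
import numpy as np

def view_direction_map(height, width, K, c2w):
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixel coordinates
    cam_dirs = pix @ np.linalg.inv(K).T                # ray directions in the camera frame
    world_dirs = cam_dirs @ c2w[:3, :3].T              # rotate into the world frame
    world_dirs /= np.linalg.norm(world_dirs, axis=-1, keepdims=True)
    return world_dirs  # (H, W, 3): viewing angle of the first line of sight per pixel
```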
  • In some embodiments, converting, for any surface point, the average value of the viewing angle information of the at least one first line of sight corresponding to the surface point into the color information of the surface point to obtain the explicit three-dimensional model includes: for any surface point, according to the camera poses corresponding to the multiple original images, combined with the initial three-dimensional model, determining from the multiple original images at least one target original image containing the target pixel corresponding to the surface point; and converting the average value of the viewing angle information of the first lines of sight corresponding to the target pixels, as stored in the viewing angle pre-stored maps corresponding to the at least one target original image, into the color information of the surface point.
  • multiple original images correspond to different camera poses, and different camera poses correspond to different viewing angles.
  • the image data of any surface point falling within the viewing angle range can be collected, and then the target pixel corresponding to the surface point is included in the collected original image.
  • For ease of description, the pixel corresponding to the surface point is called the target pixel, and an original image among the multiple original images that contains the target pixel corresponding to the surface point is called a target original image.
  • the viewing angle range corresponding to the camera pose can be determined based on the camera pose and camera internal parameters of the original image.
  • In some embodiments, the spatial coordinates of any surface point are obtained from the initial 3D model. If the spatial coordinates of the surface point fall within the viewing angle range corresponding to a camera pose, the original image captured under that camera pose is a target original image corresponding to the surface point; if the spatial coordinates of the surface point do not fall within the viewing angle range corresponding to the camera pose, the original image captured under that camera pose is not a target original image corresponding to the surface point.
  • For any surface point, after the at least one target original image containing the target pixel corresponding to the surface point has been determined, the viewing angle information of the first line of sight recorded at the corresponding image position in the viewing angle pre-stored map of each target original image is queried according to the image position of the target pixel in that target original image, so as to obtain the viewing angle information of the first line of sight corresponding to each target pixel; the viewing angle information of the first lines of sight corresponding to these target pixels is averaged to obtain the average viewing angle information corresponding to the surface point, and the mapping relationship between viewing angle information and color information is used to convert the average viewing angle information corresponding to the surface point into the color information of the surface point.
  • For example, for any surface point V, the multiple target original images containing the surface point V are determined; for each target original image, the image coordinates of the surface point V in that target original image are combined with the viewing angle information of the first line of sight corresponding to the target pixel in that target original image, yielding one result per target original image; the average viewing angle information corresponding to the surface point V is then obtained based on these multiple results. Further, referring to the following formula (4), the multiple results can be averaged to obtain the average viewing angle information corresponding to the surface point V.
  • the average viewing angle information corresponding to the surface point V can be calculated according to formula (4)
  • V_UV(I_i) can be calculated according to formula (5), where V_UV(I_i) is the image coordinates of the surface point V in the image I_i; V is substituted as the spatial coordinates (x, y, z) of the surface point V in the world coordinate system; K is the known camera intrinsic matrix; Z is the depth information of V; and T_W2C(I_i) represents the transformation matrix between the world coordinate system and the camera coordinate system corresponding to the image I_i. It should be understood that the camera poses of different images are different, so the camera coordinate systems corresponding to different images are also different.
  • L refers to the number of original images in which the surface point V is captured. For example, if 5 of the 20 original images obtained by shooting the target object contain the surface point V, the value of L is 5.
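  • The projection and averaging described around formulas (4) and (5) can be sketched as follows; the dictionary-style inputs, the nearest-pixel lookup into the viewing angle pre-stored map, and the normalization of the averaged direction are assumptions of this sketch rather than details fixed by the application.

```python
import numpy as np

def average_view_direction(V, target_images, K):
    """V: (3,) world coordinates of surface point V; target_images: the L images that see V."""
    dirs = []
    for img in target_images:
        p_cam = img["T_w2c"] @ np.append(V, 1.0)   # 4x4 transform T_W2C(I_i): world -> camera
        Z = p_cam[2]                               # depth information of V in this camera
        uv = (K @ p_cam[:3]) / Z                   # formula (5): image coordinates V_UV(I_i)
        u, v = int(round(uv[0])), int(round(uv[1]))
        dirs.append(img["view_map"][v, u])         # viewing angle stored in R(I_i) at V_UV(I_i)
    d_avg = np.mean(dirs, axis=0)                  # formula (4): average over the L images
    return d_avg / np.linalg.norm(d_avg)
```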
  • the randomly generated virtual line of sight is called the second line of sight.
  • the second line of sight is the virtual line of sight emitted by the hypothetical virtual camera.
  • the second line of sight corresponding to the surface point may be randomly generated, and the average viewing angle information corresponding to the second line of sight corresponding to the surface point may be generated according to the color information of the surface point.
  • the first line of sight corresponding to the surface point may be used as a reference line of sight, and the second line of sight corresponding to the surface point may be randomly generated within a certain range of the reference line of sight. It is worth noting that if the surface point appears in multiple original images under different camera poses, the corresponding second line of sight can be randomly generated for the surface point under each camera pose. In simple terms, for any surface point, the second line of sight corresponding to the surface point can be randomly generated according to the first line of sight corresponding to the surface point.
  • randomly generating the second line of sight corresponding to the surface point according to the first line of sight corresponding to the surface point includes: according to the spatial coordinates of the surface point and the viewing angle information of the first line of sight corresponding to the surface point, randomly generating a line of sight that passes through the surface point and is different from the first line of sight corresponding to the surface point as the second line of sight.
  • In some embodiments, according to the spatial coordinates of the surface point and the viewing angle information of the first line of sight corresponding to the surface point, a candidate space range is determined; within the candidate space range, a line of sight that passes through the surface point and is different from the first line of sight corresponding to the target pixel is randomly generated as the second line of sight.
  • the candidate spatial range may be a spatial range of any shape.
  • For example, the candidate space range is the space range of a cone with the spatial coordinates of the surface point as the apex and the first line of sight corresponding to the target pixel as the center line.
  • In some embodiments, the included angle between the second line of sight and the first line of sight passing through the surface point may be within the range [-θ, θ] degrees.
  • θ is, for example, 30 degrees.
  • For example, the cone in FIG. 5 takes OV as the center line and the surface point V of the chair as the cone apex.
  • O is the optical center position of the real camera that emits the first line of sight
  • O' is the optical center position of the virtual camera that emits the second line of sight
  • OV is the first line of sight
  • O'V is the second line of sight generated randomly
  • The included angle between each line of sight O'V within the cone (the light-colored rays with arrows in FIG. 5) and OV is within the range [-30, 30] degrees.
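  • The random generation of a second line of sight within the cone of FIG. 5 can be sketched as follows; sampling the deviation and azimuth angles uniformly is an assumption, and any scheme that keeps the angle to the first line of sight within [-30, 30] degrees fits the description above.

```python
import numpy as np

def random_second_sight(first_dir, max_angle_deg=30.0, rng=None):
    """Randomly generate the direction of a second line of sight O'V around the first line of sight OV."""
    rng = rng or np.random.default_rng()
    first_dir = first_dir / np.linalg.norm(first_dir)
    # Build an orthonormal basis around the first line of sight.
    helper = np.array([1.0, 0.0, 0.0]) if abs(first_dir[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(first_dir, helper)
    u /= np.linalg.norm(u)
    w = np.cross(first_dir, u)
    # Random deviation within [0, max_angle] degrees and random azimuth around the axis.
    theta = np.deg2rad(rng.uniform(0.0, max_angle_deg))
    phi = rng.uniform(0.0, 2.0 * np.pi)
    d = np.cos(theta) * first_dir + np.sin(theta) * (np.cos(phi) * u + np.sin(phi) * w)
    return d / np.linalg.norm(d)  # direction of a second line of sight through the surface point V
```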
  • a pre-stored depth map corresponding to each original image may be pre-generated, so as to quickly obtain the spatial coordinates of the surface points based on the pre-stored depth map, thereby improving the efficiency of randomly generating the second line of sight.
  • the depth pre-stored map corresponding to each original image stores the depth information of the surface points corresponding to each pixel in the original image.
  • In some embodiments, an optional implementation of randomly generating the second line of sight corresponding to a surface point according to the first line of sight corresponding to the surface point is as follows: for any surface point, according to the camera poses corresponding to the multiple original images, combined with the explicit 3D model, at least one target original image containing the target pixel corresponding to the surface point is determined from the multiple original images; for each target original image, the spatial coordinates of the surface point are calculated according to the depth information of the surface point corresponding to the target pixel stored in the depth pre-stored map of that target original image; and, according to the spatial coordinates of the surface point and the viewing angle information of the first line of sight corresponding to the target pixel, a line of sight that passes through the surface point and is different from the first line of sight corresponding to the target pixel is randomly generated as the second line of sight.
  • the operation of selecting at least one target original image of any surface point from multiple original images can be performed again, or it can no longer be performed, but the corresponding relationship between the surface point and the target original image is recorded when the above operation is performed, and at least one target original image corresponding to any surface point is directly obtained based on the corresponding relationship.
  • the spatial coordinates of the surface point can be obtained based on the straight line equation passing through the surface point.
  • For example, in FIG. 3, the first line of sight is the line of sight r1, which hits the surface point V on the chair; the distance (depth information) between the surface point V and the optical center position O is recorded as t_z; the spatial coordinates of the surface point V can then be calculated when the spatial coordinates of the optical center position O and the viewing angle information of the line of sight r1 are known (namely, V = O + t_z · d, with d being the viewing angle information of the line of sight r1).
  • Method 1: for any pixel in each original image, and for any spatial point on the first line of sight corresponding to that pixel, the depth information from the spatial point to the camera optical center corresponding to the first line of sight is calculated according to the sampling distances between spatial points, the volume density and depth information of the spatial point, and the volume densities of the spatial points preceding it; the depth information from the multiple spatial points on the first line of sight to the camera optical center is weighted and averaged to obtain the depth information from the surface point corresponding to the pixel to the camera optical center; and, according to the depth information from the surface point corresponding to each pixel in each original image to the camera optical center, the depth pre-stored map corresponding to each original image is generated.
  • the method 1 can be used to generate the depth pre-stored map after the initial implicit 3D representation model is obtained, or the depth pre-stored map can be generated in the method 1 before or after the explicit 3D model is constructed, which is not limited in this embodiment.
  • Specifically, O is the position of the optical center corresponding to the first line of sight; d is the viewing angle information of the first line of sight; t is the depth information of a spatial point on the first line of sight and reflects the distance between that spatial point and the optical center position O; N is a positive integer greater than 1 and i is a positive integer between 1 and N; the depth information corresponding to the i-th target space point is recorded as t_i, and the accumulated volume density of the preceding i-1 target space points is recorded as T_i; the depth information from the surface point through which the first line of sight passes to the camera optical center is recorded as t_z, and t_z can be calculated according to formula (6).
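  • A hedged sketch of Method 1 follows: the depth t_z from the surface point to the camera optical center is computed as a weighted average of the per-sample depths t_i along the first line of sight, with weights derived from the volume densities in the standard volume-rendering way; the exact weighting of formula (6) is assumed here rather than quoted from the application.

```python
import numpy as np

def expected_depth(t, sigma):
    """t: (N,) depths of the sampled space points; sigma: (N,) their volume densities."""
    delta = np.append(np.diff(t), 1e10)                          # sampling distances between points
    alpha = 1.0 - np.exp(-sigma * delta)                         # opacity contributed by each sample
    T = np.exp(-np.cumsum(np.append(0.0, sigma * delta)[:-1]))   # accumulated transmittance T_i
    weights = T * alpha
    return np.sum(weights * t)                                   # t_z: depth from surface point to optical center
```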
  • Method 2: for each original image, the explicit 3D model is rasterized using the camera pose corresponding to the original image to obtain the depth information from the surface point corresponding to each pixel in the original image to the camera optical center; according to the depth information from the surface point corresponding to each pixel in the original image to the camera optical center, the depth pre-stored map corresponding to the original image is generated. It is worth noting that Method 2 is used to generate the depth pre-stored map after the explicit 3D model has been obtained.
  • the second line of sight can be randomly generated for the surface points corresponding to each pixel point in multiple original images, so that multiple second lines of sight can be randomly generated, and the average view angle information corresponding to the multiple second lines of sight can be obtained.
  • the average view angle information corresponding to the multiple second lines of sight and the spatial coordinates of the spatial points on the multiple second lines of sight can be used to continue to perform neural network-based three-dimensional reconstruction (or model training) based on the initial implicit 3D representation model to obtain the target implicit 3D representation model.
  • the line of sight r3 in FIG. 3 may be regarded as a randomly generated second line of sight, and the dots on the line of sight r3 are multiple spatial points.
  • the above method can be used to generate all the second sight lines and their corresponding average viewing angle information in advance, and then multiple rounds of iterations can be used, each time using the average viewing angle information corresponding to part of the second sight line and the spatial coordinates of some spatial points on the second sight line, and continuing to perform 3D reconstruction (or model training) on the basis of the initial implicit 3D representation model, until the target implicit 3D representation model whose loss function meets the requirements of the 3D reconstruction is obtained.
  • the above-mentioned method can be used in real time to generate the second line of sight and its corresponding average view angle information required for the current round of iterations, and based on the real-time generated average view angle information corresponding to the second line of sight and the spatial coordinates of the spatial points on the second line of sight generated in real time, continue to perform 3D reconstruction (or model training) on the basis of the initial implicit 3D representation model until the target implicit 3D representation model whose loss function of the 3D reconstruction meets the requirements is obtained.
  • It should be noted that, for the multiple second lines of sight corresponding to the same surface point, the viewing angle information is the same, namely the average viewing angle information calculated based on the viewing angle information of the first lines of sight corresponding to that surface point.
  • Accordingly, the color information of a spatial point on a second line of sight can be expressed as c = F_c(f, d_avg), where f = F_σ(x) is the intermediate feature output by the F_σ network (used to predict the volume density σ) based on the spatial coordinates of the spatial point on the second line of sight, and d_avg is the average viewing angle information. That is, the color information of any spatial point on the second line of sight is obtained based on the average viewing angle information and F_σ(x).
  • the average view angle information corresponding to each second line of sight and the spatial coordinates of the spatial points on the second line of sight are sequentially used to continue 3D reconstruction on the basis of the initial implicit 3D representation model.
  • Using volume rendering technology, the RGB color information of the spatial points on each second line of sight is integrated according to the volume densities, predicted as described above, of those spatial points, so as to obtain the predicted RGB color information of the pixel corresponding to each second line of sight. A loss function is calculated based on the predicted RGB color information of the pixels corresponding to the second lines of sight and the actual RGB color information of those pixels (the actual RGB color information here refers to the color information of the pixel in the corresponding original image). If the loss function converges, the three-dimensional reconstruction (or model training) process is completed; otherwise, the average viewing angle information and the spatial coordinates of the spatial points of further second lines of sight continue to be used for iterative training until the loss function converges.
  • Here, N is a positive integer greater than 1, i is a positive integer between 1 and N, T_i can be calculated according to formula (7), and j is a positive integer between 1 and i-1.
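  • The integration of the color information along one second line of sight, with T_i accumulated from the volume densities of the preceding space points as described for formula (7), can be sketched as follows; the variable names and the handling of the last sampling interval are assumptions.

```python
import numpy as np

def volume_render_color(t, sigma, rgb):
    """t: (N,) depths; sigma: (N,) volume densities; rgb: (N, 3) predicted colors."""
    delta = np.append(np.diff(t), 1e10)                          # sampling distances
    alpha = 1.0 - np.exp(-sigma * delta)
    T = np.exp(-np.cumsum(np.append(0.0, sigma * delta)[:-1]))   # formula (7): T_i from sigma_j, j < i
    weights = T * alpha                                          # contribution of each space point
    return np.sum(weights[:, None] * rgb, axis=0)                # predicted RGB of the corresponding pixel
```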
  • the 3D model reconstruction method performs neural network-based 3D reconstruction and traditional 3D reconstruction on the basis of multiple original images containing the target object, respectively, to obtain an initial implicit 3D representation model and an explicit 3D model; based on the explicit 3D model, a random line of sight and an average viewing angle are generated, and based on the random line of sight and average viewing angle, the neural network-based 3D reconstruction is continued on the basis of the initial implicit 3D representation model to obtain the target implicit 3D representation model.
  • the initial implicit 3D representation model and the target implicit 3D representation model are both implicit three-dimensional representations of the target object.
  • the random line of sight and its corresponding average viewing angle information are used to enhance the line of sight data.
  • Based on the enhanced line of sight data, the neural-network-based 3D reconstruction can be continued to obtain an implicit 3D representation model with strong robustness to the line of sight, which greatly improves the robustness of synthesizing images from different viewing angles based on the implicit 3D representation model.
  • the target implicit 3D representation model and explicit 3D model based on the target object can meet the needs of users to render the image of any viewing angle of the target object.
  • combining the implicit 3D representation model of the target (not shown in FIG. 4 ) and the average viewing angle information represented by the color information of each surface point on the target object carried by the explicit 3D model can render a viewing angle image with better quality.
  • the embodiment of the present application further provides an image generation method.
  • FIG. 6 is a schematic flowchart of an image generation method provided by an embodiment of the present application. As shown in Figure 6, the method may include the following steps:
  • According to the target camera pose to be rendered and the explicit 3D model corresponding to the target object, determine the target line of sight to be rendered and the average viewing angle information corresponding to the target line of sight.
  • According to the spatial coordinates of the spatial points on the target line of sight and the average viewing angle information corresponding to the target line of sight, combined with the target implicit 3D representation model corresponding to the target object, generate a target image of the target object under the target camera pose.
  • the explicit 3D model and the implicit 3D representation model of the target are obtained in the process of integrating the line of sight prior information and the average viewing angle information for 3D reconstruction based on the neural network.
  • the neural network-based 3D reconstruction process incorporating the line-of-sight prior information and the average viewing angle information can be realized by using the 3D reconstruction method provided in the above-mentioned embodiments, which will not be repeated here.
  • the target camera pose to be rendered can be obtained, and then based on the target camera pose and the explicit 3D model corresponding to the target object, the target line of sight to be rendered and the average viewing angle information corresponding to the target line of sight are determined; after obtaining the target line of sight and the average viewing angle information corresponding to the target line of sight, combined with the target implicit 3D representation model corresponding to the target object, a target image of the target object under the target camera pose is generated.
  • In some embodiments, the process of determining the target line of sight to be rendered and the average viewing angle information corresponding to the target line of sight based on the target camera pose and the explicit 3D model corresponding to the target object includes: based on the rasterized rendering result of the explicit 3D model corresponding to the target object under the target camera pose to be rendered, determining the target surface points on the explicit 3D model within the field of view corresponding to the target camera pose and their color information; generating the target line of sight according to the target camera pose and the target surface point, and sampling on the target line of sight to obtain the spatial points on the target line of sight; and converting the color information of the target surface point into the average viewing angle information it represents, as the average viewing angle information corresponding to the target line of sight, thereby obtaining the average viewing angle information corresponding to the target line of sight and the spatial coordinates of the spatial points on the target line of sight.
  • In some embodiments, the process of generating the target image of the target object under the target camera pose according to the spatial coordinates of the spatial points on the target line of sight and the average viewing angle information corresponding to the target line of sight, combined with the target implicit 3D representation model corresponding to the target object, includes: inputting the average viewing angle information corresponding to the target line of sight and the spatial coordinates of the spatial points on the target line of sight into the target implicit 3D representation model to obtain the color information and volume density of each spatial point on the target line of sight; and, using volume rendering technology, integrating the color information of the spatial points on each target line of sight according to their volume densities to obtain the color information of the target surface point corresponding to each target line of sight under the target camera pose.
  • the target image of the target object under the target camera pose can be rendered according to the color information of the target surface point under the target camera pose.
  • the target image refers to a 2D image containing the target object. It should be noted that there are multiple target surface points, each corresponding to a pixel in the target image.
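  • Putting the image generation steps together, a hedged end-to-end sketch under the target camera pose might look as follows; rasterize_surface_points and query_model are hypothetical helpers standing in for the rasterization of the explicit 3D model and for the target implicit 3D representation model, while sample_points_on_ray and volume_render_color refer to the earlier sketches.

```python
import numpy as np

def render_target_image(query_model, explicit_mesh, target_pose_c2w, K, height, width):
    image = np.zeros((height, width, 3))
    # Rasterizing the explicit 3D model under the target camera pose gives, per pixel, the
    # visible target surface point, its average viewing angle (decoded from its color), and a mask.
    surface_points, avg_view_dirs, valid = rasterize_surface_points(
        explicit_mesh, target_pose_c2w, K, height, width)
    origin = target_pose_c2w[:3, 3]                         # target camera optical center
    for v, u in zip(*np.nonzero(valid)):
        direction = surface_points[v, u] - origin
        direction /= np.linalg.norm(direction)              # target line of sight for this pixel
        points, t = sample_points_on_ray(origin, direction)
        sigma, rgb = query_model(points, avg_view_dirs[v, u])   # density and color per space point
        image[v, u] = volume_render_color(t, sigma, rgb)
    return image
```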
  • a neural network-based 3D reconstruction service can be provided to users.
  • the service can be deployed on the server, and the server can be on the cloud.
  • the implementation form can be a cloud server, a virtual machine, a container, etc.; of course, the server can also be implemented by a traditional server, and there is no limitation on this.
  • the service provides a human-computer interaction interface for users, and the human-computer interaction interface may be a web interface or a command window. Users can use the service through the human-computer interaction interface provided by the service, for example, submit the original image or the target camera pose corresponding to the perspective image to be rendered to the server through the human-computer interaction interface, and display the explicit 3D model corresponding to the target object or the rendered perspective image through the human-computer interaction interface.
  • the user displays the human-computer interaction interface corresponding to the 3D reconstruction service based on the neural network on the terminal device used by the user, and the user uploads or captures images through the human-computer interaction interface to submit multiple original images containing the target object required for 3D reconstruction.
  • in response to the image upload operation or image capture operation on the human-computer interaction interface, multiple original images containing the target object are obtained; after that, the three-dimensional reconstruction process is performed, that is: neural network-based three-dimensional reconstruction is performed according to the multiple original images containing the target object to obtain the initial implicit 3D representation model; the explicit three-dimensional model is constructed according to the initial implicit 3D representation model and the multiple original images; the second lines of sight corresponding to the surface points on the explicit three-dimensional model are randomly generated, and the average viewing angle information corresponding to the second line of sight of each surface point is generated according to the color information of that surface point; and, according to the average viewing angle information corresponding to the second lines of sight and the spatial coordinates of the spatial points on the second lines of sight, neural network-based three-dimensional reconstruction is performed based on the initial implicit 3D representation model to obtain the target implicit 3D representation model.
  • the message that the implicit 3D representation model of the target has been obtained can also be output on the human-computer interaction interface, so as to inform the user that a new perspective image can be synthesized based on the implicit 3D representation model of the target;
  • the user inputs the target camera pose to be rendered on the human-computer interaction interface; in response to the input operation on the human-computer interaction interface, the target camera pose to be rendered is obtained; after that, the image synthesis process is performed, that is: the target line of sight to be rendered and the average viewing angle information corresponding to the target line of sight are determined according to the target camera pose to be rendered and the explicit 3D model corresponding to the target object; and, according to the spatial coordinates of the spatial points on the target line of sight and the average viewing angle information corresponding to the target line of sight, combined with the target implicit 3D representation model corresponding to the target object, a target image of the target object under the target camera pose is generated and output.
  • the image generation method provided by the embodiments of the present application combines the target implicit 3D representation model with the average viewing angle information of each surface point on the target object carried by the explicit 3D model, so that a target image of better quality can be rendered, satisfying the user's demand for rendering an image of the target object from an arbitrary viewing angle.
  • the quality of the main image of the product directly affects the customer flow of the e-commerce store.
  • limited by the photographer's shooting skills, the selected main product image may not provide a good perspective for displaying the product information, making it difficult to effectively attract customers to click on the product link and affecting the customer flow of the e-commerce store.
  • in addition, a large number of images need to be taken to ensure that a good-quality main product image can be selected, so the labor cost is high and the production efficiency of the main product image is low.
  • the three-dimensional model reconstruction method provided in the embodiment of the present application can be used to create the main product image.
  • merchants can use terminal devices such as mobile phones, tablet computers, wearable smart devices, and smart home devices to shoot a video around the commodity object in a 360-degree surround manner, and the merchant can initiate an image upload operation on the human-computer interaction interface (such as a web interface) provided by the terminal device to upload the video containing multiple commodity images to the server that performs the 3D model reconstruction method, as shown in 1 in Fig. 6b.
  • the server is a single server or a distributed server cluster composed of multiple servers.
  • the server may be a cloud server.
  • the server performs 3D model reconstruction based on the multiple commodity images to obtain a target implicit 3D representation model for 3D representation of the commodity object and an explicit 3D model of the commodity object.
  • the merchant can input the rendering perspective on the human-computer interaction interface provided by the terminal device.
  • the terminal device analyzes the rendering perspective to obtain the corresponding camera pose to be rendered, and generates a new perspective image acquisition request including the camera pose to be rendered, and sends the new perspective image acquisition request to the server, as shown in 3 in Figure 6b.
  • in response to the new perspective image acquisition request, as shown in 4 and 5 in Fig. 6b, the server generates a new perspective image of the commodity object under the camera pose to be rendered based on the target implicit 3D representation model and the explicit 3D model, and sends the new perspective image of the commodity object to the terminal device for the terminal device to display the new perspective image.
  • Merchants can view new perspective images of commodity objects on their terminal devices.
  • in the field of AI (Artificial Intelligence) home decoration, in order to provide consumers with a 3D scene-based shopping experience, the viewing of products has been upgraded from the traditional viewing of pictures and videos to viewing of matching and effects in the AI home scene.
  • the designer can use a mobile phone to shoot a 360-degree video around furniture, electrical appliances and other objects in the real scene, and upload the video to the 3D model reconstruction device that implements the 3D model reconstruction method.
  • the 3D model reconstruction device performs 3D model reconstruction based on multiple images in the video, obtains 3D models of furniture and electrical appliances, and matches the 3D models of furniture and electrical appliances to the 3D floor plan to complete the task of creating AI home scenes.
  • the subject of execution of each step of the method provided in the above embodiments may be the same device, or the method may be executed by different devices.
  • for example, the execution subject of steps 101 to 104 may be device A; for another example, the execution subject of steps 101 and 102 may be device A, and the execution subject of steps 103 and 104 may be device B; and so on.
  • FIG. 7 is a schematic structural diagram of a three-dimensional model reconstruction device provided by an embodiment of the present application. As shown in Figure 7, the device may include: a reconstruction module 71, a construction module 72 and a generation module 73;
  • the reconstruction module 71 is configured to perform neural network-based three-dimensional reconstruction based on multiple original images containing the target object to obtain an initial implicit 3D representation model for implicit three-dimensional 3D representation of the target object.
  • the multiple original images correspond to different camera poses, and different pixels in each original image correspond to the first line of sight passing through different surface points on the target object, or in other words, the surface points on the target object correspond to the pixels in the corresponding original image, and correspond to the first line of sight where the pixel points are photographed.
  • the construction module 72 is configured to construct an explicit three-dimensional model corresponding to the target object according to the initial implicit 3D representation model and the plurality of original images, where the explicit three-dimensional model includes color information of surface points on the target object, and the color information of each surface point is determined by the average viewing angle information of at least one first line of sight corresponding to the surface point.
  • a generating module 73 configured to randomly generate a second line of sight corresponding to a surface point on the explicit three-dimensional model, and generate average viewing angle information corresponding to the second line of sight corresponding to each surface point according to the color information of each surface point;
  • the reconstruction module 71 is further configured to perform neural network-based three-dimensional reconstruction based on the initial implicit 3D representation model according to the average viewing angle information corresponding to the second line of sight and the spatial coordinates of the spatial points on the second line of sight, so as to obtain a target implicit 3D representation model that performs implicit three-dimensional 3D representation of the target object.
  • the construction module 72 is specifically configured to: determine the spatial range corresponding to the target object according to the image features of the multiple original images; generate an initial 3D model corresponding to the target object based on the spatial range and the initial implicit 3D representation model, the initial 3D model including surface points on the target object; and convert the average value of the viewing angle information of the first line of sight corresponding to each surface point on the initial 3D model into the color information of each surface point, to obtain the explicit 3D model.
  • when the construction module 72 generates the initial 3D model corresponding to the target object based on the spatial range and the initial implicit 3D representation model, it is specifically configured to: generate scalar field data corresponding to the target object based on the spatial range and the initial implicit 3D representation model, the scalar field data including multiple volume elements; and perform triangular face analysis on the multiple volume elements to obtain the multiple triangular faces contained in the initial 3D model and the multiple vertices on the multiple triangular faces together with their spatial coordinates, the multiple triangular faces and multiple vertices being used to define each surface point contained in the initial 3D model.
  • the space range is a cuboid space with a length, width and height.
  • when the construction module 72 generates the scalar field data corresponding to the target object based on the space range and the initial implicit 3D representation model, it is specifically configured to: perform equidistant sampling on the cuboid space in the three dimensions of length, width and height to obtain multiple target space points, wherein eight adjacent target space points form a volume element; input the spatial coordinates of the multiple target space points into the initial implicit 3D representation model to obtain the volume densities of the multiple target space points; the volume elements and the volume densities of the target space points contained by the volume elements form the scalar field data, as illustrated in the sketch below.
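A minimal sketch of this step, assuming a hypothetical `initial_density` function that queries the initial implicit 3D representation model for volume density, and using scikit-image's Marching Cubes as one possible form of triangular face analysis; the grid resolution and iso-level are illustrative, not values given by the source.

```python
import numpy as np
from skimage.measure import marching_cubes

def extract_initial_mesh(initial_density, bbox_min, bbox_max, resolution=128, level=10.0):
    """Sample densities on an equidistant grid in the cuboid space and extract triangle faces."""
    axes = [np.linspace(bbox_min[i], bbox_max[i], resolution) for i in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)         # (R, R, R, 3) target space points
    sigma = initial_density(grid.reshape(-1, 3)).reshape((resolution,) * 3)

    # Eight neighbouring target space points form one volume element; marching cubes
    # intersects the iso-surface with each element and returns triangular faces and vertices.
    spacing = [(bbox_max[i] - bbox_min[i]) / (resolution - 1) for i in range(3)]
    verts, faces, _, _ = marching_cubes(sigma, level=level, spacing=spacing)
    return verts + np.asarray(bbox_min), faces                          # vertices in world coordinates
```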
  • the generating module 73 is also configured to: generate a pre-stored view angle map corresponding to each original image, where the view angle information of the first line of sight corresponding to each pixel in the original image is stored in the pre-stored view angle map;
  • when the construction module 72 converts the average value of the viewing angle information of the first line of sight corresponding to each surface point on the initial three-dimensional model into the color information of each surface point to obtain the explicit three-dimensional model, it is specifically configured to: for any surface point, according to the camera poses corresponding to the multiple original images, combined with the initial three-dimensional model, determine from the multiple original images at least one target original image containing the target pixel corresponding to the surface point; and convert the average value of the viewing angle information of the first line of sight corresponding to the target pixel, stored in the viewing angle pre-stored map corresponding to the at least one target original image, into the color information of the surface point, as illustrated in the sketch below.
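The exact mapping between averaged viewing-angle information and color is not spelled out here; one simple, reversible choice, shown below as a sketch under that assumption, stores the averaged unit view direction linearly remapped into the [0, 1] color range.

```python
import numpy as np

def average_view_dir_to_color(first_sight_dirs):
    """Average a surface point's first line-of-sight directions and encode them as an RGB triple."""
    dirs = np.asarray(first_sight_dirs, dtype=float)     # (L, 3), one direction per target original image
    mean_dir = dirs.mean(axis=0)
    mean_dir /= np.linalg.norm(mean_dir) + 1e-8           # keep it a unit direction
    return 0.5 * (mean_dir + 1.0)                         # map components from [-1, 1] into [0, 1]
```

At rendering time the inverse mapping (2 * color - 1) recovers the average viewing angle, which is how the explicit 3D model can hand that information back to the implicit model.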
  • the generation module 73 is also configured to: generate a depth pre-stored map corresponding to each original image, and the depth pre-stored map stores the depth information of each pixel in the original image corresponding to the surface point;
  • when the generation module 73 randomly generates the second line of sight corresponding to a surface point on the explicit three-dimensional model, it is specifically configured to: for any surface point, according to the camera poses corresponding to the multiple original images, combined with the explicit three-dimensional model, determine from the multiple original images at least one target original image that contains the target pixel corresponding to the surface point;
  • for each target original image, calculate the spatial coordinates of the surface point according to the depth information of the surface point corresponding to the target pixel stored in the depth pre-stored map corresponding to that target original image, and, according to the spatial coordinates of the surface point and the viewing angle information of the first line of sight corresponding to the target pixel, randomly generate a line of sight that passes through the surface point and is different from the first line of sight corresponding to the target pixel as the second line of sight.
  • when the generation module 73 randomly generates, according to the spatial coordinates of the surface point and the viewing angle information of the first line of sight corresponding to the target pixel, a line of sight that passes through the surface point and is different from that first line of sight as the second line of sight, it is specifically configured to: determine a candidate space range according to the spatial coordinates of the surface point and the viewing angle information of the first line of sight corresponding to the target pixel; and, within the candidate space range, randomly generate a line of sight that passes through the surface point and is different from the first line of sight corresponding to the target pixel as the second line of sight.
  • the candidate space range is a cone-shaped space range with the spatial coordinates of the surface point as the apex and the first line of sight corresponding to the target pixel as the centerline, as illustrated in the sketch below.
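A hedged sketch of drawing one second line of sight inside such a cone: a direction within `eta_deg` degrees of the (unit) first line-of-sight direction, with the surface point as the apex. The rotation construction (Rodrigues' formula) is an implementation choice for the illustration, not taken from the source.

```python
import numpy as np

def random_second_sight(first_dir, eta_deg=30.0, rng=None):
    """Random unit direction within eta_deg degrees of the unit first line-of-sight direction."""
    rng = rng or np.random.default_rng()
    first_dir = np.asarray(first_dir, dtype=float)
    first_dir = first_dir / np.linalg.norm(first_dir)

    # Sample uniformly over the spherical cap around +Z with half-angle eta_deg.
    cos_t = rng.uniform(np.cos(np.radians(eta_deg)), 1.0)
    sin_t = np.sqrt(1.0 - cos_t ** 2)
    phi = rng.uniform(0.0, 2.0 * np.pi)
    local = np.array([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])

    # Rotate +Z onto the first line of sight (Rodrigues' rotation formula).
    z = np.array([0.0, 0.0, 1.0])
    v, c = np.cross(z, first_dir), np.dot(z, first_dir)
    s = np.linalg.norm(v)
    if s < 1e-8:                                  # first_dir already (anti-)parallel to +Z
        return local if c > 0 else -local
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    rot = np.eye(3) + vx + vx @ vx * ((1 - c) / s ** 2)
    return rot @ local
```

The returned direction, anchored at the surface point, defines a virtual camera ray that reuses the surface point's average viewing angle information during the subsequent reconstruction.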
  • when the generation module 73 generates the depth pre-stored map corresponding to each original image, it is specifically configured to: for any pixel in each original image, for any spatial point on the first line of sight corresponding to the pixel, according to the sampling distance between spatial points, the volume density and depth information of the spatial point, and the volume densities of the other spatial points before the spatial point, calculate the depth information from the spatial point to the camera optical center corresponding to the first line of sight corresponding to the pixel; perform a weighted average of the depth information from the multiple spatial points on the first line of sight corresponding to the pixel to the camera optical center, to obtain the depth information from the surface point corresponding to the pixel to the camera optical center; and generate the depth pre-stored map corresponding to each original image according to the depth information from the surface point corresponding to each pixel in each original image to the camera optical center; or, for each original image, use the camera pose corresponding to the original image to perform rasterized rendering on the explicit 3D model to obtain the depth information from the surface point corresponding to each pixel in the original image to the camera optical center, and generate the depth pre-stored map corresponding to the original image accordingly; the first option is illustrated in the sketch below.
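A minimal sketch of that first option (the density-weighted depth of the surface point along one first line of sight), mirroring the weights used in volume rendering; `t` and `sigma` would come from sampling the first line of sight and querying the initial implicit 3D representation model, which is assumed rather than shown here.

```python
import numpy as np

def expected_depth(t, sigma):
    """Density-weighted depth from the camera optical center to the surface point hit by one ray.

    t:     (N,) depths of the sampled spatial points from the camera optical center
    sigma: (N,) volume densities predicted for those spatial points
    """
    delta = np.append(t[1:] - t[:-1], 1e10)                 # sampling distance between spatial points
    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))    # transmittance accumulated from earlier points
    return float((trans * alpha * t).sum())                 # weighted-average depth to the optical center
```

Evaluating this for every pixel's first line of sight and writing the results into an image of the same size as the original image yields that image's depth pre-stored map.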
  • the above device further includes: a determination module and a rendering module;
  • a determining module configured to determine the target line of sight to be rendered and the average viewing angle information corresponding to the target line of sight according to the target camera pose to be rendered and the explicit 3D model;
  • a rendering module configured to generate a target image of the target object under the target camera pose according to the spatial coordinates of the spatial points on the target line of sight and the average viewing angle information corresponding to the target line of sight, combined with the target implicit 3D representation model.
  • the determination module is specifically configured to: perform raster rendering on the explicit 3D model according to the target camera pose, to obtain target surface points and their color information within the field of view corresponding to the target camera pose; for any target surface point, obtain a target line of sight from the optical center of the camera corresponding to the target camera pose to the target surface point, and generate average viewing angle information corresponding to the target line of sight according to the color information of the target surface point.
  • the rendering module is specifically configured to: input the average viewing angle information corresponding to the target line of sight and the spatial coordinates of the spatial points on the target line of sight into the implicit 3D representation model of the target to obtain the color information and volume density of the space points on the target line of sight; perform volume rendering according to the color information and volume density of the spatial points on the target line of sight to obtain the target image of the target object under the target camera pose.
  • FIG. 8 is a schematic structural diagram of an image generating device provided by an embodiment of the present application. As shown in FIG. 8, the device may include: a determination module 82 and a rendering module 83;
  • a determining module 82 configured to determine the target line of sight to be rendered and the average viewing angle information corresponding to the target line of sight according to the target camera pose to be rendered and the explicit three-dimensional model corresponding to the target object;
  • the rendering module 83 is configured to generate a target image of the target object under the target camera pose according to the spatial coordinates of the spatial point on the target line of sight and the average viewing angle information corresponding to the target line of sight, in combination with the target implicit 3D representation model corresponding to the target object; wherein, the explicit 3D model and the target implicit 3D representation model are obtained by integrating the prior information of the line of sight and the average viewing angle information to perform 3D reconstruction based on a neural network.
  • the determining module is specifically configured to: perform rasterized rendering on the explicit 3D model according to the target camera pose to obtain the target surface points within the field of view corresponding to the target camera pose and their color information; for any target surface point, obtain the target line of sight from the camera optical center corresponding to the target camera pose to the target surface point, and generate the average viewing angle information corresponding to the target line of sight according to the color information of the target surface point.
  • the rendering module is specifically configured to: input the average viewing angle information corresponding to the target line of sight and the spatial coordinates of the spatial points on the target line of sight into the implicit 3D representation model of the target to obtain the color information and volume density of the space points on the target line of sight; perform volume rendering according to the color information and volume density of the spatial points on the target line of sight to obtain the target image of the target object under the target camera pose.
  • the above device further includes: a reconstruction module, a construction module and a generation module;
  • the reconstruction module is configured to perform neural network-based three-dimensional reconstruction according to multiple original images containing the target object, to obtain an initial implicit 3D representation model for implicit three-dimensional 3D representation of the target object, and the surface points on the target object correspond to pixels in the corresponding original image, and correspond to the first line of sight where the pixel points are photographed.
  • the plurality of original images correspond to different camera poses, and different pixel points in each original image correspond to first sight lines passing through different surface points on the target object;
  • a construction module configured to construct an explicit three-dimensional model corresponding to the target object according to the initial implicit 3D representation model and the plurality of original images, the explicit three-dimensional model including color information of surface points on the target object, and the color information of each surface point is determined according to the average viewing angle information of the first line of sight corresponding to the surface point;
  • a generating module configured to randomly generate a second line of sight corresponding to a surface point on the explicit three-dimensional model, and generate average viewing angle information corresponding to the second line of sight corresponding to each surface point according to the color information of each surface point;
  • the reconstruction module is further configured to perform neural network-based three-dimensional reconstruction based on the initial implicit 3D representation model according to the average viewing angle information corresponding to the second line of sight and the spatial coordinates of the spatial points on the second line of sight, so as to obtain a target implicit 3D representation model for implicit three-dimensional 3D representation of the target object.
  • FIG. 9 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device includes: a memory 91 and a processor 92 .
  • the memory 91 is used to store computer programs, and can be configured to store other various data to support operations on the computing platform. Examples of such data include instructions for any application or method operating on the computing platform, contact data, phonebook data, messages, pictures, videos, etc.
  • the memory 91 can be implemented by any type of volatile or non-volatile storage devices or their combination, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • the processor 92, coupled with the memory 91, is configured to execute the computer program in the memory 91 so as to: perform neural network-based three-dimensional reconstruction according to multiple original images containing the target object to obtain an initial implicit 3D representation model for implicit three-dimensional 3D expression of the target object, where the surface points on the target object correspond to pixels in the corresponding original images and to the first lines of sight along which those pixels are photographed, the multiple original images correspond to different camera poses, and different pixels in each original image correspond to first lines of sight passing through different surface points on the target object; construct, according to the initial implicit 3D representation model and the multiple original images, an explicit three-dimensional model corresponding to the target object, the explicit three-dimensional model including color information of the surface points on the target object, where the color information of each surface point is determined according to the average viewing angle information of the first line of sight corresponding to that surface point; randomly generate the second lines of sight corresponding to the surface points on the explicit three-dimensional model, and generate, according to the color information of each surface point, the average viewing angle information corresponding to the second line of sight of that surface point; and, according to the average viewing angle information corresponding to the second lines of sight and the spatial coordinates of the spatial points on the second lines of sight, perform neural network-based three-dimensional reconstruction based on the initial implicit 3D representation model to obtain a target implicit 3D representation model for implicit three-dimensional 3D expression of the target object.
  • when the processor 92 constructs the explicit 3D model corresponding to the target object according to the initial implicit 3D representation model and the multiple original images, it is specifically configured to: determine the spatial range corresponding to the target object according to the image features of the multiple original images; generate an initial 3D model corresponding to the target object based on the spatial range and the initial implicit 3D representation model, the initial 3D model including surface points on the target object; and convert the average value of the viewing angle information of the first line of sight corresponding to each surface point on the initial 3D model into the color information of each surface point, to obtain the explicit 3D model.
  • when the processor 92 generates the initial three-dimensional model corresponding to the target object based on the spatial range and the initial implicit 3D representation model, it is specifically configured to: generate scalar field data corresponding to the target object based on the spatial range and the initial implicit 3D representation model, where the scalar field data includes multiple volume elements; and perform triangular face analysis on the multiple volume elements to obtain the multiple triangular faces included in the initial 3D model and the multiple vertices on the multiple triangular faces together with their spatial coordinates, the multiple triangular faces and multiple vertices being used to define each surface point included in the initial three-dimensional model.
  • the space range is a cuboid space with length, width and height.
  • when the processor 92 generates the scalar field data corresponding to the target object based on the space range and the initial implicit 3D representation model, it is specifically configured to: perform equidistant sampling on the cuboid space in the three dimensions of length, width and height to obtain multiple target space points, wherein eight adjacent target space points form a volume element; input the spatial coordinates of the multiple target space points into the initial implicit 3D representation model to obtain the volume densities of the multiple target space points; the volume elements and the volume densities of the target space points contained by the volume elements form the scalar field data.
  • the processor 92 is further configured to: generate a pre-stored view angle map corresponding to each original image, where the view angle information of the first line of sight corresponding to each pixel in the original image is stored in the pre-stored view angle map;
  • when the processor 92 converts the average value of the viewing angle information of the first line of sight corresponding to each surface point on the initial three-dimensional model into the color information of each surface point so as to obtain the explicit three-dimensional model, it is specifically configured to: for any surface point, according to the camera poses corresponding to the multiple original images, combined with the initial three-dimensional model, determine from the multiple original images at least one target original image containing the target pixel corresponding to the surface point; and convert the average value of the viewing angle information of the first line of sight corresponding to the target pixel, stored in the viewing angle pre-stored map corresponding to the at least one target original image, into the color information of the surface point.
  • the processor 92 is also configured to: generate a depth pre-stored map corresponding to each original image, and the depth pre-stored map stores the depth information of each pixel in the original image corresponding to the surface point;
  • when the processor 92 randomly generates the second line of sight corresponding to a surface point on the explicit three-dimensional model, it is specifically configured to: for any surface point, according to the camera poses corresponding to the multiple original images, combined with the explicit three-dimensional model, determine from the multiple original images at least one target original image that includes the target pixel corresponding to the surface point;
  • for each target original image, calculate the spatial coordinates of the surface point according to the depth information of the surface point corresponding to the target pixel stored in the depth pre-stored map corresponding to that target original image, and, according to the spatial coordinates of the surface point and the viewing angle information of the first line of sight corresponding to the target pixel, randomly generate a line of sight that passes through the surface point and is different from the first line of sight corresponding to the target pixel as the second line of sight.
  • when the processor 92 randomly generates, according to the spatial coordinates of the surface point and the viewing angle information of the first line of sight corresponding to the target pixel, a line of sight that passes through the surface point and is different from that first line of sight as the second line of sight, it is specifically configured to: determine a candidate space range according to the spatial coordinates of the surface point and the viewing angle information of the first line of sight corresponding to the target pixel; and, within the candidate space range, randomly generate a line of sight that passes through the surface point and is different from the first line of sight corresponding to the target pixel as the second line of sight.
  • the candidate spatial range is a cone-shaped spatial range with the spatial coordinates of the surface point as the apex and the first line of sight corresponding to the target pixel as the centerline.
  • when the processor 92 generates the depth pre-stored map corresponding to each original image, it is specifically configured to: for any pixel in each original image, for any spatial point on the first line of sight corresponding to the pixel, according to the sampling distance between spatial points, the volume density and depth information of the spatial point, and the volume densities of the other spatial points before the spatial point, calculate the depth information from the spatial point to the camera optical center corresponding to the first line of sight corresponding to the pixel; perform a weighted average of the depth information from the multiple spatial points on the first line of sight corresponding to the pixel to the camera optical center to obtain the depth information from the surface point corresponding to the pixel to the camera optical center; and generate the depth pre-stored map corresponding to each original image according to the depth information from the surface point corresponding to each pixel in each original image to the camera optical center; or, for each original image, use the camera pose corresponding to the original image to perform rasterized rendering on the explicit 3D model to obtain the depth information from the surface point corresponding to each pixel in the original image to the camera optical center, and generate the depth pre-stored map corresponding to the original image accordingly.
  • the processor 92 is further configured to: according to the target camera pose to be rendered and the explicit 3D model, determine the target line of sight to be rendered and the average viewing angle information corresponding to the target line of sight; according to the spatial coordinates of the spatial points on the target line of sight and the average viewing angle information corresponding to the target line of sight, combined with the target implicit 3D representation model, generate a target image of the target object under the target camera pose.
  • the processor 92 is specifically configured to: perform rasterized rendering on the explicit 3D model according to the target camera pose to obtain the target surface points within the field of view corresponding to the target camera pose and their color information; for any target surface point, obtain the target line of sight from the camera optical center corresponding to the target camera pose to the target surface point, and generate the average viewing angle information corresponding to the target line of sight according to the color information of the target surface point.
  • when the processor 92 generates the target image of the target object under the target camera pose, it is specifically configured to: input the average viewing angle information corresponding to the target line of sight and the spatial coordinates of the spatial points on the target line of sight into the target implicit 3D representation model to obtain the color information and volume density of the spatial points on the target line of sight; and perform volume rendering according to the color information and volume density of the spatial points on the target line of sight to obtain the target image of the target object under the target camera pose.
  • the computer device further includes: a communication component 93 , a display 94 , a power supply component 95 , an audio component 96 and other components.
  • FIG. 9 only schematically shows some components, which does not mean that the computer equipment only includes the components shown in FIG. 9 .
  • the components in the dotted line box in Figure 9 are optional components rather than mandatory components, depending on the product form of the computer device.
  • the computer device in this embodiment can be implemented as a terminal device such as a desktop computer, a notebook computer, a smart phone, or an IoT device, or as a server device such as a conventional server, a cloud server, or a server array. If the computer device of this embodiment is implemented as a terminal device such as a desktop computer, a notebook computer, or a smart phone, it may include the components in the dotted line box in FIG. 9; if it is implemented as a server device, it may omit the components in the dotted line box in FIG. 9.
  • the embodiments of the present application also provide a computer-readable storage medium storing a computer program.
  • the computer program When the computer program is executed, the steps that can be executed by the computer device in the above method embodiments can be realized.
  • the embodiments of the present application also provide a computer program product, including computer programs/instructions, which, when executed by a processor, cause the processor to implement the steps executable by the computer device in the above method embodiments.
  • the above-mentioned communication component is configured to facilitate wired or wireless communication between the device where the communication component is located and other devices.
  • the device where the communication component is located can access a wireless network based on communication standards, such as WiFi, 2G, 3G, 4G/LTE, 5G and other mobile communication networks, or a combination thereof.
  • the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication assembly also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the above-mentioned display includes a screen, and the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense a boundary of a touch or a swipe action, but also detect duration and pressure associated with the touch or swipe operation.
  • a power supply component provides power for various components of the equipment where the power supply component is located.
  • a power component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component resides.
  • the aforementioned audio components may be configured to output and/or input audio signals.
  • the audio component includes a microphone (MIC), which is configured to receive an external audio signal when the device on which the audio component is located is in an operation mode, such as a calling mode, a recording mode, and a speech recognition mode.
  • the received audio signal may be further stored in a memory or sent via a communication component.
  • the audio component further includes a speaker for outputting audio signals.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to generate computer-implemented processing, so that the instructions executed on the computer or other programmable equipment provide steps for realizing the functions specified in one flow or multiple flows of the flow chart and/or one or more square blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer readable media, in the form of random access memory (RAM) and/or nonvolatile memory such as read-only memory (ROM) or flash RAM. Memory is an example of computer readable media.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of storage media for computers include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technologies, CD-ROM (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridge, magnetic tape memory or other magnetic storage device, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media excludes transitory computer-readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Image Generation (AREA)

Abstract

Embodiments of the present application provide a three-dimensional model reconstruction and image generation method, a device, and a storage medium. In the embodiments of the present application, neural network-based three-dimensional reconstruction and traditional three-dimensional reconstruction are respectively performed on the basis of multiple original images containing a target object, obtaining an initial implicit 3D representation model and an explicit three-dimensional model; random lines of sight and average viewing angles are generated on the basis of the explicit three-dimensional model, and, by generating random lines of sight and replacing their real viewing angle information with the average viewing angle information corresponding to the random lines of sight, the line-of-sight data are augmented with the random lines of sight and their corresponding average viewing angle information; continuing neural network-based three-dimensional reconstruction on the basis of the augmented line-of-sight data yields an implicit 3D representation model that is strongly robust to lines of sight, greatly improving the robustness of synthesizing images of different viewing angles based on this implicit 3D representation model.

Description

三维模型重建与图像生成方法、设备以及存储介质
本申请要求于2022年01月24日提交中国专利局、申请号为202210081291.6、申请名称为“三维模型重建与图像生成方法、设备以及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像处理技术领域,尤其涉及一种三维模型重建与图像生成方法、设备以及存储介质。
背景技术
新视角合成技术是指针对一个三维场景,使用该三维场景的已有图像生成任意视角下的高真实感图像的技术。新视角合成依赖三维场景精确的几何结构,但是,由于现实世界中的三维场景比较复杂,很难获得三维场景精确的几何结构,这导致新视角合成技术从理论到落地实施较为困难。
于是,业界提出了神经辐射场(Neural Radiance Field,NERF)算法,该算法利用全连接网络来表示三维场景,其输入是一个连续的5维坐标:空间位置(x,y,z)和视角信息(θ,φ),其输出是该空间位置处的体积密度和视角相关的颜色信息;进一步结合立体渲染(volume rendering)技术,可以将输出的颜色信息和体积密度投影到2D图像上,从而实现新视图合成。由于简单结构和良好的渲染效果,NERF算法吸引了大量关注,但是,它的视角鲁棒性较差,部分视角的图像合成效果不好,难以应用于实际场景中。
发明内容
本申请的多个方面提供一种三维模型重建与图像生成方法、设备以及存储介质,用以提升基于隐式三维表征模型进行模型推理如视角图像合成时的视角鲁棒性。
本申请实施例提供一种三维模型重建方法,包括:根据包含目标物体的多张原始图像进行基于神经网络的三维重建,得到初始隐式3D表征模型,所述目标物体上的表面点与对应原始图像中的像素点对应,且与拍摄到所述像素点的第一视线对应;根据所述初始隐式3D表征模型和所述多张原始图像,构建显式三维模型,所述显式三维模型包括所述目标物体上表面点的颜色信息,每个表面点的颜色信息是根据该表面点对应的第一视线的平均视角信息确定的;随机生成所述显式三维模型上表面点对应的第二视线,并根据每个表面点的颜色信息分别生成每个表面点对应的第二视线对应的平均视角信息;根据所述第二视线对应的平均视角信息和所述第二视线上空间点的 空间坐标,基于所述初始隐式3D表征模型进行基于神经网络的三维重建,得到目标隐式3D表征模型。
本申请实施例还提供一种图像生成方法,包括:根据待渲染的目标相机位姿和目标物体对应的显式三维模型,确定待渲染的目标视线和所述目标视线对应的平均视角信息;根据所述目标视线上空间点的空间坐标和所述目标视线对应的平均视角信息,结合所述目标物体对应的目标隐式3D表征模型,生成所述目标物体在所述目标相机位姿下的目标图像;其中,所述显式三维模型和目标隐式3D表征模型是融入视线先验信息和平均视角信息进行基于神经网络的三维重建得到的。
本申请实施例还提供一种计算机设备,包括:存储器和处理器;存储器,用于存储计算机程序;处理器耦合至存储器,用于执行计算机程序以用于执行本申请实施例提供的三维模型重建方法或图像生成方法中的步骤。
本申请实施例还提供一种存储有计算机程序的计算机存储介质,当计算机程序被处理器执行时,致使处理器能够实现本申请实施例提供的三维模型重建方法或图像生成方法中的步骤。
本实施例提供的三维模型重建方法,用于产生能够对目标物体进行隐式三维表示的神经网络模型,包括以下操作:以包含目标物体的多张原始图像为基础分别进行基于神经网络的三维重建和传统的三维重建,得到初始隐式3D表征模型和显式三维模型;基于显式三维模型进行随机视线和平均视角的生成,基于随机视线和平均视角在初始隐式3D表征模型基础上继续进行基于神经网络的三维重建,得到目标隐式3D表征模型。在该三维重建过程中,通过产生随机视线并以随机视线对应的平均视角信息代替其真实视角信息的方式,利用随机视线及其对应的平均视角信息增强视线数据,基于增强后的视线数据继续进行基于神经网络的三维重建,可以得到对视线具有较强鲁棒性的隐式3D表征模型,大大提升基于该隐式3D表征模型合成不同视角图像时的视角鲁棒性。
附图说明
此处所说明的附图用来提供对本申请的进一步理解,构成本申请的一部分,本申请的示意性实施例及其说明用于解释本申请,并不构成对本申请的不当限定。在附图中:
图1为本申请实施例提供的一种三维模型重建方法的流程示意图;
图2为示例性的视线从相机光心发射到物体空间的示意图;
图3为示例性的视线穿过目标物体表面点的示意图;
图4为示例性的三维模型重建方法所适用的应用场景图;
图5为示例性的随机视线生成示意图;
图6a为本申请实施例提供的一种三维模型生成方法的流程示意图;
图6b为本申请实施例提供的一种三维模型生成方法所适用的应用场景图;
图7为本申请实施例提供的一种三维模型重建装置的结构示意图;
图8为本申请实施例提供的一种图像生成装置的结构示意图;
图9为本申请实施例提供的一种计算机设备的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合本申请具体实施例及相应的附图对本申请技术方案进行清楚、完整地描述。显然,所描述的实施例仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
针对现有基于NERF算法的三维重建方案的鲁棒性较差,难以实际应用的问题,在本申请一些实施例中,在基于神经网络的三维重建过程中,融入视线先验信息和平均视角信息,提供一种新的基于神经网络的三维模型重建方法,该方法具有较高的鲁棒性,可大大降低三维建模成本,解决了基于神经网络进行三维重建的实际应用问题,具有较高的应用价值。本实施例提供的三维模型重建方法可以应用在模型训练阶段中以重建出对目标物体进行隐式三维(3D)表达的目标隐式3D表征模型,该目标隐式3D表征模型可在后期进行模型推理,一种基于目标隐式3D表征模型进行模型推理的场景为基于目标隐式3D表征模型进行新视角图像的合成,但不限于此。当然,本实施例的三维模型重建方法也可以是在实际应用场景中直接对目标物体进行三维重建的过程,而并非应用于预先生成对目标物体进行隐式三维(3D)表达的目标隐式3D表征模型的模型训练阶段。无论是哪种应用方式,本申请实施例提供的三维重建过程,用于产生能够对目标物体进行隐式三维(3D)表示的神经网络模型,即目标隐式3D表征模型。该过程主要包括以下操作:以包含目标物体的视频或者多张原始图像为输入,该视频中包含多张原始图像;以多张原始图像为基础分别进行基于神经网络的三维重建和传统的三维重建,得到初始隐式3D表征模型和显式三维模型;基于显式三维模型进行随机视线和平均视角的生成,基于随机视线和平均视角结合初始隐式3D表征模型继续进行基于神经网络的三维重建,得到目标隐式3D表征模型。其中,初始隐式3D表征模型和目标隐式3D表征模型都是对目标物体进行隐式三维表示的神经网络模型。在三维重建过程中,通过产生随机视线并以随机视线对应的平均视角信息代替其真实视角信息的方式,利用随机视线及其对应的平均视角信息增强三维重建所需的视线数据,基于增强后的视线数据继续进行基于神经网络的三维重建,可以得到对视线具有较强鲁棒性的隐式3D表征模型,大大提升基于该隐式3D表征模型合成不同视角图像时的鲁棒性。
以下结合附图,详细说明本申请各实施例提供的技术方案。
图1为本申请实施例提供的一种三维模型重建方法的流程示意图。如图1所示,该方法可以包括以下步骤:
101、根据包含目标物体的多张原始图像进行基于神经网络的三维重建,得到对目标物体进行隐式三维3D表达的初始隐式3D表征模型,目标物体上的表面点与对应原 始图像中的像素点对应,且与拍摄到该像素点的第一视线对应。
102、根据初始隐式3D表征模型和多张原始图像,构建目标物体对应的显式三维模型,显式三维模型包括目标物体上表面点的颜色信息,每个表面点的颜色信息是根据该表面点对应的第一视线的平均视角信息确定的。
103、随机生成显式三维模型上表面点对应的第二视线,并根据每个表面点的颜色信息分别生成每个表面点对应的第二视线对应的平均视角信息。
104、根据第二视线对应的平均视角信息和第二视线上空间点的空间坐标,基于初始隐式3D表征模型进行基于神经网络的三维重建,得到对目标物体进行隐式三维3D表达的目标隐式3D表征模型。
在本实施例中,目标物体可以是任意物体,例如为鞋子、桌子、椅子、帽子、衣柜、苹果等等。在全景显示、自动化建模、3D商品展示、新视角图像合成等多种应用场景中,都需要目标物体的三维模型。为此,需要对目标物体进行三维重建。以新视角图像合成为例,可以基于目标物体的三维模型确定新视角下看到的模型内容,进而基于该模型内容渲染出新视角下的图像。在本实施例中,为了更好地保留物体的纹理信息,提高三维重建的模型质量,采用基于神经网络的三维重建方式,并利用最终得到的目标隐式3D表征模型对目标物体进行三维表达。在此过程中,进一步融合了传统的三维重建过程。也就是说,在本申请实施例中,以基于神经网络的三维重建为主,并融合了传统的三维重建,简称为对目标物体进行三维重建。
在对目标物体进行三维重建之前,获取包含目标物体的多张原始图像,以便基于包含目标物体的原始图像进行基于神经网络的三维重建。可选地,可以对处于真实世界中的目标物体从不同拍摄角度进行拍摄,得到包含该目标物体的多张原始图像或者得到该目标物体对应的视频,从视频中提取包含该目标物体的多张原始图像。进一步可选的,为了能够准确重建出目标物体的三维模型,进而提高基于三维模型的图像渲染质量,可以采用绕目标物体360度的环绕方式进行拍摄,得到目标物体的多张原始图像。需要说明的是,不同原始图像对应不同的相机位姿,相机位姿包括拍摄设备在拍摄图像时的位置和姿态。其中,本实施例对拍摄设备不做限制,拍摄设备例如可以是但不限于:相机、具有拍摄功能的手机、平板电脑、可穿戴设备等。
在本实施例中,将真实的拍摄设备在对处于真实世界中的目标物体进行拍摄时,从真实的拍摄设备的相机光心发射出去穿过物体空间的视线称作为第一视线,该第一视线可以认为是真实的拍摄设备发射出的实际视线,一条第一视线从拍摄设备的相机光心发射出来穿过所拍摄图像的各个像素点对应的物体空间。以图2为例,拍摄椅子图像I1的相机1和拍摄椅子图像I2的相机2是真实相机,从真实相机的光心发射出的视线(图2中的实线)是第一视线,也即视线r1和视线r2均是第一视线。在图2中,拍摄椅子图像I3的相机3是假设出来的虚拟相机(图2中虚线框内的相机),从虚拟相机的光心发射发出的视线(图2中带箭头的虚线)是虚拟视线,也即视线r3是虚拟视线。
需要说明的是,对于一张原始图像上的每个像素点都会对应一条第一视线,相应地,样本图像中的像素点是由第一视线射到目标物体的一个表面点上成像得到的,该第一视线也就是拍摄到该像素点的视线。由此可知,目标物体上的表面点与像素点以及拍摄到该像素点的第一视线之间存在对应关系。每张原始图像中的不同像素点与目标物体上的不同表面点对应,不同表面点对应不同的第一视线,也就是说,每张原始图像中的各像素点都会与穿过目标物体上与其对应的表面点的第一视线对应,不同像素点会与穿过不同表面点的第一视线对应。另外,因为不同样本图像对应的相机位姿不同,所以不同样本图像中的像素点可能对应目标物体上不同的表面点。对两张样本图像而言,其中可能有部分像素点对应相同的表面点,也可能所有像素点均对应不同的表面点。
在本实施例中,首先,利用多张原始图像进行基于神经网络的三维重建,得到初始隐式3D表征模型。初始隐式3D表征模型能够对目标物体进行隐式三维表达,例如可以表达目标物体的形状、纹理、材质等多个维度的物体信息。在本实施例中,初始隐式3D表征模型是一个全连接神经网络,全连接神经网络又称多层感知器((Multi-Layer Perceptron,MLP)。该初始隐式3D表征模型基于输入的空间点的空间坐标和视角信息,分别预测空间点的体积密度和颜色信息。其中,初始隐式3D表征模型可以表达为:
σ,c=F(d,x)……(1)
其中,x=(x,y,z),x记为空间点的空间坐标(x,y,z);d=(θ,φ),d=(θ,φ)记为空间点的视角信息(θ,φ),θ为方位角,φ为仰角。c=(r,g,b),c记为空间点的颜色信息(r,g,b),r是指红色(Red,R),g是指绿色(Green,G),b是指蓝色(Blue,B)。σ记为空间点的体积密度。
实际应用中,初始隐式3D表征模型包括用于预测σ体积密度的Fσ网络和用于预测c颜色信息的Fc网络。于是,初始隐式3D表征模型可以进一步表达为:
Fσ:x→(σ,f)……(2)
Fc:(d,f)→c……(3)
值得注意的是,Fσ网络输入的是空间点的空间坐标x,输出的是空间点的体积密度和中间特征f。Fc网络输入的是中间特征f和空间点的视角信息d,输入的是空间点的颜色信息RGB值。也就是说,体积密度只和空间坐标x有关,颜色信息RGB值和空间坐标及视角信息相关。
在本实施例中,在获取到目标物体的多张原始图像之后,分别计算每张原始图像对应的相机位姿,根据每张原始图像对应的相机位姿和相机内参等数据确定相机在拍摄每张原始图像时发射出来的多条第一视线以及每条第一视线的视角信息。在每条第一视线上进行采样,得到多个空间点。应理解,从同一条第一视线上采样得到的空间点的视角信息均是该第一视线的视角信息。例如,图3中视线r1的四个圆点是在视线r1上采样的4个空间点,视线r1的箭头所指方向是视线r1的视角信息,也是在视线r1上采样的4个空间点的 视角信息。在得到多个空间点之后,利用多个空间点的空间坐标及其视角信息进行基于神经网络的三维重建,该过程可以是分批多次执行的过程,最终可得到初始隐式3D表征模型。需要说明的是,该分分批多次执行的三维重建过程可以是模型训练过程,但不限于此。具体地,可以采用不断迭代的方式进行基于神经网络的三维重建,例如每次可以随机选择k张原始图像,从k张原始图像中随机选择大小为m*n的图像块,利用k个图像块中各像素点对应的第一视线上空间点的空间坐标和视角信息进行基于神经网络的三维重建(或模型训练),直到三维重建过程的损失函数符合设定要求时终止三维重建过程。其中,k是大于或等于1的自然数,且k小于或等于原始图像的总数;m、n是大于或等于1的自然数,m、n分别表示图像块在横向和纵向维度上的像素数,m小于或等于原始图像的宽度(宽度维度对应横向),n小于或等于原始图像的长度(长度维度对应纵向),m和n可以相同,也可以不同。可选地,可以采用等间隔方式在每条第一视线上采样多个空间点,即任意两个相邻空间点之间的采样间隔是相同的。也可以采用不同采样间隔在每条第一视线上采样多个空间点,采样间隔的大小不做限定。
进一步可选的,可以采用SLAM(simultaneous localization and mapping,即时定位与地图构建)算法来更加准确计算每张原始图像对应的相机位姿。具体的,SLAM算法在计算相机位姿时,首先提取每张原始图像的特征点,接着,建立相邻两张原始图像的特征点之间的匹配关系,根据相邻两张原始图像的特征点之间的匹配关系计算相邻两张原始图像之间的相对相机位姿。根据两两原始图像之间的相对相机位姿计算每张原始图像对应的相机位姿。
在本实施例中,在得到对目标物体进行隐式三维表达的初始隐式3D表征模型之后,根据初始隐式3D表征模型和多张原始图像,可以构建目标物体对应的显式三维模型。
在本实施例中,显式三维模型可以是能够反映目标物体的表面特征且能够对目标物体进行显式三维表示的Mesh(网格)模型,该显式三维模型包括目标物体上的表面点及每个表面点的空间坐标和颜色信息。这些表面点可形成显式三维模型中的三角面和顶点,显式三维模型具体包括多个三角面和顶点,顶点的属性信息包括顶点的空间坐标、颜色信息、材质信息以及其它纹理信息等。顶点是表面点,每个三角面也包括多个表面点,其中,三角面上除作为顶点的表面点之外的其它表面点的空间坐标和颜色信息可由其所属三角面上的三个顶点的空间坐标和颜色信息进行插值计算得到。
在本实施例中,显式三维模型上每个表面点的颜色信息是根据该表面点对应的第一视线的平均视角信息确定的,表示该表面点对应的任何视线对应的平均视角信息。换而言之,显式三维模型上每个表面点的颜色信息并不是目标物体在光线照射下产生的真实颜色信息,而是与该表面点对应的各条第一视线的平均视角信息具有映射关系的颜色信息。
在一可选实现方式中,根据初始隐式3D表征模型和多张原始图像,构建目标物体对应的显式三维模型,包括:根据多张原始图像的图像特征,确定目标物体对应的空间范围;基于空间范围和初始隐式3D表征模型生成目标物体对应的初始三维模型, 初始三维模型包括目标物体上的表面点;针对任一表面点,将该表面点对应的至少一条第一视线的视角信息的平均值转换为该表面点的颜色信息,以得到显式三维模型。
在本实施例中,可以采用诸如运动恢复结构(Structure from Motion,SfM)算法处理多张原始图像的图像特征,以估计出目标物体对应的稀疏3D点位置,目标物体对应的稀疏3D点位置可以帮助确定目标物体在世界坐标系中的空间范围。该空间范围可以是具有长、宽和高的空间范围,例如可以是正方体空间或长方体空间,但不限于此。
进一步可选的,上述基于空间范围和初始隐式3D表征模型生成目标物体对应的初始三维模型的一种实施方式是:基于空间范围和初始隐式3D表征模型生成目标物体对应的标量场数据,标量场数据包括多个体积元素(Volume Pixel),可简称为体素;对多个体积元素进行三角面解析,得到初始三维模型包含的多个三角面、多个三角面上的多个顶点及其空间坐标,多个三角面和多个顶点用于限定初始三维模型包含的各表面点。
进一步可选的,上述空间范围为具有长宽高的长方体空间,则上述基于空间范围和初始隐式3D表征模型生成目标物体对应的标量场数据的一种实施方式是:对长方体空间在长宽高三个维度上分别进行等间隔采样得到多个目标空间点,其中,相邻8个目标空间点形成一个体积元素;将多个目标空间点的空间坐标输入初始隐式3D表征模型,得到多个目标空间点的体积密度;体积元素和体积元素包含的目标空间点的体积密度形成标量场数据。
具体而言,在目标物体对应的空间范围内在长宽高三个维度上分别按照等间隔采样方式进行空间点采样,得到多个目标空间点;多个目标空间点可形成多个小立方体,其中一个小立方体即为一个体积元素;针对每个小立方体,将该小立方体上的空间点的空间坐标输入初始隐式3D表征模型中,得到这些目标空间点的体积密度,体积元素和体积元素包含的目标空间点的体积密度构成标量场数据;基于体积元素包含的目标空间点的体积密度,利用Marching cube(移动立方体)算法对体积元素进行三角面解析,得到初始三维模型包含的三角面、三角面上的顶点及其空间坐标,其中,三角面包括多个表面点,顶点也是表面点。根据三角面和顶点可以确定初始三维模型包含的各表面点。其中,Marching Cube算法会逐个处理三维标量场中的体素(也即体积元素),分离出与等值面相交的体素,采用插值计算出等值面与立方体边的交点;根据立方体每一顶点与等值面的相对位置,将等值面与立方体边的交点按一定方式连接生成三角面,作为等值面在该立方体内的一个逼近表示;进而,在得到所有三角面之后,这些三角面相互衔接可形成目标物体对应的初始三维模型。需要说明的是,上述等间隔采样是指在同一维度上进行等间隔采样,即在长宽高中任一维度上进行空间点采样使用的采样间隔相同,但是,在不同维度上的采样间隔可以不同,当然也可以相同。例如,在该空间范围为长方体的情况下,在长这一维度上采样间隔为1,在宽这一维度上的采样间隔为0.5,在高这一维度上的采样间隔为0.8,以保证在三个维度上采样 出相同数量的目标空间点。又例如,在空间范围为正方体的情况下,长宽高三个维度上的采样间隔可以均为1,以保证在三个维度上采样出相同数量的目标空间点。
在本实施例中,在得到初始三维模型之后,针对初始三维模型上每个表面点,根据该表面点对应的至少一条第一视线的视角信息确定该表面点的颜色信息。在确定出初始三维模型上每个表面点的颜色信息后,将已经确定出各个表面点的颜色信息的初始三维模型称作为显式三维模型。其中,表面点的颜色信息可采用采用下述方式确定的:
针对任一表面点,从不同相机位姿对应的第一视线中,确定该表面点对应的至少一条第一视线,需要说明的是,同一表面点在同一相机位姿下只会有一条第一视线对应该表面点,但是,在采用不同相机位姿拍摄多张原始图像过程中,同一表面点通常会被两个或两个以上的相机位姿拍摄到,也就是说通常会有两条或两条以上来自不同相机位姿下的第一视线对应同一表面点,但是也会存在特殊情况,即某个表面点仅在一个相机位姿下被拍摄到,即只有一条第一视线对应该表面点。进一步,计算该表面点对应的至少一条第一视线的视角信息的平均值,将该平均值转换为该表面点的颜色信息进行保存。
进一步可选的,为了便于快速获取表面点对应的第一视线的视角信息,还可以生成每张原始图像对应的视角预存图,所述视角预存图中存储有该张原始图像中各像素点对应的第一视线的视角信息。值得注意的是,基于拍摄原始图像的相机位姿和相机内参,不难确定从拍摄原始图像时的光心位置出射并穿过原始图像的像素点对应的表面点的第一视线的直线方程信息,基于第一视线的直线方程信息根据几何原理可以快速获知第一视线的视角信息。
假设图像记为I,其对应的视角预存图记为R(I)。每张图像I与其视角预存图R(I)的图像尺寸大小相同,图像I与其视角预存图R(I)中的像素点具有一一对应关系,视角预存图R(I)中记录的是图像I中各像素点对应的第一视线的视角信息。应理解,第一视线从拍摄图像I时的相机光心位置出射并穿过图像I的像素点对应的目标物体上的表面点。为了便于理解,以图4为例进行说明,图4示出两张图像仅仅是示例性说明,将多张图像中的第i张图像记为Ii,图像Ii对应的视角预存图记为R(Ii),R(Ii)中记录的是图像Ii中各像素点对应的第一视线的视角信息。将多张图像中的第j张图像记为Ij,图像Ij对应的视角预存图记为R(Ij),R(Ij)中记录的是图像Ij中各像素点对应的第一视线的视角信息,其中,i,j为正整数,
相应地,针对任一表面点,将该表面点对应的至少一条第一视线的视角信息的平均值转换为表面点的颜色信息,以得到显式三维模型,包括:针对任一表面点,根据多张原始图像对应的相机位姿,结合初始三维模型,从多张原始图像中确定包含该表面点对应的目标像素点的至少一张目标原始图像;将至少一张目标原始图像对应的视角预存图中存储的该目标像素点对应的第一视线的视角信息的平均值转换为该表面点的颜色信息。
具体而言,多张原始图像对应不同的相机位姿,不同相机位姿对应不同的视角范 围,落在视角范围内的任一表面点的图像数据可被采集到,进而在采集到的原始图像中包括与该表面点对应的目标像素点。为了便于理解,针对任一表面点,将该表面点对应的像素点称为目标像素点,并将多张原始图像中包含该表面点对应的目标像素点的原始图像称作目标原始图像;针对任一原始图像,基于原始图像的相机位姿和相机内参可以确定该相机位姿对应的视角范围。从初始三维模型获取任一表面点的空间坐标,若任一表面点的空间坐标落在相机位姿对应的视角范围内,则该相机位姿下拍摄到的原始图像为任一表面点对应的目标原始图像。若任一表面点的空间坐标未落在相机位姿对应的视角范围内,则该相机位姿下拍摄到的原始图像不是任一表面点对应的目标原始图像。
对任一表面点,在确定包含该表面点对应的目标像素点的至少一张目标原始图像之后,根据目标像素点在各张目标原始图像中的图像位置,查询各张目标原始图像对应的视角预存图对应图像位置上记录的第一视线的视角信息,获取目标像素点对应的第一视线的视角信息,并对这些目标像素点对应的第一视线的视角信息进行求平均值,得到该表面点对应的平均视角信息,以及采用视角信息与颜色信息的映射关系将该表面点对应的平均视角信息转化为该表面点的颜色信息。
进一步可选的，为了更加准确地获取目标物体上的每个表面点的平均视角信息，针对任一表面点V，确定包括表面点V的多张目标原始图像，依次将表面点V在目标原始图像中的图像坐标和目标原始图像中的目标像素点对应的第一视线的视角信息进行相乘，得到多个乘积，基于多个乘积得到表面点V对应的平均视角信息。进一步，参见下述公式(4)，可以对多个乘积进行求平均得到表面点V对应的平均视角信息。
作为一种示例,针对任一表面点V,可以按照公式(4)计算表面点V对应的平均视角信息
其中,VUV(Ii)可以按照公式(5)计算:
其中,VUV(Ii)是表面点V在图像Ii中的图像坐标,在计算VUV(Ii)的公式中,V带入的是表面点V在世界坐标系中的空间坐标(x,y,z),K是已知的相机内参,Z是V的深度信息。TW2C(Ii)表示的是图像Ii对应的相机坐标系与世界坐标系的变换矩阵。应理解,不同的图像的相机位姿不同,故不同的图像对应的相机坐标系也不同。
值得注意的是,L是指拍摄到表面点V的原始图像的数量。例如,拍摄目标物体得到的20张原始图像,其中,有5张原始图像包括表面点V,则L的取值为5。
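The bodies of formulas (4) and (5) are not reproduced in this text. From the surrounding definitions, a plausible reconstruction (an assumption, not the authoritative formulas of the source) is the following, where $R(I_i)[\cdot]$ denotes looking up the pre-stored viewing-angle map of image $I_i$ at the given pixel coordinate:

```latex
\bar{d}_V = \frac{1}{L}\sum_{i=1}^{L} R(I_i)\bigl[V_{UV}(I_i)\bigr] \qquad (4)
\qquad
V_{UV}(I_i) = \frac{1}{Z}\, K \, T_{W2C}(I_i)\, V \qquad (5)
```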
在本实施例中,在得到目标物体的初始隐式3D表征模型和显式三维模型之后, 还可以随机生成显式三维模型上各表面点对应的不同于第一视线的虚拟视线,为了便于理解,将随机生成的虚拟视线称作为第二视线,应理解,相对于真实相机发射出的第一视线来说,第二视线是假设的虚拟相机发射出的虚拟视线。可选地,针对显式三维模型任一表面点,可以随机生成该表面点对应的第二视线,并根据该表面点的颜色信息生成该表面点对应的第二视线对应的平均视角信息。
在本实施例中,针对显式三维模型上任一表面点,可以以该表面点对应的第一视线为参考视线,在该参考视线一定范围内随机生成该表面点对应的第二视线。值得注意的是,若该表面点出现在不同相机位姿下的多张原始图像中,可以针对每个相机位姿下的该表面点均随机生成其对应的第二视线。简单来说,对任一表面点,可以根据该表面点对应的第一视线随机生成该表面点对应的第二视线。
进一步可选的,根据该表面点对应的第一视线随机生成该表面点对应的第二视线包括:根据该表面点的空间坐标和该表面点对应的第一视线的视角信息,随机生成一条经过该表面点且不同于该表面点对应的第一视线的视线作为第二视线。
具体而言,根据该表面点的空间坐标和该目标像素点对应的第一视线的视角信息,确定候选空间范围;在该候选空间范围中,随机生成一条经过该表面点且不同于该目标像素点对应的第一视线的视线作为第二视线。其中,候选空间范围可以是任意形状的空间范围。可选的,候选空间范围是以表面点的空间坐标为圆点,以穿过目标像素点对应的第一视线为中心线的椎体空间范围。在确定候选空间范围时,可以是第二视线与穿过表面点的第一视线之间的夹角范围为[-η,η]度。其中,η例如为30度。
以图5为例,图5中的圆锥体以OV为中心线,以椅子的表面点5为圆锥圆点。O是发射第一视线的真实相机的光心位置,O′是发射第二视线的虚拟相机的光心位置,OV是第一视线,O′V是随机生成的第二视线,在圆锥体内所有O′V视线(图4中浅颜色的带箭头的射线)与OV之间的夹角范围为[-30,30]度。
进一步可选的,可以预先生成每张原始图像对应的深度预存图,以便基于深度预存图快速获取表面点的空间坐标,进而提高随机生成第二视线的效率。其中,每张原始图像对应的深度预存图中存储有该张原始图像中各像素点对应表面点的深度信息。基于此,针对任一表面点,根据该表面点对应的第一视线随机生成该表面点对应的第二视线的一种可选实现方式为:针对任一表面点,根据该多张原始图像对应的相机位姿,结合该显式三维模型,从该多张原始图像中确定包含该表面点对应的目标像素点的至少一张目标原始图像;针对每张目标原始图像,根据该目标原始图像对应的深度 预存图中存储的该目标像素点对应表面点的深度信息,计算该表面点的空间坐标,根据该表面点的空间坐标和该目标像素点对应的第一视线的视角信息,随机生成一条经过该表面点且不同于该目标像素点对应的第一视线的视线作为第二视线。
关于从多张原始图像中选择任一表面点对应的至少一张目标原始图像的方式可以参见前述内容,在此不再赘述。需要说明的是,在上述过程中,可以再次执行从多张原始图像中选择任一表面点的至少一张目标原始图像的操作,也可以不再执行,而是在上文执行该操作时记录表面点与目标原始图像之间的对应关系,基于该对应关系直接获取任一表面点对应的至少一张目标原始图像。
在从深度预存图得到表面点的深度信息之后,基于穿过表面点的直线方程可以获取表面点的空间坐标。以图3为例,假设第一视线为视线r1,视线r1击中椅子上的表面点V,表面点V到光心位置O之间的距离(深度信息)记为tz,将tz带入直线方程r=O+td中,在已知光心位置O的空间坐标和视线r1的视角信息的情况下,可以计算出表面点V的空间坐标。
下面介绍几种可选的深度预存图生成方式。
方式1:针对每张原始图像中的任一像素点,针对该像素点对应的第一视线上的任一空间点,根据空间点之间的采样间距、该空间点的体积密度、深度信息以及该空间点之前其它空间点的体积密度,计算该空间点到该像素点对应的第一视线对应的相机光心的深度信息;对该像素点对应的第一视线上多个空间点到相机光心的深度信息进行加权平均,得到该像素点对应表面点到相机光心的深度信息;根据每张原始图像中各像素点对应表面点到相机光心的深度信息,生成每张原始图像对应的深度预存图。
值得注意的是,可以在获取到初始隐式3D表征模型之后开始采用方式1生成深度预存图,或者在构建显式三维模型之前或之后采用方式1生成深度预存图,本实施例对此不做限制。
具体而言,假设第一视线的直线方程记为r=O+td。O是第一视线对应的光心位置,d是第一视线的视角信息,t是第一视线上的某个空间点的深度信息,t反映的是第一视线上的某个空间点与光心位置O之间距离。在第一视线上采样N个目标空间点,N为大于1的正整数,针对第i个目标空间点,i为1至N之间的正整数,记第i个目标空间点对应的采样间距为δi、记第i个目标空间点对应的体积密度为σi、记第i个目标空间点对应的深度信息为ti、记前i-1个目标空间点的累加体积密度为Ti,记第一视线穿过的表面点到相机光心的深度信息记为tz,tz可以按照公式(6)计算:
其中,δi=ti+1-ti,ti可以通过第i个目标空间点的空间坐标和光心位置O的空间坐标之差得到。
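Formula (6) itself did not survive extraction; given the symbols defined above, the density-weighted depth it describes is presumably

```latex
t_z = \sum_{i=1}^{N} T_i\,\bigl(1 - e^{-\sigma_i \delta_i}\bigr)\, t_i ,
\qquad
T_i = \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr) \qquad (6)
```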
方式2:
针对每张原始图像,利用该张原始图像对应的相机位姿对该显式三维模型进行光栅化渲染,得到该张原始图像中各像素点对应表面点到相机光心的深度信息;根据该张原始图像中各像素点对应表面点到相机光心的深度信息,生成该张原始图像对应的深度预存图。值得注意的是,在获取到显式三维模型之后再开始采用方式2生成深度预存图。
在本实施例中,采用上述实施例的方法,可以针对多张原始图像中各像素点对应的表面点分别随机生成第二视线,即可得到随机产生多条第二视线,并得到多条第二视线对应的平均视角信息,进一步可以利用多条第二视线对应的平均视角信息和多条第二视线上空间点的空间坐标,继续基于初始隐式3D表征模型进行基于神经网络的三维重建(或模型训练),得到目标隐式3D表征模型。例如,图3中视线r3可以视为随机生成的第二视线,视线r3上的圆点是多个空间点。需要说明的是,可以在对初始隐式3D表征模型进行训练之前预先采用上述方式产生所有的第二视线及其对应的平均视角信息,之后再采用多轮迭代的方式,每次使用其中部分第二视线对应的平均视角信息和部分第二视线上空间点的空间坐标,继续在初始隐式3D表征模型的基础上进行三维重建(或模型训练),直到得到三维重建的损失函数符合要求的目标隐式3D表征模型为止。或者,也可以在每次迭代过程中,实时采用上述方式产生本轮迭代所需的第二视线及其对应的平均视角信息,并基于实时产生的第二视线对应的平均视角信息和实时产生的第二视线上空间点的空间坐标,继续在初始隐式3D表征模型的基础上进行三维重建(或模型训练),直到得到三维重建的损失函数符合要求的目标隐式3D表征模型为止。
值得注意的是,针对同一表面点对应的多条第二视线,多条第二视线的视角信息均相同,均为根据该表面点对应的第一视线的视角信息计算得到的平均视角信息这样,在初始隐式3D表征模型基础上继续进行三维重建的过程,针对第二视线上的任一空间点,该空间点的颜色信息可以表达为:其中,Fσ(x)表示的是用于预测σ体积密度的Fσ网络基于第二视线上的空间点的空间坐标输出该空间点对应的中间特征。也即第二视线上的任一空间点的颜色信息是基于平均视角信息 和Fσ(x)得到的。
值得注意的是,在三维重建过程中,依次利用每条第二视线对应的平均视角信息和第二视线上空间点的空间坐标在该初始隐式3D表征模型的基础上继续进行三维重建,在每次利用上一批次的第二视线对应的平均视角信息和上一批次的第二视线上空间点的空间坐标执行一次重建操作后,采用立体渲染技术,利用预测出的上一批次中各条第二视线上各个空间点的体积密度分别对各条第二视线上各个空间点的RGB颜色信息进行积分,得到上一批次中各条第二视线对应的像素点的预测RGB颜色信息;基于上一批次中各条第二视线对应的像素点的预测RGB颜色信息与各条第二视线对应的像素点的实际RGB颜色信息(这里的实际RGB颜色信息是指相应样本图像中该像素点的颜色信息)计算损失函数,若损失函数收敛,至此完成三维重建(或模型训练)过程,若损失函数未收敛,则调整模型参数,并利用下一批次第二视线对应的平均视角信息和下一批次第二视线上空间点的空间坐标继续迭代训练,直至损失函数收敛。
在此对立体渲染技术进行简单说明,针对视线r,在视线r上采样N个空间点,N为大于1的正整数,针对第i个目标空间点,i为1至N之间的正整数,记第i个目标空间点对应的采样间距为δi、记第i个目标空间点对应的体积密度为σi、记第i个目标空间点对应的深度信息为ti、记前i-1个目标空间点的累加体积密度为Ti,记视线r的颜色为也即视线r的颜色对应像素点的颜色信息,其中,δi=ti+1-ti,ti的取值范围在预设的数值区间[tn,tf]内,数值tn和数值tf与目标物体的空间范围相关,也即
目标物体的空间范围在[tn,tf]内。于是，视线r的颜色可以按照公式(6)表达为：
其中,Ti可以按照公式(7)计算:
其中,j是1至i-1之间的正整数。
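The bodies of the ray-color formula (6) and of formula (7) are likewise missing here; with the symbols defined above, the standard volume rendering expressions they refer to are presumably

```latex
\hat{C}(r) = \sum_{i=1}^{N} T_i\,\bigl(1 - e^{-\sigma_i \delta_i}\bigr)\, c_i \qquad (6)
\qquad
T_i = \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr) \qquad (7)
```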
本申请实施例提供的三维模型重建方法,以包含目标物体的多张原始图像为基础分别进行基于神经网络的三维重建和传统的三维重建,得到初始隐式3D表征模型和显式三维模型;基于显式三维模型进行随机视线和平均视角的生成,基于随机视线和平均视角在初始隐式3D表征模型的基础上继续进行基于神经网络的三维重建,得到目标隐式3D表征模型。其中,初始隐式3D表征模型和目标隐式3D表征模型都是对目标物体进行隐式三维表 示的神经网络模型。在三维重建过程中,通过产生随机视线并以随机视线对应的平均视角信息代替其真实视角信息的方式,利用随机视线及其对应的平均视角信息增强视线数据,基于增强后的视线数据继续进行基于神经网络的三维重建,可以得到对视线具有较强鲁棒性的隐式3D表征模型,大大提升基于该隐式3D表征模型合成不同视角图像时的鲁棒性。
基于目标物体的目标隐式3D表征模型和显式三维模型可以满足用户渲染出目标物体的任意视角图像的需求。如图4所示,结合目标隐式3D表征模型(图4中未示出)和显式三维模型携带的目标物体上各个表面点的颜色信息代表的平均视角信息,可以渲染出质量更好的视角图像。为此,基于上述实施例提供的三维模型重建方法得到的目标隐式3D表征模型和显式三维模型,本申请实施例还提供一种图像生成方法。图6为本申请实施例提供的图像生成方法的流程示意图。如图6所示,该方法可以包括以下步骤:
601、根据待渲染的目标相机位姿和目标物体对应的显式三维模型,确定待渲染的目标视线和目标视线对应的平均视角信息。
602、根据目标视线上空间点的空间坐标和目标视线对应的平均视角信息,结合目标物体对应的目标隐式3D表征模型,生成目标物体在目标相机位姿下的目标图像。
在本实施例中,显式三维模型和目标隐式3D表征模型是融入视线先验信息和平均视角信息进行基于神经网络的三维重建的过程中得到的。其中,融入视线先验信息和平均视角信息进行基于神经网络的三维重建的过程可采用上述实施例提供的三维重建方法实现,在此不再赘述。
在本实施例中,在需要渲染新视角图像时,可以获取待渲染的目标相机位姿,然后基于目标相机位姿和目标物体对应的显式三维模型,确定待渲染的目标视线和目标视线对应的平均视角信息;在得到目标视线和目标视线对应的平均视角信息之后,结合目标物体对应的目标隐式3D表征模型,生成目标物体在所述目标相机位姿下的目标图像。
在一可选实施例中,上述基于目标相机位姿和目标物体对应的显式三维模型,确定待渲染的目标视线和目标视线对应的平均视角信息的过程包括:基于待渲染的目标相机位姿对目标物体对应的显式三维模型的光栅化渲染结果,确定该显式三维模型上位于目标相机位姿对应视野范围内的目标表面点及其颜色信息;针对任一目标表面点,获取目标相机位姿对应的相机光心到该目标表面点的目标视线,在目标视线上进行空间点采样,获取目标视线上的空间点;并将该目标表面点的颜色信息转换为该颜色信息代表的平均视角信息,作为目标视线对应的平均视角信息,至此得到目标视线对应的平均视角信息和目标视线上空间点的空间坐标。
在一可选实施例中,上述根据目标视线上空间点的空间坐标和目标视线对应的平均视角信息,结合目标物体对应的目标隐式3D表征模型,生成目标物体在目标相机位姿下的目标图像的过程包括:将目标视线对应的平均视角信息和目标视线上空间点 的空间坐标输入目标隐式3D表征模型,得到目标视线上各个空间点的颜色信息和体积密度;采用立体渲染技术,通过每条目标视线上各个空间点的体积密度,对每条目标视线上各个空间点的颜色信息进行积分,得到每条目标视线对应的目标表面点在目标相机位姿下的颜色信息。在得到目标视线对应的目标表面点在目标相机位姿下的颜色信息之后,根据目标表面点在目标相机位姿下的颜色信息可以渲染出目标物体在目标相机位姿下的目标图像。其中,目标图像是指包含目标物体的2D图像。值得注意的是,目标表面点的数量为多个,分别对应目标图像中的一个像素点。
在一可选实施例中,可以面向用户提供一种基于神经网络的三维重建服务,该服务可以部署在服务端,服务端可以在云端,在实现形态上可以是云端服务器、虚拟机、容器等;当然,服务端也可以采用传统服务器实现,对此不做限定。该服务面向用户提供人机交互界面,该人机交互界面可以是web界面或命令窗等。用户可以通过该服务提供的人机交互界面使用该服务,例如通过该人机交互界面向服务端提交原始图像或待渲染视角图像对应的目标相机位姿,并且可通过人机交互界面展示目标物体对应的显式三维模型或渲染出的视角图像等。
在一可选实施例中,用户在其使用的终端设备上展示基于神经网络的三维重建服务对应的人机交互界面,用户通过该人机交互界面进行图像上传或图像拍摄,以提交进行三维重建所需的包含目标物体的多张原始图像。基于此,响应于人机交互界面上的图像上传操作或图像拍摄操作,获取包含目标物体的多张原始图像;之后,执行三维重建过程,即根据包含目标物体的多张原始图像进行基于神经网络的三维重建,得到初始隐式3D表征模型;根据初始隐式3D表征模型和多张原始图像,构建显式三维模型;随机生成显式三维模型上表面点对应的第二视线,并根据每个表面点的颜色信息分别生成每个表面点对应的第二视线对应的平均视角信息;根据第二视线对应的平均视角信息和第二视线上空间点的空间坐标,基于初始隐式3D表征模型进行基于神经网络的三维重建,得到目标隐式3D表征模型。关于各步骤的详细实现可参见前述实施例,在此不再赘述。
进一步,在得到目标隐式3D表征模型之后,还可以在人机交互界面上输出已得到目标隐式3D表征模型的消息,以通知用户可以基于该目标隐式3D表征模型进行新视角图像的合成;用户在该人机交互界面上输入待渲染的目标相机位姿;响应人机交互界面上的输入操作,获取待渲染的目标相机位姿;之后,执行图像合成过程,即根据待渲染的目标相机位姿和目标物体对应的显式三维模型,确定待渲染的目标视线和目标视线对应的平均视角信息;根据目标视线上空间点的空间坐标和目标视线对应的平均视角信息,结合目标物体对应的目标隐式3D表征模型,生成目标物体在所述目标相机位姿下的目标图像,并输出该目标图像。关于各步骤的详细实现可参见前述实施例,在此不再赘述。
本申请实施例提供的图像生成方法，结合目标隐式3D表征模型和显式三维模型携带的目标物体上各个表面点的平均视角信息，可以渲染出质量更好的目标图像，满足了用户渲染出目标物体的任意视角图像的需求。
为了便于理解，下面结合几种场景实施例对本申请实施例提供的三维模型重建方法进行详细说明。
场景实施例1:
在电商场景中,商品主图的好坏直接影响着电商店铺的客流量。目前,在制作商品主图时,通常需要利用相机从多个不同视角对商品对象进行拍摄,得到多张不同的图像,并从多张图像中选择一张质量较好的商品图像作为商品主图。然而,受限于拍摄人员的拍摄技巧,选择出的商品主图无法提供一个很好的视角展示商品信息,致使难以有效地吸引顾客点击商品链接,影响电商店铺的客流量。另外,需要拍摄大量的图像才能保证选择出质量较好的商品主图,人工成本较高,商品主图制作效率较低。
出于满足快速制作质量较好的商品主图的需求，可以利用本申请实施例提供的三维模型重建方法制作商品主图。参见图6b，实际应用中，商家可以用诸如手机、平板电脑、可穿戴式智能设备、智能家居设备等终端设备以环绕商品对象360度的方式拍摄一段视频，商家可在终端设备提供的人机交互界面（例如为web界面）上发起图片上传操作，以将该包含多张商品图像的视频上传至执行三维模型重建方法的服务端，如图6b中①所示。该服务端可以为单个服务器或多个服务器组成的分布式服务器集群，进一步可选的，服务端可以为云端服务器。如图6b中②所示，服务端基于多张商品图像进行三维模型重建以获得对商品对象进行三维3D表达的目标隐式3D表征模型和商品对象的显式三维模型。在三维模型重建完毕后，商家可以在终端设备提供的人机交互界面上输入渲染视角，终端设备解析渲染视角获取对应的待渲染的相机位姿，生成包括待渲染的相机位姿的新视角图像获取请求并向服务端发送该新视角图像获取请求，如图6b中③所示；服务端响应新视角图像获取请求，如图6b中④和⑤所示，基于目标隐式3D表征模型和显式三维模型生成在待渲染的相机位姿下的商品对象的新视角图像，并向终端设备发送商品对象的新视角图像以供终端设备展示该新视角图像。商家可在其终端设备上查看到商品对象的新视角图像。
场景实施例2:
在AI（Artificial Intelligence，人工智能）家装领域，为了给消费者提供3D场景化的购物体验，将查看商品从传统的看图片、看视频升级成在AI家居场景中看搭配、看效果。在创建AI家居场景过程中，除了需要创建三维立体户型图，还需要创建搭配到三维立体户型图中的家具、电器等三维模型。为此，设计人员可以用手机以环绕真实场景中的家具、电器等物体360度的方式拍摄一段视频，并将该视频上传至执行三维模型重建方法的三维模型重建装置，三维模型重建装置基于视频中的多张图像进行三维模型重建，获取家具、电器的三维模型，并将家具、电器的三维模型搭配到三维立体户型图中，以完成AI家居场景的创建任务。
需要说明的是，上述实施例所提供方法的各步骤的执行主体均可以是同一设备，或者，该方法也可以由不同设备作为执行主体。比如，步骤101至步骤104的执行主体可以为设备A；又比如，步骤101和102的执行主体可以为设备A，步骤103和104的执行主体可以为设备B；等等。
另外,在上述实施例及附图中的描述的一些流程中,包含了按照特定顺序出现的多个操作,但是应该清楚了解,这些操作可以不按照其在本文中出现的顺序来执行或并行执行,操作的序号如101、102等,仅仅是用于区分开各个不同的操作,序号本身不代表任何的执行顺序。另外,这些流程可以包括更多或更少的操作,并且这些操作可以按顺序执行或并行执行。需要说明的是,本文中的“第一”、“第二”等描述,是用于区分不同的消息、设备、模块等,不代表先后顺序,也不限定“第一”和“第二”是不同的类型。
图7为本申请实施例提供的一种三维模型重建装置的结构示意图。如图7所示,该装置可以包括:重建模块71、构建模块72和生成模块73;
其中,重建模块71,用于根据包含目标物体的多张原始图像进行基于神经网络的三维重建,得到对目标物体进行隐式三维3D表达的初始隐式3D表征模型,所述多张原始图像对应不同的相机位姿,且每张原始图像中的不同像素点与穿过目标物体上不同表面点的第一视线对应,或者说是,目标物体上的表面点与对应原始图像中的像素点对应,且与拍摄到像素点的第一视线对应。
构建模块72，用于根据所述初始隐式3D表征模型和所述多张原始图像，构建目标物体对应的显式三维模型，该显式三维模型包括目标物体上表面点的颜色信息，每个表面点的颜色信息是根据该表面点对应的至少一条第一视线的平均视角信息确定的。
生成模块73,用于随机生成显式三维模型上表面点对应的第二视线,并根据每个表面点的颜色信息分别生成每个表面点对应的第二视线对应的平均视角信息;
重建模块71,还用于根据所述第二视线对应的平均视角信息和所述第二视线上空间点的空间坐标,基于初始隐式3D表征模型进行基于神经网络的三维重建,得到对目标物体进行隐式三维3D表达的目标隐式3D表征模型。
进一步可选的,构建模块72根据初始隐式3D表征模型和多张原始图像,构建目标物体对应的显式三维模型时,具体用于:根据多张原始图像的图像特征,确定所述目标物体对应的空间范围;基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的初始三维模型,所述初始三维模型包括所述目标物体上的表面点;将所述初始三维模型上每个表面点对应的第一视线的视角信息的平均值,分别转换为每个表面点的颜色信息,以得到所述显式三维模型。
进一步可选的,构建模块72基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的初始三维模型时,具体用于:基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的标量场数据,所述标量场数据包括多个体积元素;对所述多个体积元素进行三角面解析,得到初始三维模型包含的多个三角面、所述多个三角面上的多个顶点及其空间坐标,所述多个三角面和多个顶点用于限定所述初始 三维模型包含的各表面点。
进一步可选的,所述空间范围为具有长宽高的长方体空间,构建模块72基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的标量场数据时,具体用于:对所述长方体空间在长宽高三个维度上进行等间隔采样得到多个目标空间点,其中,相邻8个目标空间点形成一个体积元素;将所述多个目标空间点的空间坐标输入所述初始隐式3D表征模型,得到所述多个目标空间点的体积密度;所述体积元素和所述体积元素包含的目标空间点的体积密度形成所述标量场数据。
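以下为上述标量场采样与三角面解析的一个示意性草图，此处以 marching cubes 算法为例进行三角面解析（采样分辨率、等值面阈值 level 等均为示例性假设；实际实现可对目标空间点分块查询以控制内存占用）：

```python
import numpy as np
import torch
from skimage import measure

def extract_mesh(model, bbox_min, bbox_max, resolution=128, level=10.0):
    """在长方体空间范围 [bbox_min, bbox_max] 内等间隔采样体积密度，解析出三角面与顶点。"""
    axes = [np.linspace(bbox_min[k], bbox_max[k], resolution) for k in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing='ij'), axis=-1)         # 目标空间点坐标网格
    with torch.no_grad():
        pts = torch.from_numpy(grid.reshape(-1, 3)).float()
        _, sigma = model(pts, torch.zeros_like(pts))                    # 仅使用体积密度输出
    field = sigma.reshape(resolution, resolution, resolution).numpy()   # 标量场数据
    verts, faces, _, _ = measure.marching_cubes(field, level=level)     # 顶点与三角面（体素索引坐标）
    scale = (np.asarray(bbox_max) - np.asarray(bbox_min)) / (resolution - 1)
    verts = verts * scale + np.asarray(bbox_min)                        # 换算回世界坐标系下的空间坐标
    return verts, faces
```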
进一步可选的,生成模块73还用于:生成每张原始图像对应的视角预存图,所述视角预存图中存储有该张原始图像中各像素点对应的第一视线的视角信息;
相应地,构建模块72将所述初始三维模型上每个表面点对应的第一视线的视角信息的平均值,分别转换为每个表面点的颜色信息,以得到所述显式三维模型时,具体用于:针对任一表面点,根据所述多张原始图像对应的相机位姿,结合所述初始三维模型,从所述多张原始图像中确定包含所述表面点对应的目标像素点的至少一张目标原始图像;将所述至少一张目标原始图像对应的视角预存图中存储的所述目标像素点对应的第一视线的视角信息的平均值转换为所述表面点的颜色信息。
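下面给出将第一视线视角信息的平均值转换为表面点颜色的一个示意性草图（其中 (d+1)/2 的编码映射仅为示例性假设，本申请并未限定具体的转换方式）：

```python
import numpy as np

def avg_dir_to_color(first_sight_dirs):
    """first_sight_dirs：(M, 3)，某表面点在 M 张目标原始图像中对应的第一视线单位方向。"""
    d = np.mean(first_sight_dirs, axis=0)
    d = d / (np.linalg.norm(d) + 1e-8)            # 平均视角信息（单位化后的平均方向）
    return np.clip((d + 1.0) * 0.5, 0.0, 1.0)     # 假设按 (d+1)/2 编码为 [0,1] 范围的 RGB 颜色
```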
进一步可选的,生成模块73还用于:生成每张原始图像对应的深度预存图,所述深度预存图中存储有该张原始图像中各像素点对应表面点的深度信息;
相应地,生成模块73随机生成所述显式三维模型上表面点对应的第二视线时,具体用于:针对任一表面点,根据所述多张原始图像对应的相机位姿,结合所述显式三维模型,从所述多张原始图像中确定包含所述表面点对应的目标像素点的至少一张目标原始图像;针对每张目标原始图像,根据所述目标原始图像对应的深度预存图中存储的所述目标像素点对应表面点的深度信息,计算所述表面点的空间坐标,根据所述表面点的空间坐标和所述目标像素点对应的第一视线的视角信息,随机生成一条经过所述表面点且不同于所述目标像素点对应的第一视线的视线作为第二视线。
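下面给出根据深度预存图中的深度信息反投影计算表面点空间坐标的示意性草图（相机内参 K、相机到世界位姿 c2w 的记号以及“深度为沿视线到光心的距离”的约定均为示例性假设）：

```python
import numpy as np

def backproject(u, v, d, K, c2w):
    """由像素坐标 (u, v)、深度 d、内参 K 与位姿 c2w 计算该像素对应表面点的空间坐标。"""
    dir_cam = np.array([(u - K[0, 2]) / K[0, 0], (v - K[1, 2]) / K[1, 1], 1.0])
    dir_world = c2w[:3, :3] @ dir_cam
    dir_world = dir_world / np.linalg.norm(dir_world)   # 第一视线的单位方向
    o = c2w[:3, 3]                                      # 相机光心
    return o + d * dir_world                            # 表面点的空间坐标
```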
进一步可选的,生成模块73根据所述表面点的空间坐标和所述目标像素点对应的第一视线的视角信息,随机生成一条经过所述表面点且不同于所述目标像素点对应的第一视线的视线作为第二视线时,具体用于:根据所述表面点的空间坐标和所述目标像素点对应的第一视线的视角信息,确定候选空间范围;在所述候选空间范围中,随机生成一条经过所述表面点且不同于所述目标像素点对应的第一视线的视线作为第二视线。
进一步可选的，所述候选空间范围是以所述表面点的空间坐标为原点，以所述目标像素点对应的第一视线为中心线的锥体空间范围。
进一步可选的,生成模块73生成每张原始图像对应的深度预存图时,具体用于:针对每张原始图像中的任一像素点,针对所述像素点对应的第一视线上的任一空间点,根据空间点之间的采样间距、所述空间点的体积密度、深度信息以及所述空间点之前 其它空间点的体积密度,计算所述空间点到所述像素点对应的第一视线对应的相机光心的深度信息;对所述像素点对应的第一视线上多个空间点到相机光心的深度信息进行加权平均,得到所述像素点对应表面点到相机光心的深度信息;根据每张原始图像中各像素点对应表面点到相机光心的深度信息,生成每张原始图像对应的深度预存图;或者,针对每张原始图像,利用该张原始图像对应的相机位姿对所述显式三维模型进行光栅化渲染,得到该张原始图像中各像素点对应表面点到相机光心的深度信息;根据该张原始图像中各像素点对应表面点到相机光心的深度信息,生成该张原始图像对应的深度预存图。
进一步可选的,上述装置还包括:确定模块和渲染模块;
确定模块,用于根据待渲染的目标相机位姿和所述显式三维模型,确定待渲染的目标视线和所述目标视线对应的平均视角信息;
渲染模块,用于根据所述目标视线上空间点的空间坐标和所述目标视线对应的平均视角信息,结合所述目标隐式3D表征模型,生成所述目标物体在所述目标相机位姿下的目标图像。
在一可选实施例中,确定模块具体用于:根据所述目标相机位姿对所述显式三维模型进行光栅化渲染,得到位于所述目标相机位姿对应视野范围内的目标表面点及其颜色信息;针对任一目标表面点,获取所述目标相机位姿对应的相机光心到所述目标表面点的目标视线,并根据所述目标表面点的颜色信息生成所述目标视线对应的平均视角信息。
在一可选实施例中,渲染模块具体用于:将所述目标视线对应的平均视角信息和所述目标视线上空间点的空间坐标输入所述目标隐式3D表征模型,得到所述目标视线上空间点的颜色信息和体积密度;根据所述目标视线上空间点的颜色信息和体积密度进行体渲染,以得到所述目标物体在所述目标相机位姿下的目标图像。
关于图7所示的装置其中各个模块、单元执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
图8为本申请实施例提供的一种图像生成装置的结构示意图。如图8所示,该装置可以包括:确定模块82和渲染模块83;
确定模块82,用于根据待渲染的目标相机位姿和目标物体对应的显式三维模型,确定待渲染的目标视线和所述目标视线对应的平均视角信息;
渲染模块83,用于根据所述目标视线上空间点的空间坐标和所述目标视线对应的平均视角信息,结合所述目标物体对应的目标隐式3D表征模型,生成所述目标物体在所述目标相机位姿下的目标图像;其中,所述显式三维模型和目标隐式3D表征模型是融入视线先验信息和平均视角信息进行基于神经网络的三维重建得到的。
在一可选实施例中,确定模块具体用于:根据所述目标相机位姿对所述显式三维模型进行光栅化渲染,得到位于所述目标相机位姿对应视野范围内的目标表面点及其 颜色信息;针对任一目标表面点,获取所述目标相机位姿对应的相机光心到所述目标表面点的目标视线,并根据所述目标表面点的颜色信息生成所述目标视线对应的平均视角信息。
在一可选实施例中,渲染模块具体用于:将所述目标视线对应的平均视角信息和所述目标视线上空间点的空间坐标输入所述目标隐式3D表征模型,得到所述目标视线上空间点的颜色信息和体积密度;根据所述目标视线上空间点的颜色信息和体积密度进行体渲染,以得到所述目标物体在所述目标相机位姿下的目标图像。
进一步可选的,上述装置还包括:重建模块、构建模块和生成模块;
重建模块,用于根据包含目标物体的多张原始图像进行基于神经网络的三维重建,得到对所述目标物体进行隐式三维3D表达的初始隐式3D表征模型,所述目标物体上的表面点与对应原始图像中的像素点对应,且与拍摄到所述像素点的第一视线对应。另外,所述多张原始图像对应不同的相机位姿,且每张原始图像中的不同像素点与穿过所述目标物体上不同表面点的第一视线对应;
构建模块,用于根据所述初始隐式3D表征模型和所述多张原始图像,构建所述目标物体对应的显式三维模型,所述显式三维模型包括所述目标物体上表面点的颜色信息,每个表面点的颜色信息是根据该表面点对应的第一视线的平均视角信息确定的;
生成模块,用于随机生成所述显式三维模型上表面点对应的第二视线,并根据每个表面点的颜色信息分别生成每个表面点对应的第二视线对应的平均视角信息;
重建模块,还用于根据所述第二视线对应的平均视角信息和所述第二视线上空间点的空间坐标,基于所述初始隐式3D表征模型进行基于神经网络的三维重建,得到对所述目标物体进行隐式三维3D表达的目标隐式3D表征模型。
关于图8所示的装置其中各个模块、单元执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
图9为本申请实施例提供的一种计算机设备的结构示意图。参见图9,该计算机设备包括:存储器91和处理器92。
存储器91,用于存储计算机程序,并可被配置为存储其它各种数据以支持在计算平台上的操作。这些数据的示例包括用于在计算平台上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。
存储器91可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
处理器92,与存储器91耦合,用于执行存储器91中的计算机程序,以用于:根据包含目标物体的多张原始图像进行基于神经网络的三维重建,得到对所述目标物体进行隐式三维3D表达的初始隐式3D表征模型,所述目标物体上的表面点与对应原始图像 中的像素点对应,且与拍摄到所述像素点的第一视线对应;所述多张原始图像对应不同的相机位姿,且每张原始图像中的不同像素点与穿过所述目标物体上不同表面点的第一视线对应;根据所述初始隐式3D表征模型和所述多张原始图像,构建所述目标物体对应的显式三维模型,所述显式三维模型包括所述目标物体上表面点的颜色信息,每个表面点的颜色信息是根据该表面点对应的第一视线的平均视角信息确定的;随机生成所述显式三维模型上表面点对应的第二视线,并根据每个表面点的颜色信息分别生成每个表面点对应的第二视线对应的平均视角信息;根据所述第二视线对应的平均视角信息和所述第二视线上空间点的空间坐标,基于所述初始隐式3D表征模型进行基于神经网络的三维重建,得到对所述目标物体进行隐式三维3D表达的目标隐式3D表征模型。
进一步可选的,处理器92根据所述初始隐式3D表征模型和所述多张原始图像,构建所述目标物体对应的显式三维模型时,具体用于:根据所述多张原始图像的图像特征,确定所述目标物体对应的空间范围;基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的初始三维模型,所述初始三维模型包括所述目标物体上的表面点;将所述初始三维模型上每个表面点对应的第一视线的视角信息的平均值,分别转换为每个表面点的颜色信息,以得到所述显式三维模型。
进一步可选的,处理器92基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的初始三维模型时,具体用于:基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的标量场数据,所述标量场数据包括多个体积元素;对所述多个体积元素进行三角面解析,得到初始三维模型包含的多个三角面、所述多个三角面上的多个顶点及其空间坐标,所述多个三角面和多个顶点用于限定所述初始三维模型包含的各表面点。
进一步可选的,所述空间范围为具有长宽高的长方体空间,处理器92基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的标量场数据时,具体用于:对所述长方体空间在长宽高三个维度上进行等间隔采样得到多个目标空间点,其中,相邻8个目标空间点形成一个体积元素;将所述多个目标空间点的空间坐标输入所述初始隐式3D表征模型,得到所述多个目标空间点的体积密度;所述体积元素和所述体积元素包含的目标空间点的体积密度形成所述标量场数据。
进一步可选的,处理器92还用于:生成每张原始图像对应的视角预存图,所述视角预存图中存储有该张原始图像中各像素点对应的第一视线的视角信息;
相应地,处理器92将所述初始三维模型上每个表面点对应的第一视线的视角信息的平均值,分别转换为每个表面点的颜色信息,以得到所述显式三维模型时,具体用于:针对任一表面点,根据所述多张原始图像对应的相机位姿,结合所述初始三维模型,从所述多张原始图像中确定包含所述表面点对应的目标像素点的至少一张目标原始图像;将所述至少一张目标原始图像对应的视角预存图中存储的所述目标像素点对 应的第一视线的视角信息的平均值转换为所述表面点的颜色信息。
进一步可选的,处理器92还用于:生成每张原始图像对应的深度预存图,所述深度预存图中存储有该张原始图像中各像素点对应表面点的深度信息;
相应地,处理器92随机生成所述显式三维模型上表面点对应的第二视线时,具体用于:针对任一表面点,根据所述多张原始图像对应的相机位姿,结合所述显式三维模型,从所述多张原始图像中确定包含所述表面点对应的目标像素点的至少一张目标原始图像;针对每张目标原始图像,根据所述目标原始图像对应的深度预存图中存储的所述目标像素点对应表面点的深度信息,计算所述表面点的空间坐标,根据所述表面点的空间坐标和所述目标像素点对应的第一视线的视角信息,随机生成一条经过所述表面点且不同于所述目标像素点对应的第一视线的视线作为第二视线。
进一步可选的,处理器92根据所述表面点的空间坐标和所述目标像素点对应的第一视线的视角信息,随机生成一条经过所述表面点且不同于所述目标像素点对应的第一视线的视线作为第二视线时,具体用于:根据所述表面点的空间坐标和所述目标像素点对应的第一视线的视角信息,确定候选空间范围;在所述候选空间范围中,随机生成一条经过所述表面点且不同于所述目标像素点对应的第一视线的视线作为第二视线。
进一步可选的，所述候选空间范围是以所述表面点的空间坐标为原点，以所述目标像素点对应的第一视线为中心线的锥体空间范围。
进一步可选的,处理器92生成每张原始图像对应的深度预存图时,具体用于:针对每张原始图像中的任一像素点,针对所述像素点对应的第一视线上的任一空间点,根据空间点之间的采样间距、所述空间点的体积密度、深度信息以及所述空间点之前其它空间点的体积密度,计算所述空间点到所述像素点对应的第一视线对应的相机光心的深度信息;对所述像素点对应的第一视线上多个空间点到相机光心的深度信息进行加权平均,得到所述像素点对应表面点到相机光心的深度信息;根据每张原始图像中各像素点对应表面点到相机光心的深度信息,生成每张原始图像对应的深度预存图;或者,针对每张原始图像,利用该张原始图像对应的相机位姿对所述显式三维模型进行光栅化渲染,得到该张原始图像中各像素点对应表面点到相机光心的深度信息;根据该张原始图像中各像素点对应表面点到相机光心的深度信息,生成该张原始图像对应的深度预存图。
进一步可选的,处理器92还用于:根据待渲染的目标相机位姿和所述显式三维模型,确定待渲染的目标视线和所述目标视线对应的平均视角信息;根据所述目标视线上空间点的空间坐标和所述目标视线对应的平均视角信息,结合所述目标隐式3D表征模型,生成所述目标物体在所述目标相机位姿下的目标图像。
进一步可选地,处理器92在根据待渲染的目标相机位姿和所述显式三维模型,确定待渲染的目标视线和所述目标视线对应的平均视角信息时,具体用于:根据所述目 标相机位姿对所述显式三维模型进行光栅化渲染,得到位于所述目标相机位姿对应视野范围内的目标表面点及其颜色信息;针对任一目标表面点,获取所述目标相机位姿对应的相机光心到所述目标表面点的目标视线,并根据所述目标表面点的颜色信息生成所述目标视线对应的平均视角信息。
进一步可选地,处理器92在生成所述目标物体在所述目标相机位姿下的目标图像时,具体用于:将所述目标视线对应的平均视角信息和所述目标视线上空间点的空间坐标输入所述目标隐式3D表征模型,得到所述目标视线上空间点的颜色信息和体积密度;根据所述目标视线上空间点的颜色信息和体积密度进行体渲染,以得到所述目标物体在所述目标相机位姿下的目标图像。
进一步，如图9所示，该计算机设备还包括：通信组件93、显示器94、电源组件95、音频组件96等其它组件。图9中仅示意性给出部分组件，并不意味着计算机设备只包括图9所示组件。另外，图9中虚线框内的组件为可选组件，而非必选组件，具体可视计算机设备的产品形态而定。本实施例的计算机设备可以实现为台式电脑、笔记本电脑、智能手机或IOT设备等终端设备，也可以是常规服务器、云服务器或服务器阵列等服务端设备。若本实施例的计算机设备实现为台式电脑、笔记本电脑、智能手机等终端设备，可以包含图9中虚线框内的组件；若本实施例的计算机设备实现为常规服务器、云服务器或服务器阵列等服务端设备，则可以不包含图9中虚线框内的组件。
相应地,本申请实施例还提供一种存储有计算机程序的计算机可读存储介质,计算机程序被执行时能够实现上述方法实施例中可由计算机设备执行的各步骤。
相应地,本申请实施例还提供一种计算机程序产品,包括计算机程序/指令,当计算机程序/指令被处理器执行时,致使处理器能够实现上述方法实施例中可由计算机设备执行的各步骤。
上述通信组件被配置为便于通信组件所在设备和其他设备之间有线或无线方式的通信。通信组件所在设备可以接入基于通信标准的无线网络,如WiFi,2G、3G、4G/LTE、5G等移动通信网络,或它们的组合。在一个示例性实施例中,通信组件经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,通信组件还包括近场通信(NFC)模块,以促进短程通信。例如,在NFC模块可基于射频识别(RFID)技术,红外数据协会(IrDA)技术,超宽带(UWB)技术,蓝牙(BT)技术和其他技术来实现。
上述显示器包括屏幕,其屏幕可以包括液晶显示器(LCD)和触摸面板(TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与触摸或滑动操作相关的持续时间和压力。
上述电源组件,为电源组件所在设备的各种组件提供电力。电源组件可以包括电源管理系统,一个或多个电源,及其他与为电源组件所在设备生成、管理和分配电力相关联的 组件。
上述音频组件,可被配置为输出和/或输入音频信号。例如,音频组件包括一个麦克风(MIC),当音频组件所在设备处于操作模式,如呼叫模式、记录模式和语音识别模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器或经由通信组件发送。在一些实施例中,音频组件还包括一个扬声器,用于输出音频信号。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储 器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带存储器或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括要素的过程、方法、商品或者设备中还存在另外的相同要素。
以上仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。

Claims (16)

  1. 一种三维模型重建方法,其特征在于,包括:
    根据包含目标物体的多张原始图像进行基于神经网络的三维重建,得到初始隐式3D表征模型,所述目标物体上的表面点与对应原始图像中的像素点对应,且与拍摄到所述像素点的第一视线对应;
    根据所述初始隐式3D表征模型和所述多张原始图像,构建显式三维模型,所述显式三维模型包括所述目标物体上表面点的颜色信息,每个表面点的颜色信息是根据该表面点对应的第一视线的平均视角信息确定的;
    随机生成所述显式三维模型上表面点对应的第二视线,并根据每个表面点的颜色信息分别生成每个表面点对应的第二视线对应的平均视角信息;
    根据所述第二视线对应的平均视角信息和所述第二视线上空间点的空间坐标,基于所述初始隐式3D表征模型进行基于神经网络的三维重建,得到目标隐式3D表征模型。
  2. 根据权利要求1所述的方法,其特征在于,根据所述初始隐式3D表征模型和所述多张原始图像,构建显式三维模型,包括:
    根据所述多张原始图像的图像特征,确定所述目标物体对应的空间范围;
    基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的初始三维模型,所述初始三维模型包括所述目标物体上的表面点;
    将所述初始三维模型上每个表面点对应的第一视线的视角信息的平均值,分别转换为每个表面点的颜色信息,以得到所述显式三维模型。
  3. 根据权利要求2所述的方法,其特征在于,基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的初始三维模型,包括:
    基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的标量场数据,所述标量场数据包括多个体积元素;
    对所述多个体积元素进行三角面解析,得到初始三维模型包含的多个三角面、所述多个三角面上的多个顶点及其空间坐标,所述多个三角面和多个顶点用于限定所述初始三维模型包含的各表面点。
  4. 根据权利要求3所述的方法,其特征在于,所述空间范围为具有长宽高的长方体空间,基于所述空间范围和所述初始隐式3D表征模型生成所述目标物体对应的标量场数据,包括:
    对所述长方体空间在长宽高三个维度上进行等间隔采样得到多个目标空间点,其中,相邻8个目标空间点形成一个体积元素;
    将所述多个目标空间点的空间坐标输入所述初始隐式3D表征模型,得到所述多个目标空间点的体积密度;所述体积元素和所述体积元素包含的目标空间点的体积密度形成所述标量场数据。
  5. 根据权利要求2所述的方法,其特征在于,还包括:生成每张原始图像对应的视角预存图,所述视角预存图中存储有该张原始图像中各像素点对应的第一视线的视角信息;
    相应地,将所述初始三维模型上每个表面点对应的第一视线的视角信息的平均值,分别转换为每个表面点的颜色信息,以得到所述显式三维模型,包括:
    针对任一表面点,根据所述多张原始图像对应的相机位姿,结合所述初始三维模型,从所述多张原始图像中确定包含所述表面点对应的目标像素点的至少一张目标原始图像;
    将所述至少一张目标原始图像对应的视角预存图中存储的所述目标像素点对应的第一视线的视角信息的平均值转换为所述表面点的颜色信息。
  6. 根据权利要求1-5任一项所述的方法,其特征在于,还包括:生成每张原始图像对应的深度预存图,所述深度预存图中存储有该张原始图像中各像素点对应表面点的深度信息;
    相应地,随机生成所述显式三维模型上表面点对应的第二视线,包括:
    针对任一表面点,根据所述多张原始图像对应的相机位姿,结合所述显式三维模型,从所述多张原始图像中确定包含所述表面点对应的目标像素点的至少一张目标原始图像;
    针对每张目标原始图像,根据所述目标原始图像对应的深度预存图中存储的所述目标像素点对应表面点的深度信息,计算所述表面点的空间坐标,根据所述表面点的空间坐标和所述目标像素点对应的第一视线的视角信息,随机生成一条经过所述表面点且不同于所述目标像素点对应的第一视线的视线作为第二视线。
  7. 根据权利要求6所述的方法,其特征在于,根据所述表面点的空间坐标和所述目标像素点对应的第一视线的视角信息,随机生成一条经过所述表面点且不同于所述目标像素点对应的第一视线的视线作为第二视线,包括:
    根据所述表面点的空间坐标和所述目标像素点对应的第一视线的视角信息,确定候选空间范围;
    在所述候选空间范围中,随机生成一条经过所述表面点且不同于所述目标像素点对应的第一视线的视线作为第二视线。
  8. 根据权利要求7所述的方法，其特征在于，所述候选空间范围是以所述表面点的空间坐标为原点，以所述目标像素点对应的第一视线为中心线的锥体空间范围。
  9. 根据权利要求6所述的方法,其特征在于,生成每张原始图像对应的深度预存图,包括:
    针对每张原始图像中的任一像素点,针对所述像素点对应的第一视线上的任一空间点,根据空间点之间的采样间距、所述空间点的体积密度、深度信息以及所述空间点之前其它空间点的体积密度,计算所述空间点到所述像素点对应的第一视线对应的 相机光心的深度信息;对所述像素点对应的第一视线上多个空间点到相机光心的深度信息进行加权平均,得到所述像素点对应表面点到相机光心的深度信息;根据每张原始图像中各像素点对应表面点到相机光心的深度信息,生成每张原始图像对应的深度预存图;
    或者
    针对每张原始图像,利用该张原始图像对应的相机位姿对所述显式三维模型进行光栅化渲染,得到该张原始图像中各像素点对应表面点到相机光心的深度信息;根据该张原始图像中各像素点对应表面点到相机光心的深度信息,生成该张原始图像对应的深度预存图。
  10. 根据权利要求1-5任一项所述的方法,其特征在于,在得到所述目标隐式3D表征模型之后,所述方法还包括:
    根据待渲染的目标相机位姿和所述显式三维模型,确定待渲染的目标视线和所述目标视线对应的平均视角信息;
    根据所述目标视线上空间点的空间坐标和所述目标视线对应的平均视角信息,结合所述目标隐式3D表征模型,生成所述目标物体在所述目标相机位姿下的目标图像。
  11. 一种图像生成方法,其特征在于,包括:
    根据待渲染的目标相机位姿和目标物体对应的显式三维模型,确定待渲染的目标视线和所述目标视线对应的平均视角信息;
    根据所述目标视线上空间点的空间坐标和所述目标视线对应的平均视角信息,结合所述目标物体对应的目标隐式3D表征模型,生成所述目标物体在所述目标相机位姿下的目标图像;
    其中,所述显式三维模型和目标隐式3D表征模型是融入视线先验信息和平均视角信息进行基于神经网络的三维重建得到的。
  12. 根据权利要求11所述的方法,其特征在于,根据所述目标相机位姿和所述显式三维模型,确定待渲染的目标视线和所述目标视线对应的平均视角信息,包括:
    根据所述目标相机位姿对所述显式三维模型进行光栅化渲染,得到位于所述目标相机位姿对应视野范围内的目标表面点及其颜色信息;
    针对任一目标表面点,获取所述目标相机位姿对应的相机光心到所述目标表面点的目标视线,并根据所述目标表面点的颜色信息生成所述目标视线对应的平均视角信息。
  13. 根据权利要求11或12所述的方法,其特征在于,根据所述目标视线上空间点的空间坐标和所述目标视线对应的平均视角信息,结合所述目标隐式3D表征模型,生成所述目标物体在所述目标相机位姿下的目标图像,包括:
    将所述目标视线对应的平均视角信息和所述目标视线上空间点的空间坐标输入所述目标隐式3D表征模型,得到所述目标视线上空间点的颜色信息和体积密度;
    根据所述目标视线上空间点的颜色信息和体积密度进行体渲染,以得到所述目标物体在所述目标相机位姿下的目标图像。
  14. 根据权利要求11或12所述的方法,其特征在于,还包括:
    根据包含目标物体的多张原始图像进行基于神经网络的三维重建,得到初始隐式3D表征模型,所述目标物体上的表面点与对应原始图像中的像素点对应,且与拍摄到所述像素点的第一视线对应;
    根据所述初始隐式3D表征模型和所述多张原始图像,构建显式三维模型,所述显式三维模型包括所述目标物体上表面点及其颜色信息,每个表面点的颜色信息是根据该表面点对应的第一视线的平均视角信息确定的;
    随机生成所述显式三维模型上表面点对应的第二视线,并根据每个表面点的颜色信息分别生成每个表面点对应的第二视线对应的平均视角信息;
    根据所述第二视线对应的平均视角信息和所述第二视线上空间点的空间坐标,基于所述初始隐式3D表征模型进行基于神经网络的三维重建,得到目标隐式3D表征模型。
  15. 一种计算机设备,其特征在于,包括:存储器和处理器;所述存储器,用于存储计算机程序;所述处理器耦合至所述存储器,用于执行所述计算机程序以用于执行权利要求1-14任一项所述方法中的步骤。
  16. 一种存储有计算机程序的计算机存储介质,其特征在于,当所述计算机程序被处理器执行时,致使所述处理器能够实现权利要求1-14任一项所述方法中的步骤。
PCT/CN2023/071960 2022-01-24 2023-01-12 三维模型重建与图像生成方法、设备以及存储介质 WO2023138477A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210081291.6 2022-01-24
CN202210081291.6A CN114119839B (zh) 2022-01-24 2022-01-24 三维模型重建与图像生成方法、设备以及存储介质

Publications (1)

Publication Number Publication Date
WO2023138477A1 true WO2023138477A1 (zh) 2023-07-27

Family

ID=80361256

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/071960 WO2023138477A1 (zh) 2022-01-24 2023-01-12 三维模型重建与图像生成方法、设备以及存储介质

Country Status (2)

Country Link
CN (1) CN114119839B (zh)
WO (1) WO2023138477A1 (zh)


Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119839B (zh) * 2022-01-24 2022-07-01 阿里巴巴(中国)有限公司 三维模型重建与图像生成方法、设备以及存储介质
CN114841783A (zh) * 2022-05-27 2022-08-02 阿里巴巴(中国)有限公司 商品信息处理方法、装置、终端设备及存储介质
CN114758081A (zh) * 2022-06-15 2022-07-15 之江实验室 基于神经辐射场的行人重识别三维数据集构建方法和装置
CN114863037B (zh) * 2022-07-06 2022-10-11 杭州像衍科技有限公司 基于单手机的人体三维建模数据采集与重建方法及系统
CN115100360B (zh) * 2022-07-28 2023-12-01 中国电信股份有限公司 图像生成方法及装置、存储介质和电子设备
CN115272575B (zh) * 2022-07-28 2024-03-29 中国电信股份有限公司 图像生成方法及装置、存储介质和电子设备
CN115243025B (zh) * 2022-09-21 2023-01-24 深圳市明源云科技有限公司 三维渲染方法、装置、终端设备以及存储介质
CN115937907B (zh) * 2023-03-15 2023-05-30 深圳市亲邻科技有限公司 社区宠物识别方法、装置、介质及设备
CN116129030B (zh) * 2023-04-18 2023-07-04 湖南马栏山视频先进技术研究院有限公司 一种基于神经辐射场的场景物体融合方法及装置
CN116612256B (zh) * 2023-04-19 2024-05-14 深圳市兰星科技有限公司 一种基于NeRF的实时远程三维实景模型浏览方法

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6807455B2 (ja) * 2017-06-08 2021-01-06 株式会社ソニー・インタラクティブエンタテインメント 情報処理装置および画像生成方法
CN107862733B (zh) * 2017-11-02 2021-10-26 南京大学 基于视线更新算法的大规模场景实时三维重建方法和系统
CN108805979B (zh) * 2018-02-05 2021-06-29 清华-伯克利深圳学院筹备办公室 一种动态模型三维重建方法、装置、设备和存储介质
CN109360268B (zh) * 2018-09-29 2020-04-24 清华大学 重建动态物体的表面优化方法及装置
CN110998671B (zh) * 2019-11-22 2024-04-02 驭势科技(浙江)有限公司 三维重建方法、装置、系统和存储介质
WO2021120175A1 (zh) * 2019-12-20 2021-06-24 驭势科技(南京)有限公司 三维重建方法、装置、系统和存储介质
US11967015B2 (en) * 2020-02-06 2024-04-23 Apple Inc. Neural rendering
CN113099208B (zh) * 2021-03-31 2022-07-29 清华大学 基于神经辐射场的动态人体自由视点视频生成方法和装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210390761A1 (en) * 2020-06-15 2021-12-16 Microsoft Technology Licensing, Llc Computing images of dynamic scenes
CN113628348A (zh) * 2021-08-02 2021-11-09 聚好看科技股份有限公司 一种确定三维场景中视点路径的方法及设备
CN113706714A (zh) * 2021-09-03 2021-11-26 中科计算技术创新研究院 基于深度图像和神经辐射场的新视角合成方法
CN114119839A (zh) * 2022-01-24 2022-03-01 阿里巴巴(中国)有限公司 三维模型重建与图像生成方法、设备以及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MICHAEL NIEMEYER; JONATHAN T. BARRON; BEN MILDENHALL; MEHDI S. M. SAJJADI; ANDREAS GEIGER; NOHA RADWAN: "RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 1 December 2021 (2021-12-01), 201 Olin Library Cornell University Ithaca, NY 14853, XP091112896 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315148A (zh) * 2023-09-26 2023-12-29 北京智象未来科技有限公司 三维物体风格化方法、装置、设备、存储介质
CN117315148B (zh) * 2023-09-26 2024-05-24 北京智象未来科技有限公司 三维物体风格化方法、装置、设备、存储介质
CN118131643A (zh) * 2024-05-08 2024-06-04 四川职业技术学院 一种基于物联网的智能家居系统、方法
CN118247418A (zh) * 2024-05-28 2024-06-25 长春师范大学 一种利用少量模糊图像重建神经辐射场的方法
CN118552705A (zh) * 2024-07-29 2024-08-27 杭州倚澜科技有限公司 一种商品渲染展示方法及系统

Also Published As

Publication number Publication date
CN114119839B (zh) 2022-07-01
CN114119839A (zh) 2022-03-01

Similar Documents

Publication Publication Date Title
WO2023138477A1 (zh) 三维模型重建与图像生成方法、设备以及存储介质
WO2023138471A1 (zh) 三维场景渲染方法、设备以及存储介质
Li et al. Neural 3d video synthesis from multi-view video
CN114119838A (zh) 体素模型与图像生成方法、设备及存储介质
US10529086B2 (en) Three-dimensional (3D) reconstructions of dynamic scenes using a reconfigurable hybrid imaging system
CN109887003B (zh) 一种用于进行三维跟踪初始化的方法与设备
US20190213773A1 (en) 4d hologram: real-time remote avatar creation and animation control
JP2016537901A (ja) ライトフィールド処理方法
US10484599B2 (en) Simulating depth of field
CN113220251B (zh) 物体显示方法、装置、电子设备及存储介质
US10354399B2 (en) Multi-view back-projection to a light-field
WO2023077976A1 (zh) 一种图像处理方法、模型训练方法、相关装置及程序产品
Du et al. Video fields: fusing multiple surveillance videos into a dynamic virtual environment
CN112270736A (zh) 增强现实处理方法及装置、存储介质和电子设备
CN115239857B (zh) 图像生成方法以及电子设备
CN113628322A (zh) 图像处理、ar显示与直播方法、设备及存储介质
US20240112394A1 (en) AI Methods for Transforming a Text Prompt into an Immersive Volumetric Photo or Video
CN118298127B (zh) 三维模型重建与图像生成方法、设备、存储介质及程序产品
JP2024501958A (ja) ピクセル整列ボリュメトリックアバター
CN107871338A (zh) 基于场景装饰的实时交互渲染方法
Liu et al. Neural impostor: Editing neural radiance fields with explicit shape manipulation
CN116664770A (zh) 拍摄实体的图像处理方法、存储介质及系统
CN115497029A (zh) 视频处理方法、装置及计算机可读存储介质
CN115049559A (zh) 模型训练、人脸图像处理、人脸模型处理方法及装置、电子设备及可读存储介质
Amamra et al. Crime scene reconstruction with RGB-D sensors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23742792

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE