CN116824026A - Three-dimensional reconstruction method, device, system and storage medium - Google Patents

Three-dimensional reconstruction method, device, system and storage medium

Info

Publication number
CN116824026A
CN116824026A (application CN202311084904.2A)
Authority
CN
China
Prior art keywords
image
image sequence
matrix
model
dimensional reconstruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311084904.2A
Other languages
Chinese (zh)
Other versions
CN116824026B (en)
Inventor
肖美华
李承欢
谭睿霄
徐锐涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Jiaotong University filed Critical East China Jiaotong University
Priority to CN202311084904.2A
Publication of CN116824026A
Application granted
Publication of CN116824026B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10Geometric effects
    • G06T15/20Perspective computation
    • G06T15/205Image-based rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The application provides a three-dimensional reconstruction method, device, system, and storage medium, belonging to the field of image processing. The method comprises the following steps: importing an original video and segmenting it to obtain an image sequence; analyzing the image sequence to obtain a model view perspective matrix and a mask image; constructing an original matrix set, and constructing an initial reconstruction model through the original matrix set; and rendering the model view perspective matrix and the mask image through the initial reconstruction model to obtain a rendered image. The application supervises the model using only 2D image information, and the generated 3D model has a well-formed vertex-and-face structure and a manageable number of vertices and faces, which is of great significance in the fields of games, the virtual reality industry, and digital cultural relics, and reduces the manpower, financial, and time costs of manual modeling.

Description

Three-dimensional reconstruction method, device, system and storage medium
Technical Field
The application mainly relates to the technical field of image processing, and in particular to a three-dimensional reconstruction method, device, system, and storage medium.
Background
Three-dimensional reconstruction is the process of recovering a three-dimensional scene or object from two-dimensional images taken from multiple perspectives, which can save considerable cost compared with manually modeling a 3D scene. Based on how the model is represented, three-dimensional reconstruction techniques can be divided into implicit and explicit reconstruction. Common implicit reconstructions represent shape with voxels, a signed distance function (SDF), or an occupancy field (OF). However, implicit reconstruction methods usually rely on the Marching Cubes algorithm when the final model is synthesized, so the number of points in the model exceeds the capacity of conventional modeling software; the generated model can only be used with special viewing software, and exporting an OBJ file is difficult. Implicit reconstruction also often depends on 3D supervision information, which limits the generality of its applications. Conventional explicit reconstruction typically requires a large amount of input data, particularly high-resolution sensor data; for complex scenes or large-scale objects this is a significant challenge, demanding a great deal of time and computational resources as well as expensive sensor equipment, which makes explicit reconstruction difficult to popularize.
Disclosure of Invention
The technical problem to be solved by the application is to provide a three-dimensional reconstruction method, device, system, and storage medium that address the defects of the prior art.
The technical scheme for solving the technical problems is as follows: a three-dimensional reconstruction method comprising the steps of:
importing an original video, and dividing the original video to obtain a plurality of image sequences;
analyzing each image sequence to obtain a model view perspective matrix corresponding to each image sequence and a mask image corresponding to each image sequence;
constructing an original matrix set, and constructing an initial reconstruction model through the original matrix set;
rendering the model view perspective matrixes and the mask images corresponding to the image sequences respectively through the initial reconstruction model to obtain rendered images corresponding to the image sequences;
optimizing the initial reconstruction model according to all the image sequences and all the rendering images to obtain a three-dimensional reconstruction model;
importing an image to be reconstructed, and performing three-dimensional reconstruction on the image to be reconstructed through the three-dimensional reconstruction model to obtain a three-dimensional reconstruction result;
the process of analyzing each image sequence to obtain a model view perspective matrix corresponding to each image sequence and a mask image corresponding to each image sequence comprises the following steps:
extracting affine transformation matrix corresponding to each image sequence, image height corresponding to each image sequence, image width corresponding to each image sequence and camera focal length corresponding to each image sequence from each image sequence by utilizing a motion structure algorithm;
extracting mask images corresponding to the image sequences from the image sequences by using a Python tool;
and respectively carrying out matrix calculation on an affine transformation matrix corresponding to each image sequence, an image height corresponding to each image sequence, an image width corresponding to each image sequence and a camera focal length corresponding to each image sequence to obtain a model view perspective matrix corresponding to each image sequence.
The other technical scheme for solving the technical problems is as follows: a three-dimensional reconstruction apparatus comprising:
the importing module is used for importing the original video;
the segmentation module is used for segmenting the original video to obtain a plurality of image sequences;
the analysis module is used for respectively analyzing each image sequence to obtain a model view perspective matrix corresponding to each image sequence and a mask image corresponding to each image sequence;
the construction module is used for constructing an original matrix set, and constructing an initial reconstruction model through the original matrix set;
the rendering module is used for respectively rendering each model view perspective matrix and mask images corresponding to each image sequence through the initial reconstruction model to obtain rendered images corresponding to each image sequence;
the optimization module is used for optimizing the initial reconstruction model according to all the image sequences and all the rendering images to obtain a three-dimensional reconstruction model;
the importing module is also used for importing an image to be reconstructed;
the three-dimensional reconstruction result obtaining module is used for carrying out three-dimensional reconstruction on the image to be reconstructed through the three-dimensional reconstruction model to obtain a three-dimensional reconstruction result;
the analysis module is used for:
extracting affine transformation matrix corresponding to each image sequence, image height corresponding to each image sequence, image width corresponding to each image sequence and camera focal length corresponding to each image sequence from each image sequence by utilizing a motion structure algorithm;
extracting mask images corresponding to the image sequences from the image sequences by using a Python tool;
and respectively carrying out matrix calculation on an affine transformation matrix corresponding to each image sequence, an image height corresponding to each image sequence, an image width corresponding to each image sequence and a camera focal length corresponding to each image sequence to obtain a model view perspective matrix corresponding to each image sequence.
Based on the three-dimensional reconstruction method, the application further provides a three-dimensional reconstruction system.
The other technical scheme for solving the technical problems is as follows: a three-dimensional reconstruction system comprising a memory, a processor and a computer program stored in the memory and executable on the processor, which when executed by the processor implements a three-dimensional reconstruction method as described above.
Based on the three-dimensional reconstruction method, the application further provides a computer readable storage medium.
The other technical scheme for solving the technical problems is as follows: a computer readable storage medium storing a computer program which, when executed by a processor, implements a three-dimensional reconstruction method as described above.
The beneficial effects of the application are as follows: a plurality of image sequences are obtained by segmenting an original video; a model view perspective matrix and a mask image are obtained by analyzing the image sequences; an initial reconstruction model is constructed from an original matrix set; a rendered image is obtained by rendering the model view perspective matrix and the mask image through the initial reconstruction model; a three-dimensional reconstruction model is obtained by optimizing the initial reconstruction model according to the image sequences and the rendered images; and a three-dimensional reconstruction result is obtained by three-dimensionally reconstructing the image to be reconstructed through the three-dimensional reconstruction model. By supervising the model with only 2D image information, the generated 3D model has a well-formed vertex-and-face structure and a manageable number of vertices and faces.
Drawings
FIG. 1 is a schematic flow chart of a three-dimensional reconstruction method according to an embodiment of the present application;
FIG. 2 is a diagram of one of tetrahedral structures of a three-dimensional reconstruction method according to an embodiment of the present application;
FIG. 3 is a diagram showing a second tetrahedral structure of a three-dimensional reconstruction method according to an embodiment of the present application;
FIG. 4 is a diagram of a third tetrahedral structure of a three-dimensional reconstruction method according to an embodiment of the present application;
FIG. 5 is a diagram showing a tetrahedral structure of a three-dimensional reconstruction method according to an embodiment of the present application;
fig. 6 is a block diagram of a three-dimensional reconstruction device according to an embodiment of the present application.
Detailed Description
The principles and features of the present application are described below with reference to the drawings; the examples are provided only to illustrate the application and are not to be construed as limiting its scope.
Fig. 1 is a schematic flow chart of a three-dimensional reconstruction method according to an embodiment of the present application.
As shown in fig. 1, a three-dimensional reconstruction method includes the following steps:
importing an original video, and dividing the original video to obtain a plurality of image sequences;
analyzing each image sequence to obtain a model view perspective matrix corresponding to each image sequence and a mask image corresponding to each image sequence;
constructing an original matrix set, and constructing an initial reconstruction model through the original matrix set;
rendering the model view perspective matrixes and the mask images corresponding to the image sequences respectively through the initial reconstruction model to obtain rendered images corresponding to the image sequences;
optimizing the initial reconstruction model according to all the image sequences and all the rendering images to obtain a three-dimensional reconstruction model;
importing an image to be reconstructed, and performing three-dimensional reconstruction on the image to be reconstructed through the three-dimensional reconstruction model to obtain a three-dimensional reconstruction result;
the process of analyzing each image sequence to obtain a model view perspective matrix corresponding to each image sequence and a mask image corresponding to each image sequence comprises the following steps:
extracting affine transformation matrix corresponding to each image sequence, image height corresponding to each image sequence, image width corresponding to each image sequence and camera focal length corresponding to each image sequence from each image sequence by utilizing a motion structure algorithm;
extracting mask images corresponding to the image sequences from the image sequences by using a Python tool;
and respectively carrying out matrix calculation on an affine transformation matrix corresponding to each image sequence, an image height corresponding to each image sequence, an image width corresponding to each image sequence and a camera focal length corresponding to each image sequence to obtain a model view perspective matrix corresponding to each image sequence.
It should be appreciated that the original video may be a 360-degree video of the product to be reconstructed; the obtained video is segmented into a sequence of images.
It will be appreciated that the video is segmented into an image sequence according to a frames-per-second parameter.
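As an illustration of this segmentation step, the following Python sketch extracts every n-th frame of the video with OpenCV; the sampling stride and file layout are illustrative assumptions rather than part of the claimed method.

```python
import os

import cv2

def split_video(video_path: str, out_dir: str, every_n_frames: int = 10) -> int:
    """Split a video into an image sequence, keeping every n-th frame."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    kept = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video
            break
        if idx % every_n_frames == 0:
            cv2.imwrite(os.path.join(out_dir, f"{kept:04d}.png"), frame)
            kept += 1
        idx += 1
    cap.release()
    return kept

# Example: split_video("product_360.mp4", "frames", every_n_frames=10)
```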
It will be appreciated that, based on the resulting initial model, the MVP (Model View Perspective) matrix (i.e., the model view perspective matrix), and the sequence of mask images, an image of the initial model is rendered.
It should be appreciated that the SFM algorithm (i.e., the structure-from-motion algorithm, referred to above as the motion structure algorithm) is applied to the acquired image sequence to obtain an affine transformation matrix from the camera coordinate system to the world coordinate system, together with the image height, the image width, and the camera focal length.
Specifically, the steps of the structure-from-motion algorithm are as follows:

For the resulting image sequence $\mathcal{I} = \{I_i\}$, correspondences are found between overlapping images of the product, and the projections of the same points in the overlapping images are verified; the output is a set of geometrically verified image pairs $\mathcal{C}$ and an image projection map for each point. For the photographed product we make the following assumption: the product conforms to rigid-body motion, i.e., for any two points $a$, $b$ on the surface of the product with world coordinate vectors $\mathbf{x}_a$, $\mathbf{x}_b$, the distance $\lVert \mathbf{x}_a - \mathbf{x}_b \rVert$ remains constant over time. In other words, the product is not a fluid, smoke, or similar surface-deformable object. For any point $p$ on the product, let $\mathbf{x}_w$ be its world coordinates and $\mathbf{x}_c$ its camera coordinates; the two are related by a rigid-body displacement $\mathbf{x}_c = R\,\mathbf{x}_w + \mathbf{t}$, and this rigid-body displacement between camera coordinates and world coordinates is called the extrinsic parameters of the camera. For known camera coordinates $\mathbf{x}_c = (x_c, y_c, z_c)^{\mathsf T}$, the (two-dimensional) image coordinates $(u, v)$ satisfy

$$ z_c \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = K \begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix}, \qquad K = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}, $$

where $z_c$ acts as the depth scaling factor, $K$ is the intrinsic matrix of the camera, $f$ is the camera focal length, $s$ is a scaling (skew) factor, and $(c_x, c_y)$ is the center offset. These variables satisfy $f_x = f / \rho_w$ and $f_y = f / \rho_h$, where $\rho_w$ is the size of a horizontal pixel in [meters per pixel], $\rho_h$ is similarly the size of a vertical pixel, $f_x$ is the horizontal focal length, $f_y$ is similarly the vertical focal length, and $\eta = f_y / f_x$ is the aspect ratio. $P = K\,[R \mid \mathbf{t}]$ is the projection matrix from camera coordinates onto image coordinates. Based on the above assumptions and definitions, for each image $I_i$, SFM (structure from motion) detects a set of local features at positions $\mathbf{x}_j$; this set is denoted $F_i$. We assume that the feature set remains unchanged under illumination and geometric transformations, so that SFM can identify the same features in multiple images. Next, SFM uses the feature sets $F_i$ as an appearance description of each image to find image pairs that observe the same scene part; the output is a set of potentially overlapping image pairs $\mathcal{C}$ and the corresponding feature relations $\mathcal{M}$. Finally, the camera intrinsic and extrinsic parameters are estimated through the PnP (Perspective-n-Point) problem; because the 2D-3D correspondences are often polluted by outliers, the pose of the calibrated camera is estimated using RANSAC (Random Sample Consensus) and a minimal pose solver, and is stored in an .npy file that can be read and written quickly. The file content is stored as an array comprising N rows and 17 columns of data, where N is the number of images and the 17 columns store the camera parameters for each image: the first 12 columns define a 3×4 camera-to-world affine transformation matrix; the next 3 columns define the image height, the image width, and the camera focal length; and the last 2 columns define 2 depth values, a near boundary value and a far boundary value, used to scale the range of the product.
It will be appreciated that the mask image corresponding to each image sequence is extracted from that image sequence using Photoshop or a segmentation-mask library function of a Python tool.
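One possible realization is sketched below; the choice of the rembg matting library is an assumption for illustration, since the method only requires some Python segmentation-mask tool.

```python
from pathlib import Path

import numpy as np
from PIL import Image
from rembg import remove  # assumed choice of segmentation library

def extract_masks(frame_dir: str, mask_dir: str) -> None:
    """Write a binary foreground mask for every frame in frame_dir."""
    out = Path(mask_dir)
    out.mkdir(parents=True, exist_ok=True)
    for frame in sorted(Path(frame_dir).glob("*.png")):
        rgba = remove(Image.open(frame))  # RGBA image with an alpha matte
        alpha = np.asarray(rgba)[:, :, 3]
        mask = (alpha > 127).astype(np.uint8) * 255
        Image.fromarray(mask).save(out / frame.name)
```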
In the above embodiment, a plurality of image sequences are obtained by segmenting the original video; a model view perspective matrix and a mask image are obtained by analyzing the image sequences; an initial reconstruction model is constructed from the original matrix set; a rendered image is obtained by rendering the model view perspective matrix and the mask image through the initial reconstruction model; a three-dimensional reconstruction model is obtained by optimizing the initial reconstruction model according to the image sequences and the rendered images; and a three-dimensional reconstruction result is obtained by three-dimensionally reconstructing the image to be reconstructed through the three-dimensional reconstruction model. The generated 3D model has a well-formed vertex-and-face structure and a manageable number of vertices and faces, which is of great significance in the fields of games, the virtual reality industry, and digital cultural relics, and reduces the manpower, financial, and time costs of manual modeling.
Alternatively, as one embodiment of the present application, the affine transformation matrix includes a sum of an x-axis origin and a center offset, a sum of a y-axis origin and a center offset, an x-axis center offset, and a y-axis center offset,
the process of respectively performing matrix calculation on the affine transformation matrix corresponding to each image sequence, the image height corresponding to each image sequence, the image width corresponding to each image sequence and the camera focal length corresponding to each image sequence to obtain the model view perspective matrix corresponding to each image sequence comprises the following steps:
performing matrix calculation on a sum of an x-axis origin and a center offset corresponding to each image sequence, a sum of a y-axis origin and a center offset corresponding to each image sequence, an x-axis center offset corresponding to each image sequence, a y-axis center offset corresponding to each image sequence, an image height corresponding to each image sequence, an image width corresponding to each image sequence and a camera focal length corresponding to each image sequence respectively by a first formula to obtain a model view perspective matrix corresponding to each image sequence, wherein the first formula is as follows:
$$ \mathrm{MVP} = \mathrm{Persp} \times \mathrm{MV}, $$

$$ \mathrm{Persp} = \begin{pmatrix} \dfrac{1}{\eta \tan(\theta/2)} & 0 & 0 & 0 \\ 0 & \dfrac{1}{\tan(\theta/2)} & 0 & 0 \\ 0 & 0 & -\dfrac{n_f + n_n}{n_f - n_n} & -\dfrac{2\, n_f\, n_n}{n_f - n_n} \\ 0 & 0 & -1 & 0 \end{pmatrix}, \qquad \theta = 2\arctan\frac{h}{2f}, \qquad \eta = \frac{w}{h}, $$

and the model view matrix $\mathrm{MV}$ is derived from the camera-to-world affine transformation matrix by removing the center offset, i.e. $t_x' = t_x - c_x$ and $t_y' = t_y - c_y$;

wherein $\mathrm{MVP}$ is the model view perspective matrix, $\mathrm{Persp}$ is the perspective matrix, $\mathrm{MV}$ is the model view matrix, $\theta$ is the vertical viewing angle range of the camera, $\eta$ is the aspect ratio, $n_f$ is the preset far boundary value, $n_n$ is the preset near boundary value, $f$ is the camera focal length, $s_x$ and $s_y$ are scaling factors, $t_x$ is the sum of the x-axis origin and the center offset, $c_x$ is the x-axis center offset, $t_y$ is the sum of the y-axis origin and the center offset, $c_y$ is the y-axis center offset, $h$ is the image height, and $w$ is the image width.
It should be appreciated that the MVP (Model View Perspective) matrix is calculated from the affine transformation matrix from the camera coordinate system to the world coordinate system, together with the image height, the image width, and the camera focal length.
Specifically, the intrinsic quantities satisfy $f_x = f / \rho_w$ and $f_y = f / \rho_h$, where $f$ is the camera focal length, $\rho_w$ is the size of a horizontal pixel in [meters per pixel], $\rho_h$ is similarly the size of a vertical pixel, $f_x$ is the horizontal focal length, $f_y$ is similarly the vertical focal length, $\eta = f_y / f_x$ is the aspect ratio, $(c_x, c_y)$ is the center offset, $h$ is the image height, and $w$ is the image width.

Specifically, the MVP (Model View Perspective) matrix is the product of the three matrices Model, View, and Perspective, which are here treated as a single overall 4×4 matrix.

The Perspective matrix equals the following formula:

$$ \mathrm{Persp} = \begin{pmatrix} \dfrac{1}{\eta \tan(\theta/2)} & 0 & 0 & 0 \\ 0 & \dfrac{1}{\tan(\theta/2)} & 0 & 0 \\ 0 & 0 & -\dfrac{n_f + n_n}{n_f - n_n} & -\dfrac{2\, n_f\, n_n}{n_f - n_n} \\ 0 & 0 & -1 & 0 \end{pmatrix}, $$

where $n_f$ is the far boundary value and $n_n$ is the near boundary value.

The MV (Model View) matrix is obtained from the camera-to-world affine transformation matrix, where $t_x'$ and $t_y'$ are the values of $t_x$ and $t_y$ after the center offset is removed, i.e. $t_x' = t_x - c_x$ and $t_y' = t_y - c_y$.
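A NumPy sketch of this matrix assembly is given below; the field-of-view relation $\theta = 2\arctan(h/2f)$ and the inversion of the camera-to-world matrix to obtain MV are assumptions consistent with the symbols above, not verbatim patent text.

```python
import numpy as np

def perspective(fovy: float, aspect: float, near: float, far: float) -> np.ndarray:
    """Standard 4x4 perspective projection matrix."""
    t = np.tan(0.5 * fovy)
    return np.array([
        [1.0 / (aspect * t), 0.0, 0.0, 0.0],
        [0.0, 1.0 / t, 0.0, 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ])

def mvp_from_pose(cam2world_3x4: np.ndarray, h: float, w: float,
                  focal: float, near: float, far: float) -> np.ndarray:
    """Assemble MVP = Persp @ MV from one row of the pose file."""
    c2w = np.eye(4)
    c2w[:3, :4] = cam2world_3x4
    mv = np.linalg.inv(c2w)                  # world -> camera (model view)
    fovy = 2.0 * np.arctan(0.5 * h / focal)  # vertical viewing angle range
    return perspective(fovy, w / h, near, far) @ mv
```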
In the above embodiment, the model view perspective matrix is obtained by performing matrix calculation on the affine transformation matrix, the image height, the image width, and the camera focal length. By supervising the model with only 2D image information, the generated 3D model has a well-formed vertex-and-face structure and a manageable number of vertices and faces, which is of great significance in the fields of games, the virtual reality industry, and digital cultural relics, and reduces the manpower, financial, and time costs of manual modeling.
Alternatively, as an embodiment of the present application, as shown in FIGS. 1 to 5, the original matrix set includes a tetrahedron vertex three-dimensional coordinate matrix and a vertex index matrix,
the process of constructing an original matrix set and constructing an initial reconstruction model through the original matrix set comprises the following steps:
s31: counting the number of three-dimensional coordinates in the tetrahedron vertex three-dimensional coordinate matrix to obtain the total number of the tetrahedron vertex three-dimensional coordinates;
s32: carrying out random assignment on the total number of the three-dimensional coordinates of the tetrahedron vertexes to obtain a plurality of SDF values, and constructing an SDF value matrix through all the SDF values;
s33: and constructing a model of the tetrahedron vertex three-dimensional coordinate matrix, the vertex index matrix and the SDF value matrix by using a marching tetrahedron algorithm to obtain an initial reconstruction model.
It should be understood that the geometric training model is built by initialization, specifically: the tetrahedron vertex three-dimensional coordinate matrix and the vertex index matrix of a single tetrahedral grid are loaded from a compressed file and stored; an SDF value matrix is created for the tetrahedron vertex three-dimensional coordinate matrix; a displacement matrix is created; and these are registered as training parameters.
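A minimal PyTorch sketch of this initialization follows, assuming the tetrahedral grid ships as a compressed .npz file with 'vertices' and 'indices' arrays; the file name and array names are illustrative.

```python
import numpy as np
import torch

tets = np.load("tet_grid.npz")
verts = torch.tensor(tets["vertices"], dtype=torch.float32)  # (V, 3) vertex coordinates
tet_idx = torch.tensor(tets["indices"], dtype=torch.long)    # (T, 4) vertex index matrix

# One random SDF value per tetrahedron vertex, plus a per-vertex displacement
# matrix, both registered as training parameters.
sdf = torch.nn.Parameter(torch.randn(verts.shape[0]))
deform = torch.nn.Parameter(torch.zeros_like(verts))
optimizer = torch.optim.Adam([sdf, deform], lr=1e-3)
```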
It should be understood that, as shown in FIG. 2, $v_a$, $v_b$, $v_c$, $v_d$ are the four vertices of a tetrahedron $T_k$, where $k$ labels the tetrahedron.
It should be understood that, as shown in FIGS. 3 to 5, what is generated is not the tetrahedra themselves but triangles, determined (up to scaling, symmetry, and rotation) by the configuration of the SDF values at the vertices: when a vertex's SDF value is a positive number the vertex lies outside the generated model, when negative it lies inside, and when the SDF value is 0 the vertex lies exactly on the surface of the model.
Specifically, as shown in FIGS. 3 to 5, the marching tetrahedra algorithm is as follows: the encoded SDF is converted into an explicit triangular mesh using the Marching Tetrahedra (MT) algorithm. Given the SDF values $s(v_a), s(v_b), s(v_c), s(v_d)$ at a tetrahedron's vertices, MT determines the surface type inside the tetrahedron according to the signs of these values; the total number of configurations is $2^4 = 16$, which can be reduced to 3 unique cases in view of rotational symmetry. Once the surface type inside the tetrahedron is determined, the vertex positions of the iso-surface are computed at the zero crossings of the linear interpolation along the tetrahedron's edges. In the figures, $v_a, v_b, v_c, v_d$ are the four vertices of the tetrahedron; a diamond represents a newly generated vertex, for example $v'_{ab}$ is the new vertex generated on the edge between $v_a$ and $v_b$; a crossed rectangle represents a vertex whose SDF value is negative; and a circle represents a vertex whose SDF value is positive. The newly generated vertex is given by the formula:

$$ v'_{ab} = \frac{v_a\, s(v_b) - v_b\, s(v_a)}{s(v_b) - s(v_a)}. $$
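The interpolation formula transcribes directly into code; the short PyTorch sketch below also checks it on a single edge.

```python
import torch

def interp_vertex(v_a: torch.Tensor, v_b: torch.Tensor,
                  s_a: torch.Tensor, s_b: torch.Tensor) -> torch.Tensor:
    """Iso-surface vertex on edge (v_a, v_b); s_a and s_b must have opposite signs."""
    return (v_a * s_b - v_b * s_a) / (s_b - s_a)

# Edge from the origin to (1, 0, 0) with SDF -1 at one end and +3 at the other:
# the surface crosses a quarter of the way along the edge.
v = interp_vertex(torch.zeros(3), torch.tensor([1.0, 0.0, 0.0]),
                  torch.tensor(-1.0), torch.tensor(3.0))
print(v)  # tensor([0.2500, 0.0000, 0.0000])
```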
It should be understood that there are three unique surface configurations in MT, where the vertex color represents the sign of the signed distance value. Note that flipping the signs of all vertices results in the same surface configuration; the positions of the new vertices are linearly interpolated along the edges where the sign changes.
It should be understood that the geometry and the material are trained jointly. In detail: from the tetrahedron vertex three-dimensional coordinate matrix, the SDF value matrix, and the vertex index matrix, the marching tetrahedra algorithm is used to obtain an initial model (i.e., the initial reconstruction model).
In the above embodiment, the original matrix set is constructed, and the initial reconstruction model is constructed through the original matrix set. By supervising the model with only 2D image information, the generated 3D model has a well-formed vertex-and-face structure and a manageable number of vertices and faces, thereby reducing the manpower, financial, and time costs of manual modeling.
Optionally, as an embodiment of the present application, the process of optimizing the initial reconstruction model according to all the image sequences and all the rendered images to obtain a three-dimensional reconstruction model includes:
performing loss function calculation on all the image sequences and all the rendered images to obtain a target loss function;
and updating the parameters of the tetrahedron vertex three-dimensional coordinate matrix and the SDF value matrix according to the target loss function, returning to S33 after each update until a preset number of iterations is reached, and then taking the initial reconstruction model as the three-dimensional reconstruction model.
Preferably, the preset number of iterations may be 100.
It will be appreciated that a mean square error loss is computed between the resulting image of the initial model (i.e., the rendered image) and the original captured image (i.e., the image sequence), and the gradients are back-propagated to update the SDF value matrix and the displacement matrix.
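A sketch of one optimization step is given below; render(...) is a hypothetical stand-in for the differentiable rasterization of the current mesh under one MVP matrix and mask, and its signature is an assumption for illustration.

```python
import torch

def training_step(render, mvps, masks, targets, sdf, deform, optimizer):
    """One update: render all views, take the mean square error, back-propagate."""
    optimizer.zero_grad()
    loss = torch.zeros(())
    for mvp, mask, target in zip(mvps, masks, targets):
        rendered = render(sdf, deform, mvp, mask)  # (H, W, 3) image
        loss = loss + torch.mean((rendered - target) ** 2)
    loss = loss / len(targets)
    loss.backward()  # gradients flow to the SDF values and displacements
    optimizer.step()
    return float(loss)

# for step in range(100):  # preset number of iterations
#     training_step(render, mvps, masks, frames, sdf, deform, optimizer)
```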
In the above embodiment, the initial reconstruction model is optimized according to all the image sequences and all the rendered images to obtain the three-dimensional reconstruction model. By supervising the model with only 2D image information, the generated 3D model has a well-formed vertex-and-face structure and a manageable number of vertices and faces, reducing the manpower, financial, and time costs of manual modeling.
Optionally, as an embodiment of the present application, the process of performing a loss function calculation on all the image sequences and all the rendered images to obtain a target loss function includes:
performing loss function calculation on all the image sequences and all the rendered images through a second formula to obtain a target loss function, wherein the second formula is:

$$ \mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} \left( I_i - \hat{I}_i \right)^2, $$

wherein $\mathcal{L}$ is the target loss function, $I_i$ is the $i$-th original image of the image sequence, $\hat{I}_i$ is the corresponding rendered image, and $n$ is the total number of images in the image sequence.
In the above embodiment, the target loss function is obtained by performing the loss function calculation on all the image sequences and all the rendered images through the second formula. By supervising the model with only 2D image information, the generated 3D model has a well-formed vertex-and-face structure and a manageable number of vertices and faces, which is of great significance in the fields of games, the virtual reality industry, and digital cultural relics, and reduces the manpower, financial, and time costs of manual modeling.
Optionally, as another embodiment of the present application, an explicit reconstruction method is adopted but combined with the implicit SDF representation: a deformable tetrahedral grid is used to predict an SDF defined on the grid, and the SDF is then converted into a surface mesh through marching tetrahedra. By supervising the model with only 2D image information, the generated 3D model has a well-formed vertex-and-face structure and a manageable number of vertices and faces, which is of great significance in the fields of games, the virtual reality industry, and digital cultural relics, and reduces the manpower, financial, and time costs of manual modeling.
Optionally, as another embodiment of the present application, the present application further includes initializing a texture training model, specifically: creating three-channel values for a color map, a highlight map, and a normal map; limiting the three-channel values of the color map and the highlight map to 0 to 1 and the three-channel values of the normal map to -1 to 1; and storing them, together with an MLP positional encoding, as a material dictionary.
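A minimal sketch of this texture initialization in PyTorch; the texture resolution and the dictionary keys are illustrative assumptions.

```python
import torch

RES = 1024  # texture resolution (assumed)

# Three-channel maps registered as training parameters.
color_map = torch.nn.Parameter(torch.rand(RES, RES, 3))
highlight_map = torch.nn.Parameter(torch.rand(RES, RES, 3))
normal_map = torch.nn.Parameter(torch.zeros(RES, RES, 3))

def material_dict() -> dict:
    """Material dictionary with each map limited to its valid range."""
    return {
        "color": color_map.clamp(0.0, 1.0),        # limited to 0-1
        "highlight": highlight_map.clamp(0.0, 1.0),
        "normal": normal_map.clamp(-1.0, 1.0),     # limited to -1-1
    }
```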
Fig. 6 is a block diagram of a three-dimensional reconstruction device according to an embodiment of the present application.
Alternatively, as another embodiment of the present application, as shown in fig. 6, a three-dimensional reconstruction apparatus includes:
the importing module is used for importing the original video;
the segmentation module is used for segmenting the original video to obtain a plurality of image sequences;
the analysis module is used for respectively analyzing each image sequence to obtain a model view perspective matrix corresponding to each image sequence and a mask image corresponding to each image sequence;
the construction module is used for constructing an original matrix set, and constructing an initial reconstruction model through the original matrix set;
the rendering module is used for respectively rendering each model view perspective matrix and mask images corresponding to each image sequence through the initial reconstruction model to obtain rendered images corresponding to each image sequence;
the optimization module is used for optimizing the initial reconstruction model according to all the image sequences and all the rendering images to obtain a three-dimensional reconstruction model;
the importing module is also used for importing an image to be reconstructed;
the three-dimensional reconstruction result obtaining module is used for carrying out three-dimensional reconstruction on the image to be reconstructed through the three-dimensional reconstruction model to obtain a three-dimensional reconstruction result;
the analysis module is used for:
extracting affine transformation matrix corresponding to each image sequence, image height corresponding to each image sequence, image width corresponding to each image sequence and camera focal length corresponding to each image sequence from each image sequence by utilizing a motion structure algorithm;
extracting mask images corresponding to the image sequences from the image sequences by using a Python tool;
and respectively carrying out matrix calculation on an affine transformation matrix corresponding to each image sequence, an image height corresponding to each image sequence, an image width corresponding to each image sequence and a camera focal length corresponding to each image sequence to obtain a model view perspective matrix corresponding to each image sequence.
Alternatively, another embodiment of the present application provides a three-dimensional reconstruction system including a memory, a processor, and a computer program stored in the memory and executable on the processor, which when executed by the processor, implements the three-dimensional reconstruction method as described above. The system may be a computer or the like.
Alternatively, another embodiment of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the three-dimensional reconstruction method as described above.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and units described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present application.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing description of the preferred embodiments of the application is not intended to limit the application to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the application are intended to be included within the scope of the application.

Claims (8)

1. A three-dimensional reconstruction method, characterized by comprising the following steps:
importing an original video, and dividing the original video to obtain a plurality of image sequences;
analyzing each image sequence to obtain a model view perspective matrix corresponding to each image sequence and a mask image corresponding to each image sequence;
constructing an original matrix set, and constructing an initial reconstruction model through the original matrix set;
rendering the model view perspective matrixes and the mask images corresponding to the image sequences respectively through the initial reconstruction model to obtain rendered images corresponding to the image sequences;
optimizing the initial reconstruction model according to all the image sequences and all the rendering images to obtain a three-dimensional reconstruction model;
importing an image to be reconstructed, and performing three-dimensional reconstruction on the image to be reconstructed through the three-dimensional reconstruction model to obtain a three-dimensional reconstruction result;
the process of analyzing each image sequence to obtain a model view perspective matrix corresponding to each image sequence and a mask image corresponding to each image sequence comprises the following steps:
extracting affine transformation matrix corresponding to each image sequence, image height corresponding to each image sequence, image width corresponding to each image sequence and camera focal length corresponding to each image sequence from each image sequence by utilizing a motion structure algorithm;
extracting mask images corresponding to the image sequences from the image sequences by using a Python tool;
and respectively carrying out matrix calculation on an affine transformation matrix corresponding to each image sequence, an image height corresponding to each image sequence, an image width corresponding to each image sequence and a camera focal length corresponding to each image sequence to obtain a model view perspective matrix corresponding to each image sequence.
2. The three-dimensional reconstruction method according to claim 1, wherein the affine transformation matrix comprises a sum of an x-axis origin and a center offset, a sum of a y-axis origin and a center offset, an x-axis center offset, and a y-axis center offset,
the process of respectively performing matrix calculation on the affine transformation matrix corresponding to each image sequence, the image height corresponding to each image sequence, the image width corresponding to each image sequence and the camera focal length corresponding to each image sequence to obtain the model view perspective matrix corresponding to each image sequence comprises the following steps:
performing matrix calculation on a sum of an x-axis origin and a center offset corresponding to each image sequence, a sum of a y-axis origin and a center offset corresponding to each image sequence, an x-axis center offset corresponding to each image sequence, a y-axis center offset corresponding to each image sequence, an image height corresponding to each image sequence, an image width corresponding to each image sequence and a camera focal length corresponding to each image sequence respectively by a first formula to obtain a model view perspective matrix corresponding to each image sequence, wherein the first formula is as follows:
$$ \mathrm{MVP} = \mathrm{Persp} \times \mathrm{MV}, $$

$$ \mathrm{Persp} = \begin{pmatrix} \dfrac{1}{\eta \tan(\theta/2)} & 0 & 0 & 0 \\ 0 & \dfrac{1}{\tan(\theta/2)} & 0 & 0 \\ 0 & 0 & -\dfrac{n_f + n_n}{n_f - n_n} & -\dfrac{2\, n_f\, n_n}{n_f - n_n} \\ 0 & 0 & -1 & 0 \end{pmatrix}, \qquad \theta = 2\arctan\frac{h}{2f}, \qquad \eta = \frac{w}{h}, $$

and the model view matrix $\mathrm{MV}$ is derived from the camera-to-world affine transformation matrix by removing the center offset, i.e. $t_x' = t_x - c_x$ and $t_y' = t_y - c_y$;

wherein $\mathrm{MVP}$ is the model view perspective matrix, $\mathrm{Persp}$ is the perspective matrix, $\mathrm{MV}$ is the model view matrix, $\theta$ is the vertical viewing angle range of the camera, $\eta$ is the aspect ratio, $n_f$ is the preset far boundary value, $n_n$ is the preset near boundary value, $f$ is the camera focal length, $s_x$ and $s_y$ are scaling factors, $t_x$ is the sum of the x-axis origin and the center offset, $c_x$ is the x-axis center offset, $t_y$ is the sum of the y-axis origin and the center offset, $c_y$ is the y-axis center offset, $h$ is the image height, and $w$ is the image width.
3. The three-dimensional reconstruction method according to claim 1, wherein the original matrix group comprises a tetrahedral vertex three-dimensional coordinate matrix and a vertex index matrix,
the process of constructing an original matrix set and constructing an initial reconstruction model through the original matrix set comprises the following steps:
s31: counting the number of three-dimensional coordinates in the tetrahedron vertex three-dimensional coordinate matrix to obtain the total number of the tetrahedron vertex three-dimensional coordinates;
s32: carrying out random assignment on the total number of the three-dimensional coordinates of the tetrahedron vertexes to obtain a plurality of SDF values, and constructing an SDF value matrix through all the SDF values;
s33: and constructing a model of the tetrahedron vertex three-dimensional coordinate matrix, the vertex index matrix and the SDF value matrix by using a marching tetrahedron algorithm to obtain an initial reconstruction model.
4. The method of claim 3, wherein optimizing the initial reconstruction model based on all of the image sequences and all of the rendered images to obtain a three-dimensional reconstruction model comprises:
performing loss function calculation on all the image sequences and all the rendered images to obtain a target loss function;
and updating parameters of the tetrahedron vertex three-dimensional coordinate matrix and the SDF value matrix according to the target loss function, returning to S33 after updating until the preset iteration times are reached, and taking the initial reconstruction model as a three-dimensional reconstruction model.
5. The three-dimensional reconstruction method according to claim 4, wherein the step of performing a loss function calculation on all the image sequences and all the rendered images to obtain an objective loss function comprises:
performing loss function calculation on all the image sequences and all the rendered images through a second formula to obtain a target loss function, wherein the second formula is as follows:
$$ \mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} \left( I_i - \hat{I}_i \right)^2, $$

wherein $\mathcal{L}$ is the target loss function, $I_i$ is the $i$-th original image of the image sequence, $\hat{I}_i$ is the corresponding rendered image, and $n$ is the total number of images in the image sequence.
6. A three-dimensional reconstruction apparatus, comprising:
the importing module is used for importing the original video;
the segmentation module is used for segmenting the original video to obtain a plurality of image sequences;
the analysis module is used for respectively analyzing each image sequence to obtain a model view perspective matrix corresponding to each image sequence and a mask image corresponding to each image sequence;
the construction module is used for constructing an original matrix set, and constructing an initial reconstruction model through the original matrix set;
the rendering module is used for respectively rendering each model view perspective matrix and mask images corresponding to each image sequence through the initial reconstruction model to obtain rendered images corresponding to each image sequence;
the optimization module is used for optimizing the initial reconstruction model according to all the image sequences and all the rendering images to obtain a three-dimensional reconstruction model;
the importing module is also used for importing an image to be reconstructed;
the three-dimensional reconstruction result obtaining module is used for carrying out three-dimensional reconstruction on the image to be reconstructed through the three-dimensional reconstruction model to obtain a three-dimensional reconstruction result;
the analysis module is used for:
extracting affine transformation matrix corresponding to each image sequence, image height corresponding to each image sequence, image width corresponding to each image sequence and camera focal length corresponding to each image sequence from each image sequence by utilizing a motion structure algorithm;
extracting mask images corresponding to the image sequences from the image sequences by using a Python tool;
and respectively carrying out matrix calculation on an affine transformation matrix corresponding to each image sequence, an image height corresponding to each image sequence, an image width corresponding to each image sequence and a camera focal length corresponding to each image sequence to obtain a model view perspective matrix corresponding to each image sequence.
7. A three-dimensional reconstruction system comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the three-dimensional reconstruction method according to any one of claims 1 to 5 is implemented when the computer program is executed by the processor.
8. A computer readable storage medium storing a computer program, characterized in that the three-dimensional reconstruction method according to any one of claims 1 to 5 is implemented when the computer program is executed by a processor.
CN202311084904.2A 2023-08-28 2023-08-28 Three-dimensional reconstruction method, device, system and storage medium Active CN116824026B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311084904.2A CN116824026B (en) 2023-08-28 2023-08-28 Three-dimensional reconstruction method, device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311084904.2A CN116824026B (en) 2023-08-28 2023-08-28 Three-dimensional reconstruction method, device, system and storage medium

Publications (2)

Publication Number Publication Date
CN116824026A
CN116824026B (en) 2024-01-09

Family

ID=88120565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311084904.2A Active CN116824026B (en) 2023-08-28 2023-08-28 Three-dimensional reconstruction method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN116824026B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020249076A1 (en) * 2019-06-14 2020-12-17 华为技术有限公司 Face calibration method and electronic device
CN112784469A (en) * 2021-02-25 2021-05-11 广州虎牙科技有限公司 Model parameter generation method and device, electronic equipment and readable storage medium
CN113160296A (en) * 2021-03-31 2021-07-23 清华大学 Micro-rendering-based three-dimensional reconstruction method and device for vibration liquid drops
CN113256718A (en) * 2021-05-27 2021-08-13 浙江商汤科技开发有限公司 Positioning method and device, equipment and storage medium
CN114119849A (en) * 2022-01-24 2022-03-01 阿里巴巴(中国)有限公司 Three-dimensional scene rendering method, device and storage medium
CN115115780A (en) * 2022-06-29 2022-09-27 聚好看科技股份有限公司 Three-dimensional reconstruction method and system based on multi-view RGBD camera
CN115439607A (en) * 2022-09-01 2022-12-06 中国民用航空总局第二研究所 Three-dimensional reconstruction method and device, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIQUAN YU; MEIHUA XIAO: "Review and Evaluation of Classification Algorithms Enhancing Internet Security", 2010 International Conference on Web Information Systems and Mining
MEIHUA XIAO ET AL: "A formal analysis method for composition protocol based on model checking", Scientific Reports
MIAO YONGWEI; FENG XIAOHONG; YU LIJIE; CHEN JIAZHOU; LI YONGSHUI: "Interactive progressive modeling of three-dimensional buildings based on a single image", Journal of Computer-Aided Design & Computer Graphics, no. 09
LUO GUOLIANG; CHEN QIANG; WANG RUI; XIAO MEIHUA; YANG HUI: "A large-scale three-dimensional face synthesis system with sample similarity suppression", East China Jiaotong University

Also Published As

Publication number Publication date
CN116824026B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
Gortler et al. The lumigraph
CN110363858B (en) Three-dimensional face reconstruction method and system
CN109118582B (en) Commodity three-dimensional reconstruction system and reconstruction method
Digne et al. Scale space meshing of raw data point sets
Poulin et al. Interactively modeling with photogrammetry
CN111127633A (en) Three-dimensional reconstruction method, apparatus, and computer-readable medium
CN105453139A (en) Sparse GPU voxelization for 3D surface reconstruction
Long et al. Neuraludf: Learning unsigned distance fields for multi-view reconstruction of surfaces with arbitrary topologies
Gibson et al. Interactive reconstruction of virtual environments from video sequences
CN108665530B (en) Three-dimensional modeling implementation method based on single picture
Ramanarayanan et al. Feature-based textures
Sarkar et al. Structured low-rank matrix factorization for point-cloud denoising
CN115439607A (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium
JP2000268179A (en) Three-dimensional shape information obtaining method and device, two-dimensional picture obtaining method and device and record medium
Laycock et al. Aligning archive maps and extracting footprints for analysis of historic urban environments
Fua et al. Reconstructing surfaces from unstructured 3d points
CN110706332B (en) Scene reconstruction method based on noise point cloud
CN110335275B (en) Fluid surface space-time vectorization method based on three-variable double harmonic and B spline
CN116824026B (en) Three-dimensional reconstruction method, device, system and storage medium
Mi et al. 3D reconstruction based on the depth image: A review
Bullinger et al. 3D Surface Reconstruction from Multi-Date Satellite Images
Zach et al. Accurate Dense Stereo Reconstruction using Graphics Hardware.
Nie et al. Physics-preserving fluid reconstruction from monocular video coupling with SFS and SPH
CN114049423A (en) Automatic realistic three-dimensional model texture mapping method
Nguyen et al. Modelling of 3d objects using unconstrained and uncalibrated images taken with a handheld camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant