WO2023019478A1 - Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium

Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium

Info

Publication number
WO2023019478A1
WO2023019478A1 (PCT/CN2021/113308)
Authority
WO
WIPO (PCT)
Prior art keywords
reconstructed
functional module
graph
output
human body
Prior art date
Application number
PCT/CN2021/113308
Other languages
English (en)
Chinese (zh)
Inventor
王磊
刘薰裕
马晓亮
刘宝玉
程俊
Original Assignee
深圳先进技术研究院
中国科学院深圳理工大学(筹)
Priority date
Filing date
Publication date
Application filed by 深圳先进技术研究院 and 中国科学院深圳理工大学(筹)
Priority to PCT/CN2021/113308
Publication of WO2023019478A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics

Definitions

  • the present application belongs to the technical field of image processing, and in particular relates to a three-dimensional reconstruction method, a three-dimensional reconstruction device, electronic equipment, and a computer-readable storage medium.
  • The 3D reconstruction of human body parts has long been a hot topic in computer vision and is widely used in the fields of virtual reality (VR) and augmented reality (AR).
  • Conventional 3D reconstruction techniques rely on relatively complex and expensive equipment, such as 3D scanners, multi-view cameras, or inertial sensors.
  • Although 3D reconstruction techniques based on a single image have been developed, these techniques still suffer from problems such as unstable reconstruction results.
  • The present application provides a three-dimensional reconstruction method, a three-dimensional reconstruction device, an electronic device and a computer-readable storage medium, which can address the unstable reconstruction results of existing three-dimensional reconstruction techniques.
  • the present application provides a three-dimensional reconstruction method, including:
  • the present application provides a three-dimensional reconstruction device, including:
  • the extraction module is used to perform feature extraction on the image of the object to be reconstructed to obtain a feature vector, wherein the above-mentioned feature vector is used to represent the shape feature information of the above-mentioned object to be reconstructed;
  • a generation module configured to generate a feature map according to the above-mentioned feature vector and a preset template for the above-mentioned object to be reconstructed, and the above-mentioned preset template is used to represent the three-dimensional structure information of the above-mentioned object to be reconstructed;
  • the reconstruction module is configured to input the above-mentioned feature map into the trained graph convolutional neural network to obtain the three-dimensional reconstruction result of the above-mentioned object to be reconstructed.
  • In a further aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor.
  • When the processor executes the computer program, the steps of the method of the first aspect are implemented.
  • the present application provides a computer-readable storage medium, wherein the above-mentioned computer-readable storage medium stores a computer program, and when the above-mentioned computer program is executed by one or more processors, the steps of the method in the above-mentioned first aspect are implemented.
  • the present application provides a computer program product, which, when the computer program product is run on an electronic device, causes the electronic device to execute the three-dimensional reconstruction method proposed in the first aspect.
  • Compared with the prior art, the beneficial effect of the present application is as follows: for an image of an object to be reconstructed, the present application first performs feature extraction on the image to obtain a feature vector characterizing the shape feature information of the object to be reconstructed, then combines the feature vector with a preset template for the object to be reconstructed to generate a feature map, and finally inputs the feature map into a trained graph convolutional neural network to obtain the 3D reconstruction result of the object to be reconstructed.
  • In this way, the finally generated feature map contains not only the shape features of the object to be reconstructed but also the three-dimensional structure information conveyed by the preset template, so the trained graph convolutional neural network can process the feature map better; the resulting 3D reconstruction is more accurate, ensuring the stability of 3D reconstruction.
  • FIG. 1 is a schematic flowchart of the three-dimensional reconstruction method provided by an embodiment of the present application;
  • FIG. 2 is an example diagram of a preset template when the object to be reconstructed is a human body, provided by an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of the first functional module of the graph convolutional neural network provided by an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of the i-th functional module of the graph convolutional neural network provided by an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of the N-th functional module of the graph convolutional neural network provided by an embodiment of the present application;
  • FIG. 6 is an example diagram of the overall structure of the graph convolutional neural network provided by an embodiment of the present application;
  • FIG. 7 is an example diagram of the working framework of the three-dimensional reconstruction method provided by an embodiment of the present application;
  • FIG. 8 is a structural block diagram of a three-dimensional reconstruction device provided by an embodiment of the present application;
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The embodiments of the present application propose a 3D reconstruction method, a 3D reconstruction device, an electronic device, and a computer-readable storage medium. After the feature vector of the object to be reconstructed is extracted from the image, it can be combined with a preset template representing the three-dimensional structure information of the object to obtain a feature map. The feature map thus carries, on top of the shape features of the object, the three-dimensional structure information conveyed by the preset template, so it can be processed better by the trained graph convolutional neural network, yielding more accurate 3D reconstruction results and ensuring the stability of 3D reconstruction.
  • Specific examples are used below for illustration.
  • Step 101: perform feature extraction on an image of an object to be reconstructed to obtain a feature vector.
  • The electronic device may photograph the object to be reconstructed with its own camera to obtain the image; alternatively, a third-party device equipped with a camera may photograph the object and transmit the resulting image to the electronic device in a wireless or wired manner. The way the image of the object to be reconstructed is acquired is not limited here.
  • After the electronic device obtains the image of the object to be reconstructed, it can perform feature extraction on the image to obtain a feature vector.
  • The feature extraction here mainly targets the shape feature information of the object to be reconstructed; that is, the obtained feature vector characterizes the shape feature information of the object to be reconstructed.
  • the shape feature information includes contour features describing the boundary shape of the object to be reconstructed and/or region features describing the internal shape of the object to be reconstructed, etc., which are not limited herein.
  • Before feature extraction, the electronic device can first preprocess the image, for example with segmentation and size-adjustment operations; step 101 may then include:
  • A1: Segment the image based on the object to be reconstructed to obtain a partial image. The electronic device may first perform bounding-box detection on the image for the object to be reconstructed, that is, recognize the bounding box of the object in the image.
  • The bounding box is usually rectangular; of course, it can also take another preset shape, which is not limited here.
  • The image area within this bounding box can be regarded as a region of interest. Segmenting the image based on the bounding box extracts the region of interest, which removes, to a certain extent, the large amount of redundant and noisy information contained in the image background, and yields a partial image of the object to be reconstructed, with the object located as close to the center of the partial image as possible.
  • For example, the electronic device may use the human-body 2D keypoint detection technique OpenPose to perform bounding-box detection on the image.
  • A2: Adjust the size of the partial image to a preset size.
  • Specifically, the electronic device can unify the size of the partial image: if the size of the partial image differs from the preset size, the partial image is scaled until the two match.
  • For example, the preset size may be 224 × 224, in pixels.
  • The number of channels of the partial image is usually 3, representing the red, green, and blue (RGB) channels; that is, the final size of the partial image is 224 × 224 × 3.
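  • Steps A1 and A2 can be sketched as follows (a minimal sketch assuming OpenCV and NumPy; the bounding box itself would come from an external detector such as OpenPose, which is outside this sketch):

```python
import cv2
import numpy as np

def preprocess(image, bbox, out_size=224):
    """Crop the detected bounding box and resize to the preset size.

    image : H x W x 3 RGB array of the original picture
    bbox  : (x, y, w, h) box around the object to be reconstructed,
            e.g. derived from OpenPose keypoints (an assumption here)
    """
    x, y, w, h = bbox
    # Expand the box to a square so the object stays centered (A1).
    side = max(w, h)
    cx, cy = x + w // 2, y + h // 2
    x0, y0 = max(cx - side // 2, 0), max(cy - side // 2, 0)
    partial = image[y0:y0 + side, x0:x0 + side]
    # Scale the partial image to the preset 224 x 224 size (A2).
    partial = cv2.resize(partial, (out_size, out_size))
    return partial.astype(np.float32) / 255.0  # 224 x 224 x 3
```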
  • A3: Perform feature extraction on the size-adjusted partial image through an encoder based on a convolutional neural network (CNN) to obtain the feature vector.
  • the electronic device can pre-train the convolutional neural network for classification tasks on a given data set in advance, and the pre-training process can refer to the current general training process for neural networks, which will not be repeated here.
  • the classification layer in the convolutional neural network is removed, and the feature extraction layer before the classification layer is retained, and the retained result constitutes the encoder.
  • the convolutional neural network may be ResNet50
  • the given data set may be the ImageNet data set
  • the output of the encoder of the convolutional neural network is a 2048-dimensional feature vector.
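  • A minimal sketch of such an encoder (assuming PyTorch and torchvision; the patent only specifies ResNet50 pretrained on ImageNet with its classification layer removed):

```python
import torch
import torchvision

# ResNet50 pretrained on ImageNet; dropping the final fully connected
# classification layer leaves a feature extractor that ends in global
# average pooling, so every image maps to a 2048-dimensional vector.
resnet50 = torchvision.models.resnet50(pretrained=True)
encoder = torch.nn.Sequential(*list(resnet50.children())[:-1])
encoder.eval()

image = torch.randn(1, 3, 224, 224)   # preprocessed partial image
with torch.no_grad():
    f = encoder(image).flatten(1)     # shape: (1, 2048)
```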
  • Step 102: generate a feature map according to the feature vector and a preset template for the object to be reconstructed.
  • the preset template is used to characterize the three-dimensional structure information of the object to be reconstructed, specifically the three-dimensional structure information of the object to be reconstructed in a specified pose.
  • Different preset templates are set for different objects to be reconstructed.
  • For example, when the object to be reconstructed is a human body, the preset template is a human body mesh; when the object to be reconstructed is a hand, the preset template is a hand mesh.
  • In one implementation, the electronic device can express the preset template as a graph structure and combine it with the feature vector to generate the feature map.
  • Taking a human body as the object to be reconstructed as an example, step 102 may include:
  • the human body mesh map adopted by the electronic device for the human body may be a standard template defined by the SMPL (Skinned Multi-Person Linear) model.
  • the human body mesh diagram represents the three-dimensional mesh of the human body under a T-pose (T-Pose).
  • In order to reduce the computational complexity of subsequent graph convolution operations, the electronic device can perform 4× downsampling on the standard template defined by the SMPL model and use the downsampling result as the preset template (that is, the human body mesh). The number of vertices of the human body mesh obtained after 4× downsampling is reduced to 1723. Afterwards, only 4× upsampling of the 3D reconstruction result output by the graph convolutional neural network is needed to obtain the final 3D reconstruction result and complete the reconstruction task.
  • The downsampled human body mesh can be written as $M_h = (V_h, A_h)$, with $V_h \in \mathbb{R}^{1723 \times 3}$ and $A_h \in \{0,1\}^{1723 \times 1723}$, where:
  • $M_h$ denotes the downsampled human body mesh;
  • $V_h$ denotes the vertex set of the downsampled human body mesh, and $V_h \in \mathbb{R}^{1723 \times 3}$ indicates that each of the 1723 vertices in the set has real-valued three-dimensional coordinates;
  • $A_h$ denotes the adjacency matrix of the downsampled human body mesh (the meaning of the adjacency matrix has been explained above and is not repeated here), and $A_h \in \{0,1\}^{1723 \times 1723}$ indicates that every element of the adjacency matrix is 0 or 1.
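  • As a sketch of how the template $M_h = (V_h, A_h)$ might be materialized in memory (the 4× downsampling is assumed precomputed offline, and `faces` would come from the downsampled mesh topology; both names are assumptions of this sketch):

```python
import numpy as np

def adjacency_from_faces(faces, num_vertices=1723):
    """Binary adjacency A_h in {0,1}^{N x N} built from triangle faces."""
    A = np.zeros((num_vertices, num_vertices), dtype=np.float32)
    for i, j, k in faces:
        A[i, j] = A[j, i] = 1.0   # each triangle contributes 3 edges
        A[j, k] = A[k, j] = 1.0
        A[i, k] = A[k, i] = 1.0
    return A

# V_h: T-pose coordinates of the 1723 downsampled vertices
# (zero placeholder; the real values come from the SMPL template)
V_h = np.zeros((1723, 3), dtype=np.float32)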
  • Assume the 2048-dimensional feature vector obtained in step 101 is $f \in \mathbb{R}^{2048}$ and the vertex information matrix of the 4× downsampled human body mesh obtained in step B1 is $M_h \in \mathbb{R}^{1723 \times 3}$. Fusing the two yields the feature map, which serves as the input of the subsequent graph convolutional neural network and can be expressed as $F_{in} \in \mathbb{R}^{1723 \times 2051}$.
  • This operation can be understood as splicing the 2048-dimensional feature vector onto each vertex.
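  • A minimal sketch of this splicing (PyTorch assumed; shapes follow the definitions above):

```python
import torch

def build_feature_map(f, V_h):
    """Splice the image feature onto every template vertex.

    f    : (2048,) feature vector from the encoder
    V_h  : (1723, 3) template vertex coordinates
    F_in : (1723, 2051) feature map, input of the graph network
    """
    tiled = f.unsqueeze(0).expand(V_h.shape[0], -1)  # (1723, 2048)
    return torch.cat([V_h, tiled], dim=1)            # (1723, 2051)

F_in = build_feature_map(torch.randn(2048), torch.randn(1723, 3))
```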
  • the process of generating the feature map is similar to steps B1 and B2, except that the preset template has been changed.
  • The hand mesh adopted by the electronic device for the hand may be the standard template defined by the MANO (hand Model with Articulated and Non-rigid defOrmations) model. It should be noted that, since the hand mesh contains a relatively small number of vertices (slightly more than 700), there is no need to perform downsampling on it.
  • the feature map actually combines the vertex position information of the preset template and the shape feature information of the part to be reconstructed represented in the image.
  • Step 103: input the feature map into the trained graph convolutional neural network to obtain the 3D reconstruction result of the object to be reconstructed.
  • the graph convolutional neural network will use the feature map as input, and finally output the transformed grid vertex position information of the object to be reconstructed as the 3D reconstruction result.
  • the following is an introduction to the overall structure of the graph convolutional neural network:
  • the graph convolutional neural network includes N functional modules in series; wherein, the input of the first functional module is the input of the graph convolutional neural network, and the output of the Nth functional module is the output of the graph convolutional neural network, and N is an integer greater than 2.
  • The first functional module mainly receives the input of the graph convolutional neural network;
  • the i-th functional module mainly performs data calculation and transmission;
  • the N-th functional module mainly outputs the finally predicted grid vertex position information of the object to be reconstructed, where i is an integer greater than 1 and less than N.
  • each functional module includes the following three basic units, namely: convolution unit, normalization unit and activation function unit.
  • Fig. 3 shows the schematic structure of the first functional module.
  • The first functional module includes at least three specified structures (only three are shown in FIG. 3), connected in series, where each specified structure consists of a convolution unit, a normalization unit, and an activation function unit connected in series. Among these specified structures: the input of the first specified structure is the input of the first functional module (that is, the input of the graph convolutional neural network), and the residual between the output of the last specified structure and the input of the first functional module is the output of the first functional module.
  • FIG. 4 shows a structural diagram of the i-th functional module.
  • The i-th functional module includes at least two specified structures (only two are shown in FIG. 4), connected in series; the specified structure is the same as in the first functional module and is not repeated here.
  • The input of the first specified structure is the output of the (i-1)-th functional module, and the residual between the output of the last specified structure and the output of the (i-1)-th functional module is the output of the i-th functional module.
  • FIG. 5 shows a schematic structural diagram of the Nth functional module.
  • The N-th functional module includes a convolution unit, a normalization unit, an activation function unit, and another convolution unit connected in series. The input of the first convolution unit is the output of the (N-1)-th functional module, and the residual between the output of the second convolution unit and the output of the (N-1)-th functional module is the output of the N-th functional module. It can be understood that, since the function of the N-th functional module is to output the finally predicted grid vertex position information of the object to be reconstructed, the data output by its second convolution unit does not need further normalization or activation processing.
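  • The module layout described above can be sketched as follows (PyTorch assumed; `conv` stands for the graph convolution unit detailed later, LayerNorm and ReLU are one plausible choice for the normalization and activation units, and the residual skip assumes the module's output feature dimension matches its input, otherwise a projection would be needed):

```python
import torch.nn as nn

class SpecifiedStructure(nn.Module):
    """One specified structure: convolution -> normalization -> activation."""
    def __init__(self, conv, out_dim):
        super().__init__()
        self.conv = conv                   # graph convolution unit
        self.norm = nn.LayerNorm(out_dim)  # normalization unit (one choice)
        self.act = nn.ReLU()               # activation function unit

    def forward(self, x):                  # x: (num_vertices, f_in)
        return self.act(self.norm(self.conv(x)))

class FunctionalModule(nn.Module):
    """Specified structures in series, plus a residual skip:
    output = body(input) + input (feature dimensions must match)."""
    def __init__(self, *structures):
        super().__init__()
        self.body = nn.Sequential(*structures)

    def forward(self, x):
        return self.body(x) + x
```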
  • Figure 6 shows an example of the overall structure of a graph convolutional neural network including 4 functional modules.
  • The parameters $f_{in}$ and $f_{out}$ can be used to represent how the feature dimension changes inside each functional module during the graph convolution operation.
  • Taking the first functional module, which includes three convolution units, as an example, the change of its feature dimension can be expressed as $(f_{in}, f_{out1}, f_{out2}, f_{out3})$, where $f_{in}$ is the feature dimension of the initial input and $f_{out1}$, $f_{out2}$ and $f_{out3}$ are the feature dimensions output by the three convolution units, respectively. Since the normalization unit and the activation function unit do not change the feature dimension, the feature dimension input to the second convolution unit equals the feature dimension output by the first convolution unit, and so on.
  • As described above, when the object to be reconstructed is a human body, the obtained feature map is $F_{in} \in \mathbb{R}^{1723 \times 2051}$.
  • After the feature map passes through the graph convolutional neural network, the final output of the network, $F_{out} \in \mathbb{R}^{1723 \times 3}$, is obtained.
  • After 4× upsampling, $M_{out} \in \mathbb{R}^{6890 \times 3}$ is obtained, that is, the final 3D reconstruction result for the human body.
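  • The final step can be sketched as a single matrix product (the 6890 × 1723 upsampling matrix would be precomputed together with the downsampling; the zero placeholder below only fixes shapes and is an assumption of this sketch):

```python
import torch

U = torch.zeros(6890, 1723)     # stand-in for the precomputed 4x
                                # upsampling matrix of the SMPL mesh
F_out = torch.randn(1723, 3)    # coarse vertices from the network
M_out = U @ F_out               # (6890, 3): final reconstruction
```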
  • the convolution unit may be a Chebyshev convolution unit, and the Chebyshev convolution unit specifically uses Chebyshev polynomials to construct a Chebyshev convolution algorithm.
  • the processing speed of the graph convolutional neural network can be accelerated to a certain extent, and the efficiency of 3D reconstruction can be improved.
  • The Chebyshev convolution can be expressed as $F_{out} = \sum_{k=0}^{K-1} T_k(\tilde{L}_p)\, F_{in}\, \theta_k$, where:
  • $F_{in} \in \mathbb{R}^{N \times f_{in}}$ represents the input features;
  • $F_{out} \in \mathbb{R}^{N \times f_{out}}$ represents the output features;
  • $K$ indicates that Chebyshev polynomials of order $K$ are used, $T_k(\cdot)$ being the $k$-th Chebyshev polynomial;
  • $\theta_k \in \mathbb{R}^{f_{in} \times f_{out}}$ represents the feature transformation matrix, whose entries are values the graph convolutional neural network needs to learn;
  • $\tilde{L}_p$ represents the scaled Laplacian matrix of the preset template. When the object to be reconstructed is a human body and the preset template is the downsampled standard template defined by the SMPL model, $N$ is the number of vertices after downsampling, i.e., 1723.
  • The scaled Laplacian matrix is specifically $\tilde{L}_p = \dfrac{2 L_p}{\lambda_{max}} - I$, where:
  • $I$ is the identity matrix;
  • $\lambda_{max}$ is the largest eigenvalue of the $L_p$ matrix;
  • $L$ is an intermediate parameter with no actual physical meaning;
  • $W$ is a parameter that the graph convolutional neural network needs to learn.
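  • Under the formulation above, a Chebyshev convolution unit can be sketched as follows (PyTorch assumed; the scaled Laplacian is precomputed from the template adjacency, and the dense matrix here stands in for what would typically be sparse):

```python
import torch
import torch.nn as nn

class ChebConv(nn.Module):
    """Chebyshev graph convolution:
    F_out = sum_{k=0}^{K-1} T_k(L_tilde) @ F_in @ theta_k."""
    def __init__(self, in_dim, out_dim, K, L_tilde):
        super().__init__()
        self.K = K
        self.register_buffer("L", L_tilde)  # (V, V) scaled Laplacian
        self.theta = nn.Parameter(torch.randn(K, in_dim, out_dim) * 0.01)

    def forward(self, x):                   # x: (V, in_dim)
        Tx = [x, self.L @ x]                # T_0(L)x, T_1(L)x
        for _ in range(2, self.K):
            # Chebyshev recurrence: T_k = 2 L T_{k-1} - T_{k-2}
            Tx.append(2 * self.L @ Tx[-1] - Tx[-2])
        return sum(t @ th for t, th in zip(Tx[:self.K], self.theta))
```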
  • During training, the graph convolutional neural network can be trained on two datasets, Human3.6M and MSCOCO. Since these two datasets do not store a ground-truth human body mesh for each training sample but only the positions of the real 3D human joints, a high-precision real human body mesh must first be fitted from the 3D joint positions of each training sample; this fitted mesh can then be used as a strong label in the training process of the graph convolutional neural network.
  • That is to say, the "real human body mesh" mentioned here is actually a high-precision result fitted from the real 3D human joints.
  • The training process of this graph convolutional neural network is basically the same as that of a general neural network, except that a new loss function is used, which makes the 3D reconstruction results output by the trained model smoother and more complete, and hence more practical.
  • The loss function is $L = \lambda_a L_v + \lambda_b L_j + \lambda_c L_n + \lambda_d L_e$, where $\lambda_a$, $\lambda_b$, $\lambda_c$ and $\lambda_d$ are hyperparameters.
  • $L_v$ denotes the mesh loss, which describes the positional difference between the real human body mesh and the predicted human body mesh. Letting $M^*$ denote the vertex positions of the real human body mesh and $M$ the vertex positions of the predicted human body mesh, and using an L1 loss, the mesh loss is expressed as $L_v = \lVert M - M^* \rVert_1$.
  • $L_j$ denotes the 3D joint loss, which describes the positional difference between the real 3D human joints and the predicted 3D human joints. Letting $J^{3D*}$ denote the positions of the real 3D human joints, $J \in \mathbb{R}^{v \times N}$ the matrix that extracts joints from the human body mesh, and $M$ the vertex positions of the predicted human body mesh (so that $JM$ gives the predicted joint positions), the 3D joint loss is expressed, analogously to the mesh loss, as $L_j = \lVert JM - J^{3D*} \rVert_1$.
  • $L_n$ denotes the surface normal loss, which describes the angular difference between the normal vectors of the triangular faces of the real human body mesh and those of the predicted human body mesh. Letting $f$ denote a triangular face of the predicted human body mesh, $n_f^*$ the unit normal vector of the corresponding face in the real human body mesh, and $m_i$, $m_j$ the coordinates of two vertices of $f$, the surface normal loss is expressed as $L_n = \sum_f \sum_{(i,j) \subset f} \left| \left\langle \frac{m_i - m_j}{\lVert m_i - m_j \rVert},\; n_f^* \right\rangle \right|$.
  • $L_e$ denotes the surface edge loss, which describes the difference between the edge lengths of the triangular faces of the real human body mesh and those of the predicted human body mesh. Letting $f$ denote a triangular face of the predicted human body mesh, $m_i$ and $m_j$ two vertex coordinates in $f$, and $m_i^*$ and $m_j^*$ the corresponding vertex coordinates in the real human body mesh, the surface edge loss is expressed as $L_e = \sum_f \sum_{(i,j) \subset f} \big| \lVert m_i - m_j \rVert - \lVert m_i^* - m_j^* \rVert \big|$.
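  • Putting the four terms together, a minimal sketch of the training loss (PyTorch assumed; `edges` lists, for each triangular face f, the vertex index pairs (i, j) of that face, and `n_star` holds the unit normals of the real mesh faces, both derived offline from the mesh topology; the lambda values shown are placeholders, not the patent's hyperparameters):

```python
import torch

def total_loss(M, M_star, J, J3D_star, edges, n_star,
               la=1.0, lb=1.0, lc=0.1, ld=0.1):
    """L = la*L_v + lb*L_j + lc*L_n + ld*L_e (weights are placeholders).

    M, M_star : (1723, 3) predicted / real mesh vertex positions
    J         : (v, 1723) joint extraction matrix
    J3D_star  : (v, 3) real 3D joint positions
    edges     : iterable of (face_index, i, j) vertex index pairs
    n_star    : (num_faces, 3) unit normals of the real mesh faces
    """
    L_v = (M - M_star).abs().mean()           # mesh loss (L1)
    L_j = (J @ M - J3D_star).abs().mean()     # 3D joint loss
    L_n = M.new_zeros(())
    L_e = M.new_zeros(())
    for f, i, j in edges:
        e = M[i] - M[j]                       # predicted edge vector
        # normal loss: predicted edges should be orthogonal to GT normals
        L_n = L_n + (e / e.norm() @ n_star[f]).abs()
        # edge loss: predicted and real edge lengths should match
        L_e = L_e + (e.norm() - (M_star[i] - M_star[j]).norm()).abs()
    return la * L_v + lb * L_j + lc * L_n + ld * L_e
```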
  • FIG. 7 shows a working frame example of the three-dimensional reconstruction method in the embodiment of the present application by taking the human body as an example to be reconstructed.
  • the working framework consists of two parts, a convolutional neural network-based encoder and a graph convolutional neural network-based human 3D vertex regressor.
  • First, the original image of the human body is obtained.
  • The original image serves as the initial input and is preprocessed to obtain the partial image.
  • The partial image is then encoded into a feature vector by the convolutional-neural-network-based encoder.
  • The feature vector is fused and spliced with the grid vertex position information of the preset human body mesh to form the feature map, which is the input of the graph convolutional neural network. Finally, the graph convolutional neural network regresses a new set of grid vertex positions that conforms to the two-dimensional observation of the human body in the original image, completing the 3D human body reconstruction task.
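  • Tying the earlier sketches together, the working frame of FIG. 7 reads roughly as follows (`gcn` stands for the assembled graph convolutional network; its composition here is an assumption of this sketch, not the patent's exact configuration):

```python
import torch

partial = preprocess(image, bbox)                     # A1 + A2
x = torch.from_numpy(partial).permute(2, 0, 1)[None]  # 1 x 3 x 224 x 224
f = encoder(x).flatten(1)[0]                          # (2048,)
F_in = build_feature_map(f, torch.from_numpy(V_h))    # (1723, 2051)
F_out = gcn(F_in)                                     # (1723, 3)
M_out = U @ F_out                                     # (6890, 3) result
```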
  • To sum up, in this application, feature extraction is first performed on the image of the object to be reconstructed to obtain the feature vector characterizing its shape feature information; the feature vector is then combined with the preset template for the object to be reconstructed to generate a feature map; and finally the feature map is input into the trained graph convolutional neural network to obtain the 3D reconstruction result.
  • The finally generated feature map contains not only the shape features of the object to be reconstructed but also the three-dimensional structure information conveyed by the preset template, so the trained graph convolutional neural network can process it better; the 3D reconstruction results are more accurate, ensuring the stability of 3D reconstruction.
  • an embodiment of the present application further provides a three-dimensional reconstruction device.
  • the three-dimensional reconstruction device 800 includes:
  • the extraction module 801 is configured to perform feature extraction on the image of the object to be reconstructed to obtain a feature vector, wherein the feature vector is used to represent the shape feature information of the object to be reconstructed;
  • the generation module 802 is configured to generate a feature map according to the above-mentioned feature vector and a preset template for the above-mentioned object to be reconstructed, and the above-mentioned preset template is used to represent the three-dimensional structure information of the above-mentioned object to be reconstructed;
  • the reconstruction module 803 is configured to input the above-mentioned feature map into the trained graph convolutional neural network to obtain the three-dimensional reconstruction result of the above-mentioned object to be reconstructed.
  • the graph convolutional neural network includes N functional modules connected in series;
  • the input of the first above-mentioned functional module is the input of the above-mentioned graph convolutional neural network
  • the output of the Nth above-mentioned functional module is the output of the above-mentioned graph convolutional neural network
  • the above-mentioned N is an integer greater than 2;
  • the above functional modules include a convolution unit, a normalization unit and an activation function unit.
  • the first above-mentioned functional module includes at least three specified structures, and the above-mentioned at least three specified structures are connected in sequence, and the above-mentioned specified structure includes the above-mentioned convolution unit, the above-mentioned normalization unit and the above-mentioned activation function unit connected in sequence;
  • the input of the first specified structure is the input of the first above-mentioned function module
  • the residual between the output of the last specified structure and the input of the first above-mentioned functional module is the output of the first above-mentioned functional module.
  • the i-th above-mentioned functional module includes at least two specified structures, and the above-mentioned at least two specified structures are connected in series, and the above-mentioned specified structure includes the above-mentioned convolution unit, the above-mentioned normalization unit and the above-mentioned activation function unit connected in series, and the above-mentioned i is an integer greater than 1 and less than N;
  • the input of the first specified structure is the output of the i-1th above-mentioned functional module
  • the residual between the output of the last specified structure and the output of the i-1th above-mentioned functional module is the output of the i-th above-mentioned functional module.
  • the Nth above-mentioned functional module includes the above-mentioned convolution unit, the above-mentioned normalization unit, the above-mentioned activation function unit, and the above-mentioned convolution unit connected in series;
  • the input of the first above-mentioned convolution unit is the output of the N-1th above-mentioned functional module
  • the residual between the output of the second above-mentioned convolution unit and the output of the (N-1)-th above-mentioned functional module is the output of the N-th above-mentioned functional module.
  • the above convolution unit is a Chebyshev convolution unit.
  • the above-mentioned object to be reconstructed is a human body
  • the above-mentioned preset template is a human body grid map
  • the above-mentioned generating module 802 includes:
  • a construction unit configured to construct a graph structure in a preset format based on the above-mentioned human body mesh graph, where the graph structure includes the vertex information of the above-mentioned human body mesh graph;
  • the splicing unit is configured to fuse and splice the above-mentioned feature vectors and the above-mentioned graph structure to obtain the above-mentioned feature map.
  • The above three-dimensional reconstruction device 800 also includes a training module, where the total loss of the above-mentioned graph convolutional neural network is calculated based on mesh loss, three-dimensional joint loss, surface normal loss and surface edge loss, and wherein:
  • the above mesh loss is used to describe the position difference between the real human body mesh and the predicted human body mesh;
  • the above three-dimensional joint loss is used to describe the position difference between the real human three-dimensional joints and the predicted human three-dimensional joints;
  • the surface normal loss described above is used to describe the angular difference between the normal vectors of the triangular faces of the real human mesh and the normal vectors of the triangular faces of the predicted human mesh;
  • the surface edge loss described above is used to describe the difference in length between the edge lengths of the triangular faces of the real human mesh and the edge lengths of the triangular faces of the predicted human mesh.
  • the above extraction module 801 includes:
  • a segmentation unit configured to segment the image based on the object to be reconstructed to obtain a partial image
  • an adjustment unit configured to adjust the size of the partial image to a preset size
  • the extraction unit is configured to perform feature extraction on the above-mentioned partial image after size adjustment by using an encoder using a convolutional neural network to obtain the above-mentioned feature vector.
  • In this application, feature extraction is first performed on the image of the object to be reconstructed to obtain the feature vector characterizing its shape feature information; the feature vector is then combined with the preset template for the object to be reconstructed to generate a feature map; and finally the feature map is input into the trained graph convolutional neural network to obtain the 3D reconstruction result of the object to be reconstructed.
  • The finally generated feature map contains not only the shape features of the object to be reconstructed but also the three-dimensional structure information conveyed by the preset template, so the trained graph convolutional neural network can process it better; the 3D reconstruction results are more accurate, ensuring the stability of 3D reconstruction.
  • an embodiment of the present application further provides an electronic device.
  • The electronic device 9 in the embodiment of the present application includes: a memory 901, one or more processors 902 (only one is shown in FIG. 9), and a computer program stored in the memory 901 and operable on the processor.
  • the memory 901 is used to store software programs and units
  • the processor 902 executes various functional applications and performs data processing by running the software programs and units stored in the memory 901.
  • the processor 902 implements the following steps by running the above-mentioned computer program stored in the memory 901:
  • the above-mentioned graph convolutional neural network includes N series-connected functional modules
  • the input of the first above-mentioned functional module is the input of the above-mentioned graph convolutional neural network
  • the output of the Nth above-mentioned functional module is the output of the above-mentioned graph convolutional neural network
  • the above-mentioned N is an integer greater than 2;
  • the above functional modules include a convolution unit, a normalization unit and an activation function unit.
  • In one implementation, the first above-mentioned functional module includes at least three specified structures connected in sequence, and each specified structure includes the above-mentioned convolution unit, the above-mentioned normalization unit and the above-mentioned activation function unit connected in series;
  • the input of the first specified structure is the input of the first above-mentioned function module
  • the residual between the output of the last specified structure and the input of the first above-mentioned functional module is the output of the first above-mentioned functional module.
  • In one implementation, the i-th functional module includes at least two specified structures connected in sequence, and each specified structure includes the above-mentioned convolution unit, the above-mentioned normalization unit and the above-mentioned activation function unit connected in series, where i is an integer greater than 1 and less than N;
  • the input of the first specified structure is the output of the i-1th above-mentioned functional module
  • the residual between the output of the last specified structure and the output of the i-1th above-mentioned functional module is the output of the i-th above-mentioned functional module.
  • In one implementation, the N-th functional module includes the above-mentioned convolution unit, the above-mentioned normalization unit, the above-mentioned activation function unit and another above-mentioned convolution unit connected in series;
  • the input of the first above-mentioned convolution unit is the output of the N-1th above-mentioned functional module
  • the residual between the output of the second above-mentioned convolution unit and the output of the (N-1)-th above-mentioned functional module is the output of the N-th above-mentioned functional module.
  • the above convolution unit is a Chebyshev convolution unit.
  • When the above-mentioned object to be reconstructed is a human body, the preset template is a human body mesh, and generating a feature map according to the feature vector and the preset template of the object to be reconstructed includes:
  • constructing a graph structure in a preset format based on the human body mesh, where the graph structure includes the vertex information of the human body mesh;
  • fusing and splicing the above feature vector with the above graph structure to obtain the above feature map.
  • the processor 902 further implements the following steps when running the above computer program stored in the memory 901:
  • the total loss of the above-mentioned graph convolutional neural network is calculated based on grid loss, three-dimensional joint loss, surface normal loss and surface edge loss;
  • the above mesh loss is used to describe the position difference between the real human body mesh and the predicted human body mesh;
  • the above three-dimensional joint loss is used to describe the position difference between the real human three-dimensional joints and the predicted human three-dimensional joints;
  • the surface normal loss described above is used to describe the angular difference between the normal vectors of the triangular faces of the real human mesh and the normal vectors of the triangular faces of the predicted human mesh;
  • the surface edge loss described above is used to describe the difference in length between the edge lengths of the triangular faces of the real human mesh and the edge lengths of the triangular faces of the predicted human mesh.
  • Performing feature extraction on the image of the object to be reconstructed to obtain a feature vector includes: segmenting the image based on the object to be reconstructed to obtain a partial image; adjusting the size of the partial image to a preset size; and performing feature extraction on the size-adjusted partial image through an encoder based on a convolutional neural network to obtain the above-mentioned feature vector.
  • The processor 902 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 901 may include read-only memory and random-access memory, and provides instructions and data to the processor 902 . Part or all of the memory 901 may also include non-volatile random access memory. For example, the memory 901 may also store information of device categories.
  • In this application, feature extraction is first performed on the image of the object to be reconstructed to obtain the feature vector characterizing its shape feature information; the feature vector is then combined with the preset template for the object to be reconstructed to generate a feature map; and finally the feature map is input into the trained graph convolutional neural network to obtain the 3D reconstruction result of the object to be reconstructed.
  • The finally generated feature map contains not only the shape features of the object to be reconstructed but also the three-dimensional structure information conveyed by the preset template, so the trained graph convolutional neural network can process it better; the 3D reconstruction results are more accurate, ensuring the stability of 3D reconstruction.
  • the disclosed devices and methods can be implemented in other ways.
  • the system embodiments described above are only illustrative.
  • The division of the above-mentioned modules or units is only a logical functional division; in actual implementation, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • If the above integrated units are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • All or part of the processes in the methods of the above embodiments of the present application can also be completed by instructing associated hardware through computer programs.
  • The computer programs can be stored in a computer-readable storage medium, and when executed by a processor, the steps of the above method embodiments can be realized.
  • the above-mentioned computer program includes computer program code, and the above-mentioned computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • The above computer-readable storage medium may include any entity or device capable of carrying the above computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer-readable memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, or a software distribution medium.
  • The content contained in the above computer-readable storage media can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable storage media do not include electrical carrier signals and telecommunication signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to the technical field of image processing, and in particular discloses a three-dimensional reconstruction method, a three-dimensional reconstruction apparatus, an electronic device, and a computer-readable storage medium. The three-dimensional reconstruction method comprises: performing feature extraction on an image of an object to be reconstructed to obtain a feature vector, the feature vector being used to represent shape feature information of the object; generating a feature map according to the feature vector and a preset template for the object, the preset template being used to represent three-dimensional structure information of the object; and inputting the feature map into a trained graph convolutional neural network to obtain a three-dimensional reconstruction result of the object. By means of the solution of this application, the stability of three-dimensional reconstruction can be improved.
PCT/CN2021/113308 2021-08-18 2021-08-18 Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium WO2023019478A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/113308 WO2023019478A1 (fr) 2021-08-18 2021-08-18 Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/113308 WO2023019478A1 (fr) 2021-08-18 2021-08-18 Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2023019478A1 true WO2023019478A1 (fr) 2023-02-23

Family

ID=85239320

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113308 WO2023019478A1 (fr) 2021-08-18 2021-08-18 Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium

Country Status (1)

Country Link
WO (1) WO2023019478A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428493A (zh) * 2019-07-12 2019-11-08 清华大学 基于网格形变的单图像人体三维重建方法及系统
CN110458957A (zh) * 2019-07-31 2019-11-15 浙江工业大学 一种基于神经网络的图像三维模型构建方法及装置
US20200126297A1 (en) * 2018-10-17 2020-04-23 Midea Group Co., Ltd. System and method for generating acupuncture points on reconstructed 3d human body model for physical therapy
CN111369681A (zh) * 2020-03-02 2020-07-03 腾讯科技(深圳)有限公司 三维模型的重构方法、装置、设备及存储介质

Similar Documents

Publication Publication Date Title
CN110443842B (zh) 基于视角融合的深度图预测方法
CN110458957B (zh) 一种基于神经网络的图像三维模型构建方法及装置
US11715258B2 (en) Method for reconstructing a 3D object based on dynamic graph network
CN110619676A (zh) 一种基于神经网络的端到端的三维人脸重建方法
CN113781659A (zh) 一种三维重建方法、装置、电子设备及可读存储介质
CN110599528A (zh) 一种基于神经网络的无监督三维医学图像配准方法及系统
CN110349087B (zh) 基于适应性卷积的rgb-d图像高质量网格生成方法
CN112862949B (zh) 基于多视图的物体3d形状重建方法
US11961266B2 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
CN112767467B (zh) 一种基于自监督深度学习的双图深度估计方法
CN112149563A (zh) 一种注意力机制人体图像关键点姿态估计方法及系统
CN111950477A (zh) 一种基于视频监督的单图像三维人脸重建方法
CN113903028A (zh) 一种目标检测方法及电子设备
CN113077545B (zh) 一种基于图卷积的从图像中重建着装人体模型的方法
CN115984494A (zh) 一种基于深度学习的月面导航影像三维地形重建方法
CN112509106A (zh) 文档图片展平方法、装置以及设备
Kang et al. Competitive learning of facial fitting and synthesis using uv energy
CN111654621B (zh) 一种基于卷积神经网络模型的双焦相机连续数字变焦方法
CN113888697A (zh) 一种双手交互状态下的三维重建方法
CN115482268A (zh) 一种基于散斑匹配网络的高精度三维形貌测量方法与系统
CN113989441B (zh) 基于单张人脸图像的三维漫画模型自动生成方法及系统
CN116012432A (zh) 立体全景图像的生成方法、装置和计算机设备
CN117576312A (zh) 手部模型构建方法、装置以及计算机设备
JP2024510230A (ja) 顔表情、身体ポーズ形状及び衣服パフォーマンスキャプチャのための暗黙的微分可能レンダラーを用いたマルチビューニューラル人間予測
CN116934972A (zh) 一种基于双流网络的三维人体重建方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21953725

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21953725

Country of ref document: EP

Kind code of ref document: A1