WO2022233137A1 - Three-dimensional mesh reconstruction method, apparatus, device and storage medium - Google Patents

Three-dimensional mesh reconstruction method, apparatus, device and storage medium

Info

Publication number
WO2022233137A1
Authority
WO
WIPO (PCT)
Prior art keywords
skeleton
mesh
dimensional
bones
error
Prior art date
Application number
PCT/CN2021/137703
Other languages
English (en)
French (fr)
Inventor
乔宇
栾天宇
王亚立
张钧皓
王喆
周志鹏
Original Assignee
中国科学院深圳先进技术研究院
Priority date
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2022233137A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tessellation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Definitions

  • the present application belongs to the technical field of three-dimensional reconstruction, and in particular, relates to a three-dimensional mesh reconstruction method, device, equipment and storage medium.
  • Three-dimensional reconstruction refers to establishing, for three-dimensional objects, three-dimensional models suitable for computer representation and processing.
  • a 3D model is a polygonal representation of an object, usually displayed on a computer or other video device.
  • a 3D model is also called a 3D mesh model.
  • the surface of the 3D model is represented by a 3D mesh, and the 3D mesh together with its inner region constitutes the corresponding 3D model. Therefore, in the process of reconstructing the 3D model, reconstructing the 3D mesh of the model's surface is the key step.
  • current 3D mesh reconstruction methods usually follow one of two approaches: the first is to extract the local image features of the image and generate a 3D mesh based on them. This approach makes the generated 3D mesh more accurate in local detail and depth.
  • the second approach is to extract the global image features of the image and generate a 3D mesh based on them. This approach gives the generated 3D mesh higher overall accuracy and robustness.
  • the embodiments of the present application provide a three-dimensional mesh reconstruction method, apparatus, device, and storage medium, which can solve the problem of poor three-dimensional mesh reconstruction quality in the related art.
  • an embodiment of the present application provides a three-dimensional mesh reconstruction method, including:
  • the second target image is an image containing the target, and the second skeleton is used to represent the internal structure of the target;
  • the first three-dimensional mesh is adjusted to obtain a second three-dimensional mesh corresponding to the second skeleton.
  • generating the first three-dimensional grid of the target according to the global image feature of at least one first target image includes:
  • the space occupancy information is used to indicate the probability that each point in the space is occupied by the target
  • a three-dimensional grid of the target is generated, and the generated three-dimensional grid is used as the first three-dimensional grid.
  • generating the second skeleton of the target according to local image features of at least one second target image includes:
  • a skeleton node heatmap corresponding to each second target image is generated, and the skeleton node heatmap is used to indicate the skeleton of the target.
  • a three-dimensional skeleton of the target is generated, and the generated three-dimensional skeleton is used as the second skeleton.
  • generating a two-dimensional skeleton corresponding to each second target image according to the heat map of the skeleton node corresponding to each second target image including:
  • a two-dimensional skeleton corresponding to each second target image is generated.
  • the skeleton error between the second skeleton and the first skeleton includes an angular error, a translation error and a nonlinear extension error between each pair of bones in the first skeleton and the second skeleton;
  • the nonlinear extension error refers to the error caused by the different lengths of each pair of bones.
  • a nonlinear extension error between each pair of bones is determined according to the angular error and translation error between each pair of bones.
  • the first three-dimensional mesh is adjusted to obtain a second three-dimensional mesh corresponding to the second skeleton, including:
  • according to the skeleton error between the second skeleton and the first skeleton, determining the spatial mapping relationship between the second 3D mesh corresponding to the second skeleton and the first 3D mesh;
  • the first three-dimensional grid is spatially transformed to obtain the second three-dimensional grid.
  • the skeleton error between the second skeleton and the first skeleton includes an angular error, a translation error and a nonlinear extension error between each pair of bones in the first skeleton and the second skeleton;
  • the determining, according to the skeleton error between the second skeleton and the first skeleton, the spatial mapping relationship between the second 3D mesh corresponding to the second skeleton and the first 3D mesh includes:
  • determining, according to the skeleton error, the spatial mapping relationship between the components of the mesh vertices on the second three-dimensional mesh on each pair of bones in the plurality of pairs of bones and the mesh vertices on the first three-dimensional mesh;
  • adding the components of the mesh vertices on the multiple pairs of bones to obtain the spatial mapping relationship between the mesh vertices on the second three-dimensional mesh and the mesh vertices on the first three-dimensional mesh.
  • determining, according to the angle error, translation error and nonlinear extension error between each pair of bones in the first skeleton and the corresponding pairs of bones in the second skeleton, the spatial mapping relationship between the components of the mesh vertices on the second three-dimensional mesh on each pair of bones and the mesh vertices on the first three-dimensional mesh, including:
  • $a_{j,i} = W_{j,i}\left(R^{(i)} v_j + T^{(i)} + \varepsilon^{(i)}\right)$, where $v_j$ is the $j$-th mesh vertex on the first 3D mesh and $a_{j,i}$ is its component on the $i$-th pair of bones;
  • $R^{(i)}$ is the rotation matrix representing the angle error between the $i$-th pair of bones;
  • $T^{(i)}$ is the translation error between the $i$-th pair of bones;
  • $\varepsilon^{(i)}$ is the nonlinear extension error between the $i$-th pair of bones;
  • $W_{j,i}$ is the weight of the $j$-th mesh vertex on the first 3D mesh with respect to the $i$-th pair of bones;
  • $i$ and $j$ are both positive integers.
  • the components of the mesh vertices on the second three-dimensional mesh on each pair of bones in the plurality of pairs of bones are summed to obtain the spatial mapping relationship between the mesh vertices on the second three-dimensional mesh and the mesh vertices on the first three-dimensional mesh.
  • the spatial mapping relationship between the second three-dimensional mesh and the first three-dimensional mesh includes the spatial mapping relationship between mesh vertices on the second three-dimensional mesh and mesh vertices on the first three-dimensional mesh;
  • Performing spatial transformation on the first three-dimensional grid according to the spatial mapping relationship between the second three-dimensional grid and the first three-dimensional grid to obtain the second three-dimensional grid including:
  • a three-dimensional mesh reconstruction device comprising:
  • a first generation module configured to generate a first three-dimensional grid of the target according to the global image feature of at least one first target image, where the first target image is an image containing the target;
  • an extraction module configured to extract the skeleton of the first three-dimensional mesh to obtain a first skeleton, and the first skeleton is used to characterize the internal structure of the first three-dimensional mesh;
  • the second generation module is configured to generate a second skeleton according to local image features of at least one second target image, where the second target image is an image including the target, and the second skeleton is used to represent the internal structure of the target;
  • An adjustment module configured to adjust the first three-dimensional mesh according to the skeleton error between the second skeleton and the first skeleton, to obtain a second three-dimensional mesh corresponding to the second skeleton.
  • the first generation module is used to:
  • the space occupancy information is used to indicate the probability that each point in the space is occupied by the target
  • a three-dimensional grid of the target is generated, and the generated three-dimensional grid is used as the first three-dimensional grid.
  • the second generation module is used to:
  • a skeleton node heatmap corresponding to each second target image is generated, and the skeleton node heatmap is used to indicate the skeleton of the target.
  • a three-dimensional skeleton of the target is generated, and the generated three-dimensional skeleton is used as the second skeleton.
  • the second generation module is used for:
  • a two-dimensional skeleton corresponding to each second target image is generated.
  • the skeleton error between the second skeleton and the first skeleton includes an angular error, a translation error and a nonlinear extension error between each pair of bones in the first skeleton and the second skeleton;
  • the nonlinear extension error refers to the error caused by the different lengths of each pair of bones.
  • the device further includes a determining module, and the determining module is used for:
  • a nonlinear extension error between each pair of bones is determined according to the angular error and translation error between each pair of bones.
  • the adjustment module includes:
  • a determining unit configured to determine the spatial mapping relationship between the second 3D mesh corresponding to the second skeleton and the first 3D mesh according to the skeleton error between the second skeleton and the first skeleton ;
  • a transformation unit configured to perform spatial transformation on the first three-dimensional grid according to the spatial mapping relationship between the second three-dimensional grid and the first three-dimensional grid to obtain the second three-dimensional grid.
  • the skeleton error between the second skeleton and the first skeleton includes an angular error, a translation error and a nonlinear extension error between each pair of bones in the first skeleton and the second skeleton; the determining unit is configured to:
  • determine the spatial mapping relationship between the components of the mesh vertices on the second three-dimensional mesh on each pair of bones in the plurality of pairs of bones and the mesh vertices on the first three-dimensional mesh;
  • add the components of the mesh vertices on the multiple pairs of bones to obtain the spatial mapping relationship between the mesh vertices on the second three-dimensional mesh and the mesh vertices on the first three-dimensional mesh.
  • the determining unit is used for:
  • $a_{j,i} = W_{j,i}\left(R^{(i)} v_j + T^{(i)} + \varepsilon^{(i)}\right)$, where $v_j$ is the $j$-th mesh vertex on the first 3D mesh and $a_{j,i}$ is its component on the $i$-th pair of bones;
  • $R^{(i)}$ is the rotation matrix representing the angle error between the $i$-th pair of bones;
  • $T^{(i)}$ is the translation error between the $i$-th pair of bones;
  • $\varepsilon^{(i)}$ is the nonlinear extension error between the $i$-th pair of bones;
  • $W_{j,i}$ is the weight of the $j$-th mesh vertex on the first 3D mesh with respect to the $i$-th pair of bones;
  • $i$ and $j$ are both positive integers.
  • the determining unit is used for:
  • the spatial mapping relationship between the second three-dimensional mesh and the first three-dimensional mesh includes the spatial mapping relationship between mesh vertices on the second three-dimensional mesh and mesh vertices on the first three-dimensional mesh; the adjustment module is used for:
  • an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method of any one of the above first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the method of any one of the above first aspect.
  • an embodiment of the present application provides a computer program product, which, when the computer program product runs on a computer device, causes the computer device to execute the method described in any one of the foregoing first aspects.
  • a first three-dimensional mesh of the target can be generated according to the global image features of at least one first target image, and the first skeleton of the first three-dimensional mesh can be extracted; the generated first three-dimensional mesh and first skeleton have high overall accuracy and robustness.
  • a second skeleton can be generated according to the local image features of the at least one second target image, and the generated second skeleton has higher accuracy in local details and depth.
  • the first three-dimensional mesh, which has high overall accuracy and robustness, is tuned, so that the second three-dimensional mesh obtained after tuning has both a degree of overall accuracy and robustness and a degree of local-detail and depth accuracy; a balance is achieved between the two, which further improves the three-dimensional mesh reconstruction result and the reconstruction effect.
  • FIG. 1 is a flowchart of a three-dimensional mesh reconstruction method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a logical structure of a three-dimensional grid reconstruction system provided by an embodiment of the present application
  • FIG. 3 is a flowchart of a method for reconstructing a three-dimensional model of a human body provided by an embodiment of the present application
  • FIG. 4 is a structural block diagram of a three-dimensional mesh reconstruction device provided by an embodiment of the present application.
  • FIG. 5 is a structural block diagram of a computer device provided by an embodiment of the present application.
  • the three-dimensional mesh reconstruction method provided by the embodiment of the present application is applied to reconstruct the three-dimensional mesh of the target.
  • the target is the object to be reconstructed, and the target can be preset or specified manually.
  • targets can include living things, non-living things, or scenes.
  • the living thing can be a human body or an animal, etc., or a tissue or an organ in the living thing; the non-living thing can be a vehicle, an obstacle, and the like.
  • the target may be a real-world entity or an imaginary object. This embodiment of the present application does not limit the specific type of the target to be reconstructed.
  • the 3D mesh reconstruction method provided by the embodiments of the present application can also be applied to any application fields that require a higher-precision 3D mesh, such as virtual reality games, smart cities, navigation, or automatic driving.
  • the three-dimensional mesh reconstruction method provided by the embodiment of the present application can reconstruct the three-dimensional mesh of the target based on at least one target image.
  • the at least one target image may be at least one video frame in the video, and the embodiment of the present application may reconstruct the three-dimensional mesh of the target based on the video.
  • the three-dimensional mesh reconstruction method provided by the embodiments of the present application can be applied to computer equipment.
  • the computer device is installed with image processing software, and the image processing software can implement the three-dimensional mesh reconstruction method provided by the embodiments of the present application.
  • the image processing software can process at least one video frame in the video to obtain a three-dimensional grid with high accuracy.
  • the computer device may be a terminal device or a server, and the terminal device may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), etc.
  • the embodiments of the present application do not impose any restrictions on the specific type of the computer device.
  • FIG. 1 is a flowchart of a three-dimensional mesh reconstruction method provided by an embodiment of the present application. The method is applied to a computer device. As shown in FIG. 1 , the method includes the following steps:
  • Step 101 Generate a first three-dimensional mesh of the target according to the global image features of at least one first target image, where the first target image is an image containing the target.
  • the target is the object to be reconstructed.
  • the at least one first target image includes one or more first target images.
  • the at least one first target image can be obtained from locally stored data, received from other devices, or downloaded from the network.
  • the embodiment of the present application does not limit the acquisition method of the at least one first target image.
  • the at least one first target image may also be at least one video frame in the first video, for example, at least one video frame may be obtained from the first video, and the at least one video frame may be used as the at least one first target image.
  • the at least one video frame may be a continuous video frame or a discontinuous video frame.
  • the at least one first target image may also be other types of images including targets, which are not limited in this embodiment of the present application.
  • the global image feature of the first target image refers to a feature on the entire image that can characterize the first target image, and is used to describe the first target image or the overall characteristics of the target in the first target image.
  • global image features may include one or more of features such as color, shape, texture, and structure.
  • At least one first target image may be acquired first, and feature extraction is performed on each first target image in the at least one first target image to obtain a global image feature of each first target image. Then, the three-dimensional grid of the target is reconstructed according to the global image feature of the at least one first target image to obtain a first three-dimensional grid.
  • the first feature extraction model may be used to perform feature extraction on the first target image to obtain the global image feature of the first target image.
  • the first feature extraction model is used to extract global image features of the image.
  • the first feature extraction model may be a neural network model, such as a CNN (convolutional neural network) model or a ResNet (residual network) model, as in the sketch below.
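  • For illustration only, the following is a minimal sketch of this global feature-extraction step, assuming PyTorch/torchvision and a pretrained ResNet-50; the model choice, input resolution, and use of the pooled 2048-dimensional feature are assumptions for the sketch, not details fixed by this application.

```python
# Hedged sketch: global feature extraction with a pretrained ResNet-50.
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier head, keep the pooled feature
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def global_features(pil_images):
    """Return one global feature vector per first target image, shape (N, 2048)."""
    batch = torch.stack([preprocess(img) for img in pil_images])
    with torch.no_grad():
        return backbone(batch)
```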
  • the operation of generating the first three-dimensional mesh of the target may include the following steps:
  • the space occupancy information is used to indicate the probability that each point in the space is occupied by the target.
  • the probability that each point in the space is occupied by the target is between 0 and 1.
  • the probability that a point in the space is occupied by the target can be 0, 0.3, 0.5, 0.8, or 1, etc.
  • a classification network model may be used to process the global image feature of at least one first target image to obtain the space occupation information of the target.
  • the classification network model is used to generate space occupancy information of the object to be reconstructed according to the global image feature of at least one image.
  • the classification network model may be a neural network model, such as a CNN model or a DenseNet (densely connected network) model.
  • the estimation of the occupancy probability of all points within the effective spatial range can also be achieved by dense sampling.
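  • For illustration only, the following hedged sketch shows one way such a classification network and the dense sampling could look; the MLP layout, feature dimension, and the [-1, 1]^3 sampling range are illustrative assumptions, not the application's specification.

```python
# Hedged sketch: an MLP mapping (global image feature, query point) to an
# occupancy probability, evaluated by densely sampling a regular grid over
# the effective spatial range.
import torch
import torch.nn as nn

class OccupancyMLP(nn.Module):
    def __init__(self, feat_dim=2048, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # probability in [0, 1]
        )

    def forward(self, feat, points):
        # feat: (F,), points: (P, 3) -> occupancy probabilities, shape (P,)
        f = feat.unsqueeze(0).expand(points.shape[0], -1)
        return self.net(torch.cat([f, points], dim=-1)).squeeze(-1)

# Dense sampling: a regular grid over the assumed effective range [-1, 1]^3.
res = 64
axis = torch.linspace(-1.0, 1.0, res)
grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), dim=-1).reshape(-1, 3)

model = OccupancyMLP()
feat = torch.randn(2048)  # stand-in for a real global image feature
with torch.no_grad():     # evaluate in chunks to bound memory
    occupancy = torch.cat([model(feat, c) for c in grid.split(16384)]).reshape(res, res, res)
```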
  • according to the space occupancy information of the target, it can be determined which points in the space are occupied by the target and which are not. Therefore, the points occupied by the target in the space can be determined according to the space occupancy information of the target, and a 3D mesh (the first 3D mesh) of the surface of the target's 3D model can then be generated from those points.
  • points in the space whose probability of being occupied by the target reaches a specified probability may be determined according to the space occupancy information of the target, and the first three-dimensional mesh may then be generated from those points.
  • the specified probability can be preset, for example, the specified probability is 0.5.
  • the points in the space occupied by the target may be determined according to the space occupancy information of the target
  • the first 3D model of the target may be generated according to the points occupied by the target in the space
  • the 3D mesh on the surface of the first 3D model may be determined as the first three-dimensional mesh.
  • the first three-dimensional grid may be generated by a gridding algorithm according to the space occupancy information of the target.
  • the meshing algorithm may be a marching cubes algorithm or the like.
  • a meshing algorithm can be used to determine the mapping relationship between the space occupancy information of the target and each vertex in the first three-dimensional mesh, and then generate the first three-dimensional mesh according to the mapping relationship between the space occupation information of the target and each vertex in the first three-dimensional mesh.
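  • For illustration only, a minimal sketch of this meshing step, assuming scikit-image's marching cubes implementation; a dummy sphere-shaped volume stands in for the estimated space occupancy information, thresholded at the specified probability of 0.5:

```python
# Hedged sketch: marching cubes over a dense occupancy volume.
import numpy as np
from skimage import measure

res = 64
zz, yy, xx = np.mgrid[:res, :res, :res]
dist = np.sqrt((xx - 32) ** 2 + (yy - 32) ** 2 + (zz - 32) ** 2)
occupancy = np.clip((30.0 - dist) / 30.0, 0.0, 1.0)  # probability-like dummy volume

# Vertices and faces of the extracted isosurface form the first 3D mesh.
verts, faces, normals, _ = measure.marching_cubes(occupancy, level=0.5)
```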
  • color space information of the target may also be determined according to the global image feature of the at least one first target image, where the color space information is used to indicate the color of the target surface. Then, according to the color space information of the target, the color of the surface of the first three-dimensional mesh is generated. For example, the color of the surface of the first three-dimensional mesh may be generated by a meshing algorithm according to the color space information of the target.
  • the color space may be an RGB color space.
  • the meshing algorithm is not limited to positional occupancy; the same generalization can be made for the color space. Therefore, the color space information can be estimated by an algorithm similar to the one that estimates the space occupancy information, and the meshing algorithm can then generate the color of the surface of the first three-dimensional mesh according to the color space information.
  • a first three-dimensional mesh of the target may be generated from an initial three-dimensional reconstruction model according to at least one first target image.
  • the initial 3D reconstruction model includes a first feature extraction network and a first fully connected network, the first feature extraction network is used to extract global image features of the image, and the first fully connected network is used to generate a 3D grid according to the extracted global image features .
  • At least one first target image can be used as the input of the initial 3D reconstruction model;
  • the global image features of the at least one first target image can be extracted through the first feature extraction network;
  • the extracted global image features can be used as the input of the first fully connected network.
  • a first three-dimensional grid is generated according to the global image feature of the at least one first target image.
  • the initial three-dimensional reconstruction model can be obtained by training according to the first sample data in advance.
  • the first sample data may include a sample target image and a three-dimensional grid corresponding to the sample target image.
  • Step 102 Extract the skeleton of the first three-dimensional mesh to obtain the first skeleton, and the first skeleton is used to represent the internal structure of the first three-dimensional mesh.
  • the first skeleton is a tree-like structure that is generated inside the first three-dimensional model corresponding to the first three-dimensional grid and can represent the structural information of the entire first three-dimensional model.
  • extracting the skeleton of the first three-dimensional mesh may include the following implementations:
  • the skeleton of the first 3D mesh is determined according to the mesh vertices on the first 3D mesh and the mapping relationship between the mesh vertices of a 3D mesh and the skeleton of the 3D mesh, and the determined skeleton is used as the first skeleton.
  • Specifically, the skeleton corresponding to the mesh vertices on the first 3D mesh can be determined according to the mapping relationship between the mesh vertices of the 3D mesh on the surface of a 3D model and the skeleton of the 3D model, and the determined skeleton is used as the first skeleton.
  • the mapping relationship may be obtained in advance, or may be obtained by learning according to the second sample data.
  • the second sample data may include a plurality of sample three-dimensional grids and a sample skeleton corresponding to each sample three-dimensional grid.
  • the sample skeleton corresponding to each sample 3D mesh can be obtained by manual annotation.
  • the skeleton of the first three-dimensional mesh is extracted through the skeleton extraction model to obtain the first skeleton.
  • the first three-dimensional mesh data can be used as the input of the skeleton extraction model, and the first skeleton can be determined by the skeleton extraction model.
  • the skeleton extraction model is used to extract the skeleton of the 3D mesh.
  • the skeleton extraction model can be obtained by training according to the third sample data.
  • the third sample data may include a plurality of sample three-dimensional grids and a sample skeleton corresponding to each sample three-dimensional grid.
  • the sample skeleton corresponding to each sample 3D mesh can be obtained by manual annotation.
  • a skeleton estimation algorithm is used to determine the skeleton of the first three-dimensional mesh to obtain the first skeleton.
  • the skeleton estimation algorithm is used to determine the skeleton of the three-dimensional mesh, and may specifically be a medial-axis surface extraction algorithm or the like.
  • a skeleton estimation algorithm can be used to determine the skeleton of the first three-dimensional mesh according to the preset topology structure, and the first skeleton can be obtained.
  • the preset topology structure may be preset, and may be set according to the topology structure of the target to be reconstructed, and the topology structure of the target may be determined according to the structural characteristics of the target.
  • the skeleton estimation algorithm is directly used to determine the skeleton of the first three-dimensional mesh, the algorithm is relatively simple, and the skeleton extraction efficiency is high.
  • Step 103 Generate a second skeleton according to local image features of at least one second target image, the second target image is an image containing the target, and the second skeleton is used to represent the internal structure of the target.
  • the at least one second target image includes one or more second target images.
  • the at least one second target image can be obtained from locally stored data, received from other devices, or downloaded from the network.
  • the embodiment of the present application does not limit the acquisition method of the at least one second target image.
  • the at least one second target image may also be at least one video frame in the second video, for example, at least one video frame may be obtained from the second video, and the at least one video frame may be used as the at least one second target image.
  • the at least one video frame may be a continuous video frame or a discontinuous video frame.
  • the at least one second target image may also be other types of images including targets, which are not limited in this embodiment of the present application.
  • the at least one second target image described in this embodiment of the present application may be the same as or different from the above at least one first target image, which is not limited in this embodiment of the present application.
  • the above-mentioned first video and the second video may be the same video, or may be different videos, which are not limited in this embodiment of the present application.
  • local image features are also called local features.
  • the local image feature of the second target image is a local expression of the image feature of the second target image, and is used to describe the local characteristic possessed by the second target image.
  • compared with global image features, local image features are rich in content within the image and have low correlation with one another, and under occlusion the disappearance of some features does not affect the detection and matching of other features.
  • at least one second target image may be acquired first, and feature extraction performed on each second target image in the at least one second target image to obtain the local image features of each second target image. Then, a second skeleton is generated according to the local image features of the at least one second target image.
  • the second feature extraction model may be used to perform feature extraction on the second target image to obtain local image features of the second target image.
  • the second feature extraction model is used to extract local image features of the image.
  • the second feature extraction model may be a neural network model, such as a CNN model or an HRNet (high-resolution network) model.
  • a three-dimensional skeleton of the target may also be generated by using the first skeleton model according to at least one second target image.
  • the first skeleton model may include a second feature extraction network and a skeleton generation network, the second feature extraction network is used to extract local image features of the image, and the skeleton generation network is used to generate a skeleton according to the extracted local image features.
  • At least one second target image can be used as the input of the first skeleton model
  • local image features of at least one second target image can be extracted through the second feature extraction network
  • the extracted local image features can be used as the input of the skeleton generation network.
  • a second skeleton of the target is generated according to local image features of at least one second target image.
  • the skeleton generation network may further include a 2D skeleton generation network and a 3D skeleton generation network, where the 2D skeleton generation network is configured to generate at least one 2D skeleton of the target according to the local image features of the at least one second target image, and the at least one 2D skeleton is used as the input to the 3D skeleton generation network.
  • the 3D skeleton generation network is used to generate the 3D skeleton of the target from at least one 2D skeleton.
  • the first skeleton model can be obtained by training according to the fourth sample data in advance.
  • the fourth sample data may include a sample target image and a sample skeleton corresponding to the sample target image.
  • the operation of generating the second skeleton according to the local image features of the at least one second target image includes the following steps:
  • the skeleton node heatmap corresponding to each second target image is used to indicate the probability of each node in the target skeleton appearing at different positions in that second target image.
  • the nodes in the skeleton refer to the nodes of the skeleton tree structure, including the root node, child nodes and leaf nodes of the skeleton tree structure.
  • the skeleton node heatmap includes heatmaps of multiple nodes in the skeleton, and the heatmap of each node is used to indicate the probability of that node appearing at different positions in the corresponding second target image; a higher value indicates that the node is more likely to appear at that location.
  • the skeleton node heatmap can be used to characterize the second skeleton. Characterizing the skeleton through the skeleton node heatmap has the following advantages: 1) The skeleton node heatmap is a continuous function, which can be well adapted to the learning of the deep learning network model, making the generation of the second skeleton more robust and stable . 2) The skeleton node heatmap can label the positions of skeleton nodes well. 3) The heat map of skeleton nodes can not only characterize the positions of skeleton nodes, but also represent the error of skeleton node estimation, which is convenient for generating more accurate skeletons.
  • the skeleton node corresponding to each second target image may first be determined according to the heatmap of the skeleton node corresponding to each second target image. Then, according to the skeleton node corresponding to each second target image and the preset topology structure, a two-dimensional skeleton corresponding to each second target image is generated.
  • the preset topology structure may be preset, and may be set according to the topology structure of the target to be reconstructed, and the topology structure of the target may be determined according to the structural characteristics of the target.
  • for each skeleton node, the position with the highest probability of occurrence in each second target image can be determined and taken as the position of that skeleton node.
  • the positions of the skeleton nodes corresponding to each second target image are connected to generate a two-dimensional skeleton corresponding to each second target image.
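  • For illustration only, a minimal NumPy sketch of this decoding step: each node is placed at the argmax of its heatmap, and nodes are connected following a preset topology given as one parent index per node. The three-node chain topology below is hypothetical.

```python
# Hedged sketch: decoding skeleton node heatmaps into a 2D skeleton.
import numpy as np

def decode_2d_skeleton(heatmaps, parents):
    """heatmaps: (K, H, W), one per node; parents: parent index per node, -1 for the root.
    Returns node positions (K, 2) as (x, y) and a list of bone segments."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1).argmax(axis=1)
    nodes = np.stack([flat % W, flat // W], axis=1)  # highest-probability pixel per node
    bones = [(nodes[k], nodes[p]) for k, p in enumerate(parents) if p >= 0]
    return nodes, bones

heatmaps = np.random.rand(3, 64, 64)  # stand-in skeleton node heatmaps
nodes, bones = decode_2d_skeleton(heatmaps, parents=[-1, 0, 1])
```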
  • the three-dimensional skeleton includes two-dimensional position information and depth information of the skeleton. That is, the 3D skeleton of the target includes 2D skeletons of different depths.
  • the three-dimensional skeleton of the target may be generated according to a two-dimensional skeleton corresponding to a second target image and prior information of the target.
  • the prior information of the target includes depth information of the target, and of course, other information may also be included.
  • the prior information of the target can be obtained by learning in advance according to the depth information of the three-dimensional skeleton of multiple samples of the target.
  • the depth information of the target may include spatial geometric measurement information of the target at different angles.
  • two-dimensional skeletons corresponding to multiple second target images may be fused to obtain a three-dimensional skeleton including rich two-dimensional position information and depth information.
  • the three-dimensional skeleton of the target may be generated through the second skeleton model according to the two-dimensional skeleton corresponding to the at least one second target image.
  • the second skeleton model is used to generate a three-dimensional skeleton according to at least one two-dimensional skeleton.
  • the second skeleton model may be obtained by pre-training according to fifth sample data, and the fifth sample data may include at least one sample two-dimensional skeleton and a corresponding three-dimensional sample skeleton.
  • the network in the second skeleton model may generate a network for the above-mentioned three-dimensional skeleton.
  • the target to be reconstructed in this embodiment of the present application may be a rigid body or a non-rigid body. If the target is a non-rigid body, the three-dimensional skeleton of the target, together with the change information of the generated 3D skeleton in each second target image, can also be generated according to the two-dimensional skeletons corresponding to the multiple second target images, the temporal continuity information of the multiple second target images, and the prior information of the target.
  • the multiple second target images may be multiple consecutive video frames.
  • after the second skeleton is generated, the skeleton error between the second skeleton and the first skeleton can also be determined, so that the second 3D mesh of the target can be generated according to the difference between the second skeleton and the first skeleton and the first 3D mesh.
  • the skeleton error between the second skeleton and the first skeleton may include one or more of angle error, translation error and nonlinear extension error, and the nonlinear extension error refers to the error caused by the different lengths of the corresponding bones in the skeleton .
  • the bone in the skeleton refers to the connection part between two nodes connected to each other in the skeleton.
  • a bone includes a parent node and a child node, which is the part of the connection between the parent node and the child node in the bone.
  • the corresponding bones in the first skeleton and the second skeleton refer to a pair of bones with the same topology in the first skeleton and the second skeleton.
  • the skeleton error between the second skeleton and the first skeleton may include the angle errors, translation errors and nonlinear extension errors between the corresponding bones in the first skeleton and the second skeleton, for example, the angle error, translation error and nonlinear extension error between each pair of bones in the corresponding pairs of bones in the two skeletons.
  • determining the skeleton error between the second skeleton and the first skeleton may include the following steps:
  • the angle error between each pair of bones can be represented by a rotation matrix. That is, rotating the first bone according to the rotation matrix can make the angle error between the rotated first bone and the second bone smaller.
  • the first bone is any bone in the first skeleton
  • the second bone is a bone corresponding to the first bone in the second skeleton.
  • the rotation angle between each pair of bones may be determined first, and then the rotation matrix between each pair of bones may be determined according to the rotation angle between them.
  • the rotation matrix between each pair of bones can be determined by the Rodrigues formula according to the rotation angle between each pair of bones.
  • the rotation angle between each pair of bones can be determined by the following formula (1):
  • $\theta = \arccos\left(\dfrac{b_{ori} \cdot b_{target}}{\lVert b_{ori} \rVert \, \lVert b_{target} \rVert}\right)$  (1)
  • where $\theta$ refers to the rotation angle between the first bone and the second bone;
  • $b_{ori}$ refers to the direction vector of the first bone, i.e. the position difference between the parent node and the child node in the first bone;
  • $b_{target}$ refers to the direction vector of the second bone, i.e. the position difference between the parent node and the child node in the second bone.
  • the rotation matrix between each pair of bones can be determined by the following formula (2) according to the rotation angle between each pair of bones:
  • $R = \cos\theta \, I + (1 - \cos\theta)\,\omega\,\omega^{T} + \sin\theta\,[\omega]_{\times}$  (2)
  • where $R$ refers to the rotation matrix between the first bone and the second bone;
  • $\theta$ refers to the rotation angle between the first bone and the second bone;
  • $\omega$ is the unit rotation axis (proportional to $b_{ori} \times b_{target}$), and $\omega^{T}$ is the transpose of $\omega$;
  • $[\omega]_{\times}$ is the antisymmetric matrix corresponding to $\omega$.
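  • For illustration only, a minimal NumPy sketch of formulas (1) and (2) as reconstructed above; degenerate (parallel or zero-length) bone pairs are not handled:

```python
# Hedged sketch: rotation aligning a first-bone direction with its
# second-bone counterpart via the Rodrigues formula.
import numpy as np

def bone_rotation(b_ori, b_target):
    b_ori = b_ori / np.linalg.norm(b_ori)
    b_target = b_target / np.linalg.norm(b_target)
    theta = np.arccos(np.clip(b_ori @ b_target, -1.0, 1.0))   # formula (1)
    axis = np.cross(b_ori, b_target)
    w = axis / np.linalg.norm(axis)                            # unit rotation axis
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])                         # antisymmetric [w]_x
    # formula (2): R = cos(theta) I + (1 - cos(theta)) w w^T + sin(theta) [w]_x
    return np.cos(theta) * np.eye(3) + (1 - np.cos(theta)) * np.outer(w, w) + np.sin(theta) * K
```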
  • the translation error between each pair of bones can be represented by a translation vector. That is, by translating the first bone according to the translation vector, the position error between the translated first bone and the second bone can be made smaller.
  • the translation error between each pair of bones can be determined according to the angle error between each pair of bones by the following formula (3):
  • $T = p_{target} - R\,p_{ori}$  (3)
  • where $T$ refers to the translation error between the first bone and the second bone;
  • $R$ refers to the rotation matrix between the first bone and the second bone;
  • $p_{target}$ refers to the position of the parent node in the second bone, and $p_{ori}$ refers to the position of the parent node in the first bone.
  • the nonlinear extension error between each pair of bones can be represented by nonlinear transformation. That is, performing nonlinear transformation on the first bone can make the position error between the transformed first bone and the second bone smaller.
  • Differences in bone length between each pair of bones will cause nonlinear extension errors.
  • the error caused by the different lengths of the bones is referred to as a nonlinear extension error.
  • a nonlinear transformation is designed in the embodiment of the present application to make up for it.
  • the first bone may be subjected to rigid-body transformation according to the rotation matrix and translation vector determined above, and then the nonlinear extension error between the first bone and the second bone may be determined according to the position error between the transformed first bone and the second bone.
  • the position of the child node in the first bone can be rigid-body transformed according to the rotation matrix and translation vector determined above to obtain the transformed position, and then the difference between the transformed position and the position of the child node in the second bone can be determined.
  • the position error between the first bone and the second bone is determined as the nonlinear extension error between the first bone and the second bone.
  • the position of the child node in the first bone can be subjected to rigid-body transformation by the following formula (4) to obtain the transformed position: $\hat{c} = R\,c_{ori} + T$  (4)
  • the nonlinear extension error between the first bone and the second bone is determined by the following formula (5): $\varepsilon = c_{target} - \hat{c}$  (5)
  • where $\varepsilon$ refers to the nonlinear extension error between the first bone and the second bone, $c_{target}$ refers to the position of the child node in the second bone, and $\hat{c}$ refers to the position obtained after rigid-body transformation of the child-node position $c_{ori}$ in the first bone according to the rotation matrix $R$ and the translation error $T$.
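  • For illustration only, and continuing the previous sketch, formulas (3) to (5) as reconstructed above can be computed as follows, given the per-bone rotation matrix R and the parent/child node positions of a bone pair:

```python
# Hedged sketch: T aligns the parent nodes, and eps is the residual at the
# child node after the rigid-body transform, i.e. the nonlinear extension
# error caused by differing bone lengths.
import numpy as np

def bone_errors(R, parent_ori, child_ori, parent_target, child_target):
    T = parent_target - R @ parent_ori   # formula (3): translation error
    child_hat = R @ child_ori + T        # formula (4): rigid-body transform of the child
    eps = child_target - child_hat       # formula (5): nonlinear extension error
    return T, eps
```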
  • a mapping relationship between the child nodes of corresponding bones can be established according to $\varepsilon$, and the mapping relationship between the child nodes can be extended to any point on the bone to obtain the mapping relationship between arbitrary corresponding points on the bones.
  • the spatial mapping relationship between the first skeleton and the second skeleton can also be determined.
  • the first skeleton can be spatially transformed, and then the spatial error between the transformed first skeleton and the second skeleton can be determined, so as to determine the spatial error according to the determined spatial error. Verify the spatial mapping relationship between the first skeleton and the second skeleton.
  • the spatial mapping relationship between the first skeleton and the second skeleton can be determined by the following formula (6): $x' = \sum_{i} W_{i}\left(R^{(i)} x + T^{(i)} + \varepsilon^{(i)}\right)$  (6)
  • where $x$ is a point on the first skeleton, $x'$ is the corresponding mapped point, and $W$ is the weight parameter; $W$ can be preset, and can also be obtained by learning.
  • it is checked whether the spatial error is less than the error threshold; if it is less than the error threshold, it is determined that the verification is passed, and if not, it is determined that the verification is not passed.
  • If the verification is passed, the following step 104 is performed. If the verification fails, the spatial mapping relationship between the first skeleton and the second skeleton is adjusted so that the spatial error between the transformed first skeleton and the second skeleton is smaller than the error threshold. Then, according to the adjusted spatial mapping relationship, the skeleton error between the adjusted first skeleton and the second skeleton is determined, so that the second three-dimensional mesh corresponding to the second skeleton can be generated according to this skeleton error and the first three-dimensional mesh.
  • Step 104 Adjust the first three-dimensional mesh according to the skeleton error between the second skeleton and the first skeleton to obtain a second three-dimensional mesh corresponding to the second skeleton.
  • the first 3D mesh is generated according to the global image features of the target image, so its overall accuracy and robustness are high; the first skeleton is the skeleton of the first 3D mesh, so it likewise has high overall accuracy and robustness.
  • the second skeleton is generated according to the local image features of the target image, and has high accuracy in local details and depth.
  • in this way, the global image features and local image features of the target image are combined: the first 3D mesh, which has high overall accuracy and robustness, is tuned so that the second 3D mesh obtained after tuning has both a degree of overall accuracy and robustness and a degree of local-detail and depth accuracy.
  • a balance is thus achieved between overall accuracy and robustness on the one hand and the accuracy of local details and depth on the other, which further improves the 3D mesh reconstruction result and effect.
  • the spatial mapping relationship between the second 3D mesh and the first 3D mesh may be determined first according to the skeleton error between the second skeleton and the first skeleton. Then, according to this spatial mapping relationship, the first three-dimensional mesh is spatially transformed to obtain the second three-dimensional mesh.
  • the spatial mapping relationship between the second three-dimensional mesh and the first three-dimensional mesh may include spaces between mesh vertices on the second three-dimensional mesh and mesh vertices on the first three-dimensional mesh Mapping relations.
  • the operation of spatially transforming the first three-dimensional mesh may include: transforming the spatial positions of the mesh vertices of the first three-dimensional mesh according to the spatial mapping relationship between the mesh vertices on the two meshes, to obtain the second three-dimensional mesh.
  • the spatial mapping relationship between the second skeleton and the first skeleton may be determined according to the skeleton error between the second skeleton and the first skeleton, and then, based on an extension of linear blend skinning, the spatial mapping relationship between the skeletons is extended to the spatial mapping relationship between the second three-dimensional mesh and the first three-dimensional mesh.
  • the operation of determining the spatial mapping relationship between the second 3D mesh and the first 3D mesh may include the following steps:
  • the following formula (7) can be used to determine the spatial mapping relationship between the components of the mesh vertices on each pair of bones in the multiple pairs of bones and the mesh vertices on the first 3D mesh:
  • $a_{j,i} = W_{j,i}\left(R^{(i)} v_j + T^{(i)} + \varepsilon^{(i)}\right)$  (7)
  • where $v_j$ is the $j$-th mesh vertex on the first 3D mesh and $a_{j,i}$ is its component on the $i$-th pair of bones;
  • $R^{(i)}$ is the rotation matrix representing the angle error between the $i$-th pair of bones;
  • $T^{(i)}$ is the translation error between the $i$-th pair of bones;
  • $\varepsilon^{(i)}$ is the nonlinear extension error between the $i$-th pair of bones;
  • $W_{j,i}$ is the weight corresponding to the $j$-th mesh vertex on the first 3D mesh and the $i$-th pair of bones;
  • both $i$ and $j$ are positive integers.
  • the components of the mesh vertices on the second three-dimensional mesh on each pair of bones in the multiple pairs of bones are added by the following formula (8) to obtain the spatial mapping relationship between the mesh vertices on the second three-dimensional mesh and the mesh vertices on the first three-dimensional mesh:
  • $v'_j = \sum_{i} a_{j,i}$  (8)
  • where $v'_j$ is the $j$-th mesh vertex on the second three-dimensional mesh and $a_{j,i}$ is its component on the $i$-th pair of bones; the weights $W_{j,i}$ that determine $a_{j,i}$ may be preset parameters or learnable parameters, which is not limited in this embodiment of the present application.
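  • For illustration only, a minimal NumPy sketch of formulas (7) and (8) as reconstructed above; the array shapes are assumptions: V (J, 3) first-mesh vertices, R (B, 3, 3) per-bone rotations, T and eps (B, 3) per-bone errors, W (J, B) skinning weights:

```python
# Hedged sketch: per-bone errors extended to every mesh vertex with skinning
# weights (an extension of linear blend skinning).
import numpy as np

def warp_vertices(V, R, T, eps, W):
    J, B = W.shape
    V2 = np.zeros_like(V)
    for i in range(B):
        a_i = W[:, i:i + 1] * (V @ R[i].T + T[i] + eps[i])  # formula (7): components a[:, i]
        V2 += a_i                                           # formula (8): sum over bone pairs
    return V2  # mesh vertices of the second three-dimensional mesh
```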
  • a second three-dimensional mesh corresponding to the second skeleton may be generated by tuning the model according to the first skeleton, the second skeleton, and the first three-dimensional mesh.
  • the tuning model is used to determine the skeleton error between the second skeleton and the first skeleton, and adjust the first three-dimensional grid according to the skeleton error between the second skeleton and the first skeleton to obtain the corresponding second skeleton. the second three-dimensional grid.
  • the tuning model can be obtained by training according to the sixth sample data in advance.
  • the sixth sample data may include a sample target image and a sample three-dimensional grid corresponding to the sample target image.
  • each of the above steps 101 to 104 may be implemented by a corresponding deep learning model.
  • some or all of the deep learning models corresponding to the steps may also be integrated into one model, which is not limited in this embodiment of the present application.
  • FIG. 2 is a schematic diagram of the logical structure of a 3D mesh reconstruction system provided by an embodiment of the present application.
  • the system includes an initial 3D reconstruction model 21, a first skeleton model 22 and a tuning model 23.
  • In the process of 3D mesh reconstruction, at least one video frame in the video can be obtained first, and the at least one video frame is then used as the input of the initial 3D reconstruction model 21 and the first skeleton model 22 respectively; the initial 3D reconstruction model 21 outputs the first 3D mesh of the target and the first skeleton of the first 3D mesh, and the first skeleton model 22 outputs the second skeleton of the target.
  • the first three-dimensional mesh, the first skeleton, and the second skeleton are used as inputs to the tuning model 23 , and the tuning model 23 outputs a second mesh corresponding to the second skeleton.
  • a first 3D mesh of the target can be generated according to the global image features of at least one first target image, and a first skeleton of the first 3D mesh can be extracted. Since the first 3D mesh is generated according to the global image features of the target image, its overall accuracy and robustness are high, and since the first skeleton is the skeleton of the first 3D mesh, it likewise has high overall accuracy and robustness.
  • the second skeleton may be generated according to the local image features of the at least one second target image. Since the second skeleton is generated from the local image features of the target image, the accuracy in local detail and depth is high.
  • in this way, the global image features and local image features of the target image are combined: the first 3D mesh, which has high overall accuracy and robustness, is tuned so that the second 3D mesh obtained after tuning has both a degree of overall accuracy and robustness and a degree of local-detail and depth accuracy.
  • a balance is thus achieved between overall accuracy and robustness on the one hand and the accuracy of local details and depth on the other, further improving the 3D mesh reconstruction result and effect.
  • FIG. 3 is a flowchart of a method for reconstructing a 3D model of a human body provided by an embodiment of the present application. As shown in FIG. 3 , the method includes the following steps:
  • Step 301 Collect motion capture (Motion Capture, MoCap) data of the human body from the human body video.
  • the motion capture data of the human body refers to the acquisition of pose data and shape information of the human body in motion through certain technical means.
  • the pose data of the human body may include angles of joints, positions of limbs, widths of limbs, and the like.
  • the collected motion capture data can provide information for calibrating human poses and three-dimensional meshes for the models described below.
  • one or more algorithms such as a depth map method and a marker point method may be used to collect motion capture data of human body videos.
  • a combination of depth map and marker points is used to collect motion capture data of human video.
  • a plurality of ordinary video cameras and a plurality of depth cameras are arranged in the acquisition space in advance. Then, multiple volunteers with different physical characteristics are selected for capture in the acquisition space. Before the capture process begins, each subject wears multiple marker balls that serve as markers; these marker balls are fixed on the inside and outside of a number of different important nodes of the subject.
  • the human body characteristics may include characteristics such as gender, height, and body shape. Important nodes can include the knee, hip, ankle, spine, pelvis, head and other nodes of the human body.
  • each subject wears 34 marker balls used as markers, which are fixed on the inside and outside of the subject's important nodes (e.g. knee, hip, ankle, spine, pelvis, head, etc.).
  • the motion capture data of the human body corresponding to each video frame in the video can also be obtained.
  • the motion capture data of the human body corresponding to multiple video frames in the video can be collected as training data for model training. For example, 1280 sets of data are collected for model training.
  • Step 302 according to the collected motion capture data of the human body, annotate the three-dimensional grid and pose of the human body to obtain the annotated data.
  • the pose of the human body can be used as the skeleton corresponding to the three-dimensional grid for training the following network models involving the skeleton.
  • the pose and 3D mesh of the human body can be estimated based on a sparsely labeled shape and pose estimation algorithm (Motion and Shape from Sparse Marker, MoSh) to obtain labeled data.
  • This method yields accurate poses and 3D meshes, with estimation errors at the millimeter level.
  • Step 303 Train an initial 3D reconstruction model according to the video frames in the human body video and the corresponding label data.
  • the initial 3D reconstruction model is used to generate a 3D mesh of the human body according to the video frames in the video.
  • the initial 3D reconstruction model can be trained based on the video frames in the video and the 3D grid in the corresponding annotation data.
  • the initial three-dimensional reconstruction model may include a first feature extraction network and a first fully connected network.
  • the first feature extraction network is used for extracting global image features of the video frame, and the extracted global image features are input to the first fully connected network.
  • the first fully connected network is used to generate a 3D mesh of the human body according to the global image features.
  • the first feature extraction network may be a ResNet model.
  • a linear mapping relationship from the 3D mesh to its skeleton (pose) can also be established, and the initial 3D reconstruction model can also include the mapping relationship.
  • the initial 3D reconstruction model is then used to generate the 3D mesh of the human body from the video frames in the video, and to extract the skeleton of the 3D mesh from the generated mesh.
  • the initial 3D reconstruction model can be trained according to the video frames in the human body video and the 3D mesh and pose of the human body in the corresponding annotation data. It is worth noting that the 3D mesh obtained here only considers the overall characteristics of the image and does not capture enough information about the skeleton structure; in this embodiment, extracting local human joint information substantially improves the accuracy of the human 3D mesh.
  • Step 304 Train a human skeleton model according to the video frames in the human body video and the poses in the corresponding annotation data.
  • the human skeleton model is used to generate the three-dimensional skeleton of the human body according to the video frames in the video.
  • the human skeleton model may include a second feature extraction network and a skeleton generation network; the second feature extraction network is used to extract the local image features of the video frames, and the skeleton generation network is used to generate the human skeleton (pose) according to the extracted local image features.
  • the skeleton generation network may further include a 2D skeleton generation network and a 3D skeleton generation network, the 2D skeleton generation network is used to generate a 2D skeleton of the human body in each video frame according to the local image features of each video frame.
  • the 3D skeleton generation network is used to generate the 3D skeleton of the human body from the 2D skeleton of the human body in one or more video frames.
  • the 3D skeleton generation network can perform multi-view fusion of the 2D skeleton of the human body in each video frame in the video, and generate the 3D skeleton of the human body in each video frame through complementary information.
  • the human body is a non-rigid structure, and the shape of the human body is different in each video frame.
  • the network itself solves the multi-view fusion problem for non-rigid bodies by learning the invariants of the human body and the differences between video frames. Nonetheless, since the rigid-body problem is a degenerate case of the non-rigid problem, this network also works for rigid bodies.
  • the human body pose obtained in step 302 can be used as annotated data for end-to-end fully supervised training.
  • Step 305 Train and optimize the model according to the labeled data.
  • after the human skeleton model and the initial 3D reconstruction model are trained, the tuning model for generating the final 3D mesh can be trained.
  • the tuning model is used to adjust the 3D mesh generated by the initial 3D reconstruction model according to the skeleton error between the skeleton generated by the initial 3D reconstruction model and the skeleton generated by the human skeleton model to obtain an optimized 3D mesh.
  • the tuning model can be trained according to the labeled data obtained in step 302 .
  • after the initial 3D reconstruction model, the human skeleton model, and the tuning model are trained, at least one video frame of the human body video can be obtained, and a high-accuracy 3D mesh of the human body can be generated from it through these three models.
  • Step 306 Use at least one video frame in the human body video as the input of the trained initial three-dimensional reconstruction model, and output the first three-dimensional mesh and the first skeleton of the human body.
  • Step 307 Use at least one video frame in the human body video as the input of the human skeleton model, and output the second skeleton of the human body.
  • Step 308 Use the first 3D mesh of the human body, the first skeleton, and the second skeleton of the human body as the input of the tuning model, and output the second 3D mesh corresponding to the second skeleton.
  • the method provided by the embodiment of the present application can be applied to various three-dimensional reconstruction scenarios.
  • for example, in the 3D reconstruction of an ordinary object, the central axis plane extraction algorithm can be used to extract the first skeleton of the first 3D mesh of the object; the positions of the key points of the object's skeleton are then extracted through the local image features of the images, a second skeleton of the object is generated from those key point positions, and the first 3D mesh is adjusted according to the first skeleton and the second skeleton to obtain the 3D mesh reconstruction result of the object.
  • as another example, in the 3D reconstruction of a scene, a 3D corner detection algorithm can be used to generate the first skeleton of the first 3D mesh of the scene; the second skeleton of the scene is then generated according to the local image features of the video frames in the video, and the first 3D mesh is adjusted according to the first skeleton and the second skeleton to obtain the 3D mesh reconstruction result of the scene.
  • FIG. 4 is a structural block diagram of a three-dimensional mesh reconstruction apparatus provided by an embodiment of the present application.
  • the apparatus may be integrated in computer equipment. As shown in FIG. 4 , the apparatus includes:
  • a first generation module 401 configured to generate a first three-dimensional grid of a target according to the global image feature of at least one first target image, where the first target image is an image containing the target;
  • An extraction module 402 configured to extract the skeleton of the first three-dimensional mesh to obtain a first skeleton, and the first skeleton is used to characterize the internal structure of the first three-dimensional mesh;
  • the second generation module 403 is configured to generate a second skeleton according to local image features of at least one second target image, where the second target image is an image including the target, and the second skeleton is used to represent the target the internal structure;
  • the adjustment module 404 is configured to adjust the first 3D mesh according to the skeleton error between the second skeleton and the first skeleton to obtain a second 3D mesh corresponding to the second skeleton.
  • the first generation module 401 is used for:
  • space occupancy information of the target is determined according to the global image features of the at least one first target image; the space occupancy information is used to indicate the probability that each point in space is occupied by the target;
  • a three-dimensional mesh of the target is generated according to the space occupancy information, and the generated mesh is used as the first three-dimensional mesh.
  • the second generation module 403 is used for:
  • a skeleton node heatmap corresponding to each second target image is generated according to the local image features of each second target image; the skeleton node heatmap is used to indicate the probability that each node of the target's skeleton appears at different positions in the corresponding heatmap;
  • a two-dimensional skeleton is generated for each second target image from its skeleton node heatmap, a three-dimensional skeleton of the target is generated from the two-dimensional skeletons corresponding to the at least one second target image, and the generated three-dimensional skeleton is used as the second skeleton.
  • the second generation module 403 is used for:
  • skeleton nodes corresponding to each second target image are determined from its skeleton node heatmap, and a two-dimensional skeleton corresponding to each second target image is generated from those nodes and a preset topology of the target.
  • the skeleton error between the second skeleton and the first skeleton includes the angle error, the translation error, and the nonlinear extension error between each pair of corresponding bones in the first skeleton and the second skeleton;
  • the nonlinear extension error refers to the error caused by the different lengths of the bones in each pair.
  • the device further includes a determining module, the determining module is configured to:
  • for each pair of corresponding bones in the first skeleton and the second skeleton, the angle error between the pair of bones is determined; the translation error between the pair is determined according to the angle error; and the nonlinear extension error between the pair is determined according to the angle error and the translation error, as sketched below.
  • the adjustment module 404 includes:
  • a determining unit configured to determine the spatial mapping relationship between the second 3D mesh corresponding to the second skeleton and the first 3D mesh according to the skeleton error between the second skeleton and the first skeleton ;
  • a transformation unit configured to perform spatial transformation on the first three-dimensional grid according to the spatial mapping relationship between the second three-dimensional grid and the first three-dimensional grid to obtain the second three-dimensional grid.
  • the skeleton error between the second skeleton and the first skeleton includes the angle error, translation error, and nonlinear extension error between each pair of corresponding bones in the first skeleton and the second skeleton; the determining unit is configured to:
  • determine, according to these per-pair errors, the spatial mapping relationship between the components of the mesh vertices on the second three-dimensional mesh on each pair of bones and the mesh vertices on the first three-dimensional mesh;
  • sum the components of the mesh vertices on the second three-dimensional mesh over the multiple pairs of bones according to that relationship, to obtain the spatial mapping relationship between the mesh vertices on the second three-dimensional mesh and the mesh vertices on the first three-dimensional mesh.
  • the determining unit is used for:
  • ⁇ (i) is the angle error between the i-th pair of bones
  • T (i) is the translation error between the i-th pair of bones
  • ⁇ (i) is the nonlinear extension error between the i-th pair of bones
  • W j,i is the weight of the j-th mesh vertex on the first 3D mesh and the i-th pair of bones
  • i and j are both positive integers.
  • the determining unit is further used for: summing the per-bone components through the formula $V_j = \sum_i A_{j,i}\,V_j^{(i)}$, where $V_j$ is the j-th mesh vertex on the second 3D mesh, $V_j^{(i)}$ is its component on the i-th pair of bones, and $A_{j,i}$ is the weight of that component;
  • the spatial mapping relationship between the second three-dimensional mesh and the first three-dimensional mesh includes the spatial mapping relationship between mesh vertices on the second three-dimensional mesh and mesh vertices on the first three-dimensional mesh; the adjustment module 404 is used for: transforming the spatial positions of the mesh vertices of the first three-dimensional mesh according to that relationship to obtain the second three-dimensional mesh.
  • a first 3D mesh of the target can be generated according to the global image features of at least one first target image, and a first skeleton can be extracted from the first 3D mesh. Since the first 3D mesh is generated from the global image features of the target image, its overall accuracy and robustness are high; the first skeleton is the skeleton of the first 3D mesh, so it has the same high overall accuracy and robustness.
  • the second skeleton can be generated according to the local image features of the at least one second target image. Since the second skeleton is generated from the local image features of the target image, its accuracy in local detail and depth is high.
  • by adjusting the first 3D mesh according to the skeleton error between the second skeleton and the first skeleton, the global and local image features of the target image can be combined to tune the first 3D mesh, whose overall accuracy and robustness are high, so that the tuned second 3D mesh has both a degree of overall accuracy and robustness and a degree of accuracy in local detail and depth. A balance is struck between overall accuracy and robustness on the one hand and accuracy in local detail and depth on the other, further refining the 3D mesh reconstruction result and improving the 3D mesh reconstruction effect.
  • FIG. 5 is a structural block diagram of a computer device 500 provided by an embodiment of the present application.
  • the computer device 500 may be an electronic device such as a mobile phone, a tablet computer, a desktop computer, and a server.
  • the computer device 500 can be used to implement the three-dimensional mesh reconstruction method provided in the above embodiments.
  • computer device 500 includes: processor 501 and memory 502 .
  • the processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 501 can be implemented by at least one hardware form among DSP (Digital Signal Processing, digital signal processing), FPGA (Field Programmable Gate Array, field programmable gate array), and PLA (Programmable Logic Array, programmable logic array).
  • the processor 501 may also include a main processor and a coprocessor.
  • the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
  • the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen.
  • the processor 501 may further include an AI (Artificial Intelligence, artificial intelligence) processor, where the AI processor is used to process computing operations related to machine learning.
  • Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 502 is used to store at least one instruction, and the at least one instruction is to be executed by the processor 501 to implement the three-dimensional mesh reconstruction method provided by the method embodiments of this application.
  • the computer device 500 may also optionally include: a peripheral device interface 503 and at least one peripheral device.
  • the processor 501, the memory 502 and the peripheral device interface 503 may be connected through a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 503 through a bus, a signal line or a circuit board.
  • the peripheral device may include at least one of a display screen 504 , an audio circuit 505 , a communication interface 506 and a power supply 507 .
  • the structure shown in FIG. 5 does not constitute a limitation on the computer device 500; it may include more or fewer components than shown, combine some components, or adopt a different component arrangement.
  • a computer-readable storage medium is also provided, on which instructions are stored; when the instructions are executed by a processor, the above three-dimensional mesh reconstruction method is implemented.
  • a computer program product is also provided which, when executed, is used to implement the above three-dimensional mesh reconstruction method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

本申请适用于三维重建技术领域,提供了一种三维网格重建方法、装置、设备及存储介质。方法包括:根据至少一个第一目标图像的全局图像特征,生成目标的第一三维网格;提取第一三维网格的骨架,得到第一骨架;根据至少一个第二目标图像的局部图像特征,生成第二骨架;根据第二骨架与第一骨架之间的骨架误差,对第一三维网格进行调整,得到第二骨架对应的第二三维网格。如此,可以结合图像的全局图像特征和局部图像特征,对在整体上的准确性和鲁棒性较高的第一三维网格进行调优,使得调优后得到的第二三维网格既具有整体上的准确性和鲁棒性,也具有局部细节和深度上的准确度,提高了三维网格重建效果。

Description

三维网格重建方法、装置、设备及存储介质 技术领域
本申请属于三维重建技术领域,尤其涉及一种三维网格重建方法、装置、设备及存储介质。
背景技术
三维重建是指对三维物体建立适合计算机表示和处理的三维模型。三维模型是一种物体的多边形表示,通常用计算机或者其它视频设备进行显示。三维模型也称三维网格模型,三维模型表面用三维网格来表示,三维网格及其内部区域为对应的三维模型。因此在对三维模型进行重建的过程中,对三维模型表面的三维网格进行重建是关键。
目前的三维网格重建方法通常分为两种思路:第一种思路是提取图像的局部图像特征,基于图像的局部图像特征生成三维网格。这种思路可以使得生成的三维网格在局部细节和深度上的准确度较高。第二种思路是提取图像的全局图像特征,基于图像的全局图像特征生成三维网格。这种思路可以使得生成的三维网格在整体上的准确性和鲁棒性较高。
但是,上述第一种思路忽略了图像的整体特征,导致生成的三维网格在整体的准确性和鲁棒性不高。上述第二种思路忽略了图像的局部特征,导致生成的三维网格在局部细节和深度上不够准确。因此,这两种思路的三维网格重建效果都较低。
发明内容
本申请实施例提供了一种三维网格重建方法、装置、设备及存储介质,可以解决相关技术中三维网格重建效果较低的问题。
第一方面,本申请实施例提供了一种三维网格重建方法,包括:
根据至少一个第一目标图像的全局图像特征,生成目标的第一三维网格,所述第一目标图像为包含所述目标的图像;
提取所述第一三维网格的骨架,得到第一骨架,所述第一骨架用于表征所述第一三维网格的内部结构;
根据至少一个第二目标图像的局部图像特征,生成第二骨架,所述第二目标图像为包含所述目标的图像,所述第二骨架用于表征所述目标的内部结构;
根据所述第二骨架与所述第一骨架之间的骨架误差,对所述第一三维网格进行调整,得到所述第二骨架对应的第二三维网格。
可选地,所述根据至少一个第一目标图像的全局图像特征,生成目标的第一三维网格,包括:
根据所述至少一个第一目标图像的全局图像特征,确定所述目标的空间占用信息,所述空间占用信息用于指示空间中每个点被所述目标占用的概率;
根据所述空间占用信息,生成所述目标的三维网格,将生成的三维网格作为所述第一三维网格。
可选地,所述根据至少一个第二目标图像的局部图像特征,生成所述目标的第二骨架,包括:
根据所述至少一个第二目标图像中每个第二目标图像的局部图像特征,生成每个第二目标图像对应的骨架节点热图,所述骨架节点热图用于指示所述目标的骨架中每个节点在对应第二目标热图中不同位置出现的概率;
根据每个第二目标图像对应的骨架节点热图,生成每个第二目标图像对应的二维骨架;
根据所述至少一个第二目标图像对应的二维骨架,生成所述目标的三维骨架,将生成的三维骨架作为所述第二骨架。
可选地,所述根据每个第二目标图像对应的骨架节点热图,生成每个第二目标图像对应的二维骨架,包括:
根据每个第二目标图像对应的骨架节点热图,确定每个第二目标图像对应的骨架节点;
根据每个第二目标图像对应的骨架节点以及所述目标的预设拓扑结构,生成每个第二目标图像对应的二维骨架。
可选地,所述第二骨架与所述第一骨架之间的骨架误差包括所述第一骨架和所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,所述非线性延展误差是指每对骨骼的长度不同引起的误差。
可选地,所述根据所述第二骨架与所述第一骨架之间的骨架误差,对所述第一三维网格进行调整,得到所述第二骨架对应的第二三维网格之前,还包括:
对于所述第一骨架和所述第二骨架中对应的多对骨骼中的每对骨骼,确定所述每对骨骼之间的角度误差;
根据所述每对骨骼之间的角度误差,确定所述每对骨骼之间的平移误差;
根据所述每对骨骼之间的角度误差和平移误差,确定所述每对骨骼之间的非线性延展误差。
可选地,所述根据所述第二骨架与所述第一骨架之间的骨架误差,对所述第一三维网格进行调整,得到所述第二骨架对应的第二三维网格,包括:
根据所述第二骨架与所述第一骨架之间的骨架误差,确定所述第二骨架对应的第二三维网格与所述第一三维网格之间的空间映射关系;
根据所述第二三维网格与所述第一三维网格之间的空间映射关系,对所述第一三维网格进行空间变换,得到所述第二三维网格。
可选地,所述第二骨架与所述第一骨架之间的骨架误差包括所述第一骨架和所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差;
所述根据所述第二骨架与所述第一骨架之间的骨架误差,确定所述第二骨架对应的第二三维网格与所述第一三维网格之间的空间映射关系,包括:
根据所述第一骨架与所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,确定所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系;
根据所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,对所述第二三维网格上的网格顶点在所述多对骨骼上的分量进行加和处理,得到所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系。
可选地,所述根据所述第一骨架与所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,确定所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,包括:
根据所述第一骨架与所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,通过以下公式,确定所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系:

$$V^{(i)}_{j}=W_{j,i}\left(\Psi^{(i)}\,\bar{V}_{j}+T^{(i)}+\Delta^{(i)}\right)$$

其中,$V^{(i)}_{j}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量,$\bar{V}_{j}$ 为所述第一三维网格上的第j个网格顶点,$\Psi^{(i)}$ 为所述第i对骨骼之间的角度误差,$T^{(i)}$ 为所述第i对骨骼之间的平移误差,$\Delta^{(i)}$ 为所述第i对骨骼之间的非线性延展误差,$W_{j,i}$ 为所述第一三维网格上的第j个网格顶点和所述第i对骨骼对应的权重,i和j均为正整数。
可选地,所述根据所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,对所述第二三维网格上的网格顶点在所述多对骨骼上的分量进行加和处理,得到所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系,包括:
根据所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,通过以下公式,对所述第二三维网格上的网格顶点在所述多对骨骼上的分量进行加和处理,得到所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系:

$$V_{j}=\sum_{i}A_{j,i}\,V^{(i)}_{j}$$

其中,$V_{j}$ 为所述第二三维网格上的第j个网格顶点,$V^{(i)}_{j}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量,所述 $A_{j,i}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量对应的权重。
可选地,所述第二三维网格与所述第一三维网格之间的空间映射关系包括所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系;
所述根据所述第二三维网格与所述第一三维网格之间的空间映射关系,对所述第一三维网格进行空间变换,得到所述第二三维网格,包括:
根据所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系,对所述第一三维网格的网格顶点的空间位置进行变换,得到所述第二三维网格。
第二方面,提供了一种三维网格重建装置,该装置包括:
第一生成模块,用于根据至少一个第一目标图像的全局图像特征,生成目标的第一三维网格,所述第一目标图像为包含所述目标的图像;
提取模块,用于提取所述第一三维网格的骨架,得到第一骨架,所述第一骨架用于表征所述第一三维网格的内部结构;
第二生成模块,用于根据至少一个第二目标图像的局部图像特征,生成第二骨架,所述第二目标图像为包含所述目标的图像,所述第二骨架用于表征所述目标的内部结构;
调整模块,用于根据所述第二骨架与所述第一骨架之间的骨架误差,对所述第一三维网格进行调整,得到所述第二骨架对应的第二三维网格。
可选地,第一生成模块用于:
根据所述至少一个第一目标图像的全局图像特征,确定所述目标的空间占用信息,所述空间占用信息用于指示空间中每个点被所述目标占用的概率;
根据所述空间占用信息,生成所述目标的三维网格,将生成的三维网格作为所述第一三维网格。
可选地,第二生成模块用于:
根据所述至少一个第二目标图像中每个第二目标图像的局部图像特征,生成每个第二目标图像对应的骨架节点热图,所述骨架节点热图用于指示所述目标的骨架中每个节点在对应第二目标热图中不同位置出现的概率;
根据每个第二目标图像对应的骨架节点热图,生成每个第二目标图像对应的二维骨架;
根据所述至少一个第二目标图像对应的二维骨架,生成所述目标的三维骨架,将生成的三维骨架作为所述第二骨架。
可选地,第二生成模块用于:
根据每个第二目标图像对应的骨架节点热图,确定每个第二目标图像对应的骨架节点;
根据每个第二目标图像对应的骨架节点以及所述目标的预设拓扑结构,生成每个第二目标图像对应的二维骨架。
可选地,所述第二骨架与所述第一骨架之间的骨架误差包括所述第一骨架和所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,所述非线性延展误差是指每对骨骼的长度不同引起的误差。
可选地,所述装置还包括确定模块,确定模块用于:
对于所述第一骨架和所述第二骨架中对应的多对骨骼中的每对骨骼,确定所述每对骨骼之间的角度误差;
根据所述每对骨骼之间的角度误差,确定所述每对骨骼之间的平移误差;
根据所述每对骨骼之间的角度误差和平移误差,确定所述每对骨骼之间的非线性延展误差。
可选地,所述调整模块包括:
确定单元,用于根据所述第二骨架与所述第一骨架之间的骨架误差,确定所述第二骨架对应的第二三维网格与所述第一三维网格之间的空间映射关系;
变换单元,用于根据所述第二三维网格与所述第一三维网格之间的空间映射关系,对所述第一三维网格进行空间变换,得到所述第二三维网格。
可选地,所述第二骨架与所述第一骨架之间的骨架误差包括所述第一骨架和所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差;确定单元用于:
根据所述第一骨架与所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,确定所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系;
根据所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,对所述第二三维网格上的网格顶点在所述多对骨骼上的分量进行加和处理,得到所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系。
可选地,所述确定单元用于:
根据所述第一骨架与所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,通过以下公式,确定所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系:

$$V^{(i)}_{j}=W_{j,i}\left(\Psi^{(i)}\,\bar{V}_{j}+T^{(i)}+\Delta^{(i)}\right)$$

其中,$V^{(i)}_{j}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量,$\bar{V}_{j}$ 为所述第一三维网格上的第j个网格顶点,$\Psi^{(i)}$ 为所述第i对骨骼之间的角度误差,$T^{(i)}$ 为所述第i对骨骼之间的平移误差,$\Delta^{(i)}$ 为所述第i对骨骼之间的非线性延展误差,$W_{j,i}$ 为所述第一三维网格上的第j个网格顶点和所述第i对骨骼对应的权重,i和j均为正整数。
可选地,所述确定单元用于:
根据所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,通过以下公式,对所述第二三维网格上的网格顶点在所述多对骨骼上的分量进行加和处理,得到所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系:
$$V_{j}=\sum_{i}A_{j,i}\,V^{(i)}_{j}$$

其中,$V_{j}$ 为所述第二三维网格上的第j个网格顶点,$V^{(i)}_{j}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量,所述 $A_{j,i}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量对应的权重。
可选地,所述第二三维网格与所述第一三维网格之间的空间映射关系包括所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系;所述调整模块用于:
根据所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系,对所述第一三维网格的网格顶点的空间位置进行变换,得到所述第二三维网格。
第三方面,本申请实施例提供了一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,所述处理器执 行所述计算机程序时实现上述第一方面中任一项所述的方法。
第四方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时实现上述第一方面中任一项所述的方法。
第五方面,本申请实施例提供了一种计算机程序产品,当计算机程序产品在计算机设备上运行时,使得计算机设备执行上述第一方面中任一项所述的方法。
可以理解的是,上述第二方面至第五方面的有益效果可以参见上述第一方面中的相关描述,在此不再赘述。
本申请实施例与现有技术相比存在的有益效果是:
本申请实施例中,一方面可以根据至少一个第一目标图像的全局图像特征,生成目标的第一三维网格,以及提取第一三维网格的第一骨架,所生成的第一三维网格和第一骨架在整体上的准确性和鲁棒性较高。另一方面可以根据至少一个第二目标图像的局部图像特征,生成第二骨架,所生成的第二骨架在局部细节和深度上的准确度较高。之后,通过根据第二骨架与第一骨架之间的骨架误差,对第一三维网格进行调整,可以结合目标图像的全局图像特征和局部图像特征,对在整体上的准确性和鲁棒性较高的第一三维网格进行调优,使得调优后得到的第二三维网格既具有一定的整体上的准确性和鲁棒性,也具有一定的局部细节和深度上的准确度,在整体上的准确性和鲁棒性、以及局部细节和深度上的准确度之间取得了一定的平衡,进一步完善了三维网格重建结果,提高了三维网格重建效果。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种三维网格重建方法的流程图;
图2是本申请实施例提供的一种三维网格重建系统的逻辑结构示意图;
图3是本申请实施例提供的一种人体三维模型重建方法的流程图;
图4是本申请实施例提供的一种三维网格重建装置的结构框图;
图5是本申请实施例提供的一种计算机设备的结构框图。
具体实施方式
以下描述中,为了说明而不是为了限定,提出了诸如特定系统结构、技术之类的具体细节,以便透彻理解本申请实施例。然而,本领域的技术人员应当清楚,在没有这些具体细节的其它实施例中也可以实现本申请。在其它情况中,省略对众所周知的系统、装置、电路以及方法的详细说明,以免不必要的细节妨碍本申请的描述。
在本申请说明书和所附权利要求书的描述中,术语“第一”、“第二”、“第三”等仅用于区分描述,而不能理解为指示或暗示相对重要性。另外,在本申请说明书中描述的参考“一个实施例”或“一些实施例”等意味着在本申请的一个或多个实施例中包括结合该实施例描述的特定特征、结构或特点。由此,在本说明书中的不同之处出现的语句“在一个实施例中”、“在一些实施例中”、“在其他一些实施例中”、“在另外一些实施例中”等不是必然都参考相同的实施例,而是意味着“一个或多个但不是所有的实施例”,除非是以其他方式另外特别强调。术语“包括”、“包含”、“具有”及它们的变形都意味着“包括但不限于”,除非是以其他方式另外特别强调。
为了便于理解,首先对本申请实施例涉及的应用场景进行说明。
本申请实施例提供的三维网格重建方法应用于对目标的三维网格进行重建。其中,目标为待重建的对象,目标可以预先设置或人为指定。比如,目标可以包括生物、非生物或场景等。示例地,生物可以为人体或动物等,或者为生物中的组织或器官等;非生物可以为车辆、障碍物等。另外,目标可以是现实世界中的实体,也可以为虚构的物体。本申请实施例对待重建目标的具体类型不做限定。
另外,本申请实施例提供的三维网格重建方法还可以应用于任何需要更高精度的三维网格的应用领域,比如虚拟现实游戏、智能城市、导航或自动驾驶等领域。
另外,本申请实施例提供的三维网格重建方法可以基于至少一个目标图像 来对目标的三维网格进行重建。比如,至少一个目标图像可以为视频中的至少一个视频帧,本申请实施例可以基于视频来对目标的三维网格进行重建。
接下来,对本申请实施例涉及的实施环境进行说明。
本申请实施例提供的三维网格重建方法可以应用于计算机设备中。比如,该计算机设备安装有图像处理软件,该图像处理软件可以实现本申请实施例提供的三维网格重建方法。比如,该图像处理软件可以对视频中的至少一个视频帧进行处理,得到准确度较高的三维网格。其中,计算机设备可以为终端设备或服务器,终端设备可以为手机、平板电脑、可穿戴设备、车载设备、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)等,本申请实施例对该计算机设备的具体类型不作任何限制。
接下来,对本申请实施例提供的三维网格重建方法进行详细说明。
图1是本申请实施例提供的一种三维网格重建方法的流程图,该方法应用于计算机设备中,如图1所示,该方法包括如下步骤:
步骤101:根据至少一个第一目标图像的全局图像特征,生成目标的第一三维模型,第一目标图像为包含目标的图像。
其中,目标为待重建的对象。其中,至少一个第一目标图像包括一个或多个第一目标图像。这至少一个第一目标图像可以从本地存储的数据中获取得到,可以为其他设备发送得到,也可以从网络中下载得到,本申请实施例对这至少一个第一目标图像的获取方式不做限定。
另外,这至少一个第一目标图像还可以为第一视频中的至少一个视频帧,比如,可以从第一视频中获取至少一个视频帧,将这至少一个视频帧作为至少一个第一目标图像。这至少一个视频帧可以为连续的视频帧,也可以为非连续的视频帧。当然,这至少一个第一目标图像也可以为其他类型的包含目标的图像,本申请实施例对此不做限定。
其中,全局图像特征也称整体特征。第一目标图像的全局图像特征是指能表征第一目标图像的整幅图像上的特征,用于描述第一目标图像或第一目标图像中目标的整体特性。比如,全局图像特征可以包括颜色、形状、纹理和结构 等特征中的一种或多种。
本申请实施例中,可以先获取至少一个第一目标图像,对这至少一个第一目标图像中的每个第一目标图像进行特征提取,得到每个第一目标图像的全局图像特征。然后,根据至少一个第一目标图像的全局图像特征,对目标的三维网格进行重建,得到第一三维网格。
作为一个示例,可以通过第一特征提取模型,对第一目标图像进行特征提取,得到第一目标图像的全局图像特征。其中,第一特征提取模型用于提取图像的全局图像特征。第一特征提取模型可以为神经网络模型,比如CNN(Convolutional Neural Networks,卷积神经网络)模型或ResNet(残差卷积神经网络)模型等。
作为一个示例,根据至少一个第一目标图像的全局图像特征,生成目标的第一三维模型的操作可以包括如下步骤:
1)根据至少一个第一目标图像的全局图像特征,确定目标的空间占用信息。
其中,该空间占用信息用于指示空间中每个点被目标占用的概率。空间中每个点被目标占用的概率在0-1之间。比如,空间中的某个点被目标的占用的概率可以为0、0.3、0.5、0.8或1等。
作为一个示例,可以通过分类网络模型,对至少一个第一目标图像的全局图像特征进行处理,得到目标的空间占用信息。其中,分类网络模型用于根据至少一个图像的全局图像特征生成待重建物体的空间占用信息。该分类网络模型可以为神经网络模型,比如CNN模型或密集网络模型等。
另外,在确定目标的空间占用信息的过程中,还可以通过密集采样来实现空间有效范围内所有点的占用概率的估计。
2)根据目标的空间占用信息,生成目标的三维网格,将生成的三维网格作为第一三维网格。
根据目标的空间占用信息,可以确定空间中哪些点被目标占用,哪些点未被目标占用。因此,根据目标的空间占用信息可以确定出空间中被目标占用的点,进而根据空间中被目标占用的点生成目标的三维模型表面的三维网格(第一三维网格)。
作为一个示例,可以根据目标的空间占用信息确定空间中被目标占用的概率为指定概率的点,然后根据空间中被目标占用的概率为指定概率的点,生成 第一三维网格。其中,指定概率可以预先设置,比如指定概率为0.5。
作为另一个示例,可以根据目标的空间占用信息确定空间中被目标占用的点,根据空间中被目标占用的点生成目标的第一三维模型,将第一三维模型表面的三维网格确定为第一三维网格。
作为一个示例,可以根据目标的空间占用信息,通过网格化算法生成第一三维网格。其中,该网格化算法可以为立方体行军算法等。
比如,可以通过网格化算法,确定目标的空间占用信息与第一三维网格中各个顶点的映射关系,然后根据目标的空间占用信息与第一三维网格中各个顶点的映射关系,生成第一三维网格。
另外,还可以根据至少一个第一目标图像的全局图像特征,确定目标的颜色空间信息,颜色空间信息用于指示目标表面的颜色。然后根据目标的颜色空间信息,生成第一三维网格表面的颜色。比如,可以根据目标的颜色空间信息,通过网格化算法生成第一三维网格表面的颜色。其中,颜色空间可以为RGB颜色空间。
也即是,网格化算法不仅局限于位置的占用,对于颜色空间也可以做同样的推广,因此可以通过估计空间占用信息相似的算法来估计颜色空间信息,再根据颜色空间信息采用网格化算法生成第一三维网格表面的颜色。
在一个实施例中,可以根据至少一个第一目标图像,通过初始三维重建模型生成目标的第一三维网格。其中,初始三维重建模型包括第一特征提取网络和第一全连接网络,第一特征提取网络用于提取图像的全局图像特征,第一全连接网络用于根据提取的全局图像特征生成三维网格。
比如,可以将至少一个第一目标图像作为初始三维重建模型的输入,通过第一特征提取网络提取至少一个第一目标图像的全局图像特征,将提取的全局图像特征作为第一全连接网络的输入。通过第一全连接网络,根据至少一个第一目标图像的全局图像特征生成第一三维网格。
其中,初始三维重建模型可以预先根据第一样本数据训练得到。其中,第一样本数据可以包括样本目标图像以及样本目标图像对应的三维网格。
步骤102:提取第一三维网格的骨架,得到第一骨架,第一骨架用于表征第一三维网格的内部结构。
其中,第一骨架是在第一三维网格对应的第一三维模型内部生成的、可以表征整个第一三维模型的结构信息的树状结构。
作为一个示例,提取第一三维网格的骨架可以包括以下几种实现方式:
第一种实现方式,根据第一三维网格上的网格顶点,以及三维网格上的网格顶点与三维网格的骨架之间的映射关系,来确定第一三维网格的骨架,将确定的骨架作为第一骨架。
也即是,可以根据三维模型表面的三维网格的网格顶点与三维模型的骨架之间的映射关系,确定第一三维网格上的网格顶点对应的骨架,将确定的骨架作为第一骨架。
其中,该映射关系可以预先获取得到,也可以根据第二样本数据进行学习得到。其中,第二样本数据可以包括多个样本三维网格以及每个样本三维网格对应的样本骨架。每个样本三维网格对应的样本骨架可以由人工标注得到。
第二种实现方式,通过骨架提取模型提取第一三维网格的骨架,得到第一骨架。
比如,可以将第一三维网格数据作为该骨架提取模型的输入,通过该骨架提取模型确定第一骨架。
其中,骨架提取模型用于提取三维网格的骨架。该骨架提取模型可以根据第三样本数据进行训练得到。第三样本数据可以包括多个样本三维网格以及每个样本三维网格对应的样本骨架。每个样本三维网格对应的样本骨架可以由人工标注得到。
第三种实现方式,采用骨架估计算法,确定第一三维网格的骨架,得到第一骨架。其中,骨架估计算法用于确定三维网格的骨架,具体可以为中轴面提取算法等。
比如,可以根据预设拓扑结构,采用骨架估计算法,确定第一三维网格的骨架,得到第一骨架。其中,预设拓扑结构可以预先设置,可以根据待重建的目标的拓扑结构进行设置,目标的拓扑结构可以根据目标的结构特性确定得到。
第三种实现方式中直接采用骨架估计算法来确定第一三维网格的骨架,算法较为简单,骨架提取效率较高。
步骤103:根据至少一个第二目标图像的局部图像特征,生成第二骨架,第二目标图像为包含目标的图像,第二骨架用于表征目标的内部结构。
其中,至少一个第二目标图像包括一个或多个第二目标图像。这至少一个 第二目标图像可以从本地存储的数据中获取得到,可以为其他设备发送得到,也可以从网络中下载得到,本申请实施例对这至少一个第二目标图像的获取方式不做限定。
另外,这至少一个第二目标图像还可以为第二视频中的至少一个视频帧,比如,可以从第二视频中获取至少一个视频帧,将这至少一个视频帧作为至少一个第二目标图像。这至少一个视频帧可以为连续的视频帧,也可以为非连续的视频帧。当然,这至少一个第二目标图像也可以为其他类型的包含目标的图像,本申请实施例对此不做限定。
需要说明的是,本申请实施例所述的至少一个第二目标图像与上述至少一个第一目标图像可以为相同,也可以为不同,本申请实施例对此不做限定。另外,上述第一视频与第二视频可以为相同视频,也可以为不同视频,本申请实施例对此也不做限定。
其中,局部图像特征也称局部特征。第二目标图像的局部图像特征是第二目标图像的图像特征的局部表达,用于描述第二目标图像具有的局部特性。与全局图像特征相比,局部图像特征具有在图像中蕴含数量丰富、特征间相关度小、遮挡情况下不会因为部分特征的消失而影响其他特征的检测和匹配等特点。
本申请实施例中,可以先获取至少一个第二目标图像的局部图像特征,对这至少一个第二目标图像中的每个第二目标图像进行特征提取,得到每个第二目标图像的局部图像特征。然后,根据至少一个第二目标图像的局部图像特征,生成第二骨架。
作为一个示例,可以通过第二特征提取模型,对第二目标图像进行特征提取,得到第二目标图像的局部图像特征。其中,第二特征提取模型用于提取图像的局部图像特征。第二特征提取模型可以为神经网络模型,比如CNN模型或HRNet(高分辨率神经网络)模型等。
在一种可能的实现方式中,本申请实施例中,还可以根据至少一个第二目标图像,通过第一骨架模型生成目标的三维骨架。其中,第一骨架模型可以包括第二特征提取网络和骨架生成网络,第二特征提取网络用于提取图像的局部图像特征,骨架生成网络用于根据提取的局部图像特征生成骨架。
比如,可以将至少一个第二目标图像作为第一骨架模型的输入,通过第二特征提取网络提取至少一个第二目标图像的局部图像特征,将提取的局部图像 特征作为骨架生成网络的输入。通过骨架生成网络,根据至少一个第二目标图像的局部图像特征生成目标的第二骨架。
作为一个示例,骨架生成网络还可以包括二维骨架生成网络和三维骨架生成网络,二维骨架生成网络用于根据至少一个第二目标图像的局部图像特征生成目标的至少一个二维骨架,将至少一个二维骨架作为三维骨架生成网络的输入。三维骨架生成网络用于根据至少一个二维骨架生成目标的三维骨架。
其中,第一骨架模型可以预先根据第四样本数据训练得到。第四样本数据可以包括样本目标图像以及样本目标图像对应的样本骨架。
作为一个示例,根据至少一个第二目标图像的局部图像特征,生成第二骨架的操作包括如下步骤:
1)根据至少一个第二目标图像中每个第二目标图像的局部图像特征,生成每个第二目标图像对应的骨架节点热图。
其中,每个第二目标图像对应的骨架节点热图用于指示目标的骨架中每个节点在每个第二目标热图中不同位置出现的概率。骨架中的节点是指骨架树状结构的节点,包括骨架树状结构的根节点、子节点和叶子节点。
其中,骨架节点热图包括骨架中多个节点的热图,每个节点的热图用于指示每个节点在对应第二目标热图中不同位置出现的概率,某个位置的概率越大,表示该节点在该位置出现的可能性越高。
需要说明的是,骨架节点热图可以用于表征第二骨架。通过骨架节点热图来表征骨架具有以下优点:1)骨架节点热图是连续的函数,可以很好地适配深度学习网络模型的学习,使得生成第二骨架的鲁棒性和稳定性更高。2)骨架节点热图可以很好地标记骨架节点的位置。3)骨架节点热图不仅可以表征骨架节点的位置,还可以表征骨架节点估计的误差,便于生成较为准确的骨架。
2)根据每个第二目标图像对应的骨架节点热图,生成每个第二目标图像对应的二维骨架。
可以先根据每个第二目标图像对应的骨架节点热图,确定每个第二目标图像对应的骨架节点。然后,根据每个第二目标图像对应的骨架节点以及预设拓扑结构,生成每个第二目标图像对应的二维骨架。
其中,预设拓扑结构可以预先设置,可以根据待重建的目标的拓扑结构进行设置,目标的拓扑结构可以根据目标的结构特性确定得到。
作为一个示例,可以根据每个第二目标图像对应的骨架节点热图,可以确定骨架节点在每个第二目标图像中出现概率最大的位置,将骨架节点出现概率最大的位置确定为骨架节点的位置。然后,根据预设拓扑结构,对每个第二目标图像对应的骨架节点的位置进行连接,以生成每个第二目标图像对应的二维骨架。
3)根据至少一个第二目标图像对应的二维骨架,生成目标的三维骨架,将生成的三维骨架作为第二骨架。
其中,三维骨架包括骨架的二维位置信息和深度信息。也即是,目标的三维骨架包括不同深度的二维骨架。
作为一个示例,可以根据一个第二目标图像对应的二维骨架以及目标的先验信息,生成目标的三维骨架。其中,目标的先验信息包括目标的深度信息,当然,还可以包括其他信息。示例地,目标的先验信息可以预先根据目标的多个样本三维骨架的深度信息进行学习得到。示例地,目标的深度信息可以包括目标在不同角度的空间几何测量信息。
作为另一个示例,可以将多个第二目标图像对应的二维骨架进行融合处理,以获得包括丰富的二维位置信息和深度信息的三维骨架。
在一种可能的实现方式中,可以根据至少一个第二目标图像对应的二维骨架,通过第二骨架模型,生成目标的三维骨架。其中,第二骨架模型用于根据至少一个二维骨架生成三维骨架。该第二骨架模型可以预先根据第五样本数据进行训练得到,第五样本数据可以包括至少一个样本二维骨架以及对应的样本三维骨架。示例地,第二骨架模型中的网络可以为上述三维骨架生成网络。
另外,本申请实施例中待重建的目标可以为刚体,也可以为非刚体。若待重建的目标为非刚体,还可以根据多个第二目标图像对应的二维骨架、多个第二目标图像的连续信息、以及目标的先验信息,来生成非刚体的目标得到三维骨架以及生成的三维骨架在每个第二目标图像中的变化信息。比如,多个第二目标图像可以为连续的多个视频帧。
另外,在根据至少一个第二目标图像的局部图像特征,生成目标的第二骨架之后,还可以确定第二骨架与第一骨架之间的骨架误差,以便根据第二骨架与第一骨架之间的骨架误差,以及第一三维网格,来生成目标的第二三维网格。
其中,第二骨架与第一骨架之间的骨架误差可以包括角度误差、平移误差 和非线性延展误差中的一种或多种,非线性延展误差是指骨架中的对应骨骼长度不同引起的误差。
其中,骨架中的骨骼是指骨架中相互连接的两个节点之间的连接部分。比如,骨骼包括父节点和子节点,是骨骼中的父节点和子节点之间连接的部分。第一骨架和第二骨架中的对应骨骼是指第一骨架和第二骨架中拓扑结构相同的一对骨骼。
作为一个示例,第二骨架与第一骨架之间的骨架误差可以包括第一骨架和第二骨架中对应骨骼之间的角度误差、平移误差和非线性延展误差,比如,包括第一骨架和第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差。
作为一个示例,确定第二骨架与第一骨架之间的骨架误差的操作可以包括如下步骤:
1)对于第一骨架和第二骨架中对应的多对骨骼中的每对骨骼,确定每对骨骼之间的角度误差。
其中,每对骨骼之间角度误差可以用旋转矩阵表示。也即是,根据该旋转矩阵对第一骨骼进行旋转,可以使得旋转后的第一骨骼与第二骨骼之间的角度误差较小。其中,第一骨骼为第一骨架中的任一骨骼,第二骨骼为第二骨架中与第一骨骼对应的骨骼。
作为一个示例,对于第一骨架和第二骨架中对应的多对骨骼中的每对骨骼,可以先确定每对骨骼之间的旋转角,然后根据每对骨骼之间的旋转角确定每对骨骼之间的旋转矩阵。比如,可以根据每对骨骼之间的旋转角,通过罗德里格斯公式,确定每对骨骼之间的旋转矩阵。
比如,可以先通过以下公式1)确定每对骨骼之间的旋转角:
$$\psi=\arccos\!\left(\frac{b_{ori}\cdot b_{target}}{\|b_{ori}\|\,\|b_{target}\|}\right)\cdot\frac{b_{ori}\times b_{target}}{\|b_{ori}\times b_{target}\|}\tag{1}$$

其中,ψ是指第一骨骼与第二骨骼之间的旋转角(以轴角向量表示),b_ori 是指第一骨骼的方向向量,b_target 是指第二骨骼的方向向量。设 $j^{ori}_{par}$ 是指第一骨骼中的父节点的位置,$j^{ori}_{chi}$ 是指第一骨骼的子节点的位置,则 $b_{ori}=j^{ori}_{chi}-j^{ori}_{par}$,即第一骨骼中的父节点与子节点的位置差;设 $j^{tar}_{par}$ 是指第二骨骼中的父节点的位置,$j^{tar}_{chi}$ 是指第二骨骼的子节点的位置,则 $b_{target}=j^{tar}_{chi}-j^{tar}_{par}$,即第二骨骼中的父节点与子节点的位置差。
然后,可以根据每对骨骼之间的旋转角,通过以下公式(2)确定每对骨骼之间的旋转矩阵:

$$\Psi=\cos|\psi|\,I+(1-\cos|\psi|)\,\varphi\varphi^{T}+\sin|\psi|\,\hat{\varphi}\tag{2}$$

其中,Ψ是指第一骨骼与第二骨骼之间的旋转矩阵;ψ是指第一骨骼与第二骨骼之间的旋转角;$\varphi=\psi/|\psi|$ 用于表征第一骨骼与第二骨骼之间的旋转轴方向向量,$\varphi^{T}$ 是 $\varphi$ 的转置;$\hat{\varphi}$ 是 $\varphi$ 对应的反对称矩阵。
2)根据每对骨骼之间的角度误差,确定每对骨骼之间的平移误差。
其中,每对骨骼之间的平移误差可以用平移向量表示。也即是,根据该平移向量对第一骨骼进行平移,可以使得平移后的第一骨骼与第二骨骼之间的位置误差较小。
作为一个示例,可以根据每对骨骼之间的角度误差,通过以下公式3),确定每对骨骼之间的平移误差:
$$T=j^{tar}_{par}-\Psi\,j^{ori}_{par}\tag{3}$$

其中,T是指第一骨骼与第二骨骼之间的平移误差,Ψ是指第一骨骼与第二骨骼之间的旋转矩阵,$j^{tar}_{par}$ 是指第二骨骼中的父节点的位置,$j^{ori}_{par}$ 是指第一骨骼中的父节点的位置。
3)根据每对骨骼之间的角度误差和平移误差,确定每对骨骼之间的非线性延展误差。
其中,每对骨骼之间的非线性延展误差可以用非线性变换来表示。也即是,对第一骨骼进行非线性变换,可以使得变换后的第一骨骼与第二骨骼之间位置误差较小。
每对骨骼之间的骨骼长度不同将会引起非线性延展误差。比如,在根据确定的旋转矩阵对第一骨骼进行旋转,以及根据确定的平移向量对旋转后的第一 骨骼进行平移之后,由于第一骨骼与第二骨骼之间的骨骼长度不同,因此平移后的第一骨骼与第二骨骼之间的位置还将存在一定的误差,本申请实施例中,将骨骼长度不同引起的误差称之为非线性延展误差。为了弥补非线性延展误差,本申请实施例中设计了一种非线性变换对其进行弥补。
作为一个示例,可以先根据上述确定的旋转矩阵和平移向量对第一骨骼进行刚体变换,再根据刚体变换后的第一骨骼与第二骨骼之间的位置误差,确定第一骨骼与第二骨骼之间的非线性延展误差。
比如,可以根据上述确定的旋转矩阵和平移向量对第一骨骼中的子节点的位置进行刚体变换,得到变换后的位置,再确定变换后的位置与第二骨骼中的子节点的位置之间的位置误差,将该位置误差确定为第一骨骼与第二骨骼之间的非线性延展误差。
比如,可以先根据上述确定的旋转矩阵和平移向量,通过以下公式(4)对第一骨骼中的子节点的位置进行刚体变换,得到变换后的位置:
$$\tilde{j}^{ori}_{chi}=\Psi\,j^{ori}_{chi}+T\tag{4}$$

其中,$j^{ori}_{chi}$ 是指第一骨骼中的子节点的位置,Ψ是指第一骨骼与第二骨骼之间的旋转矩阵,T是指第一骨骼与第二骨骼之间的平移误差,$\tilde{j}^{ori}_{chi}$ 是指根据旋转矩阵和平移误差对 $j^{ori}_{chi}$ 进行刚体变换后得到的位置,这个位置与第二骨骼中的子节点的位置可能因对应骨骼长度不同还存在一定的差异。
然后,根据变换后的位置与第二骨骼中的子节点的位置,通过以下公式(5),确定第一骨骼与第二骨骼之间的非线性延展误差:

$$\Delta=j^{tar}_{chi}-\tilde{j}^{ori}_{chi}\tag{5}$$

其中,Δ是指第一骨骼与第二骨骼之间的非线性延展误差,$j^{tar}_{chi}$ 是指第二骨骼中的子节点的位置,$\tilde{j}^{ori}_{chi}$ 是指根据旋转矩阵和平移误差对 $j^{ori}_{chi}$ 进行刚体变换后得到的位置。
需要说明的是,根据Δ可以建立对应骨骼中子节点之间的映射关系,将子节点之间的映射关系扩展至骨骼上的任意点,即可得到对应骨骼中任意点之间的映射关系。
另外,根据第一骨架和第二骨架之间骨架误差,还可以确定第一骨架与第二骨架之间的空间映射关系。之后,可以根据第一骨架与第二骨架之间的空间 映射关系,对第一骨架进行空间变换,然后确定变换后的第一骨架和第二骨架之间的空间误差,以根据确定的空间误差对第一骨架与第二骨架之间的空间映射关系进行验证。
作为一个示例,可以根据第一骨架和第二骨架之间骨架误差,通过以下公式(6)确定第一骨架与第二骨架之间的空间映射关系:
$$x^{tar}_{b}=W\left(\Psi\,x^{ori}_{b}+T+\Delta\right)\tag{6}$$

其中,$x^{ori}_{b}$ 为第一骨骼中b点的位置,$x^{tar}_{b}$ 为第二骨骼中与b点对应的点的位置,Ψ是指第一骨骼与第二骨骼之间的旋转矩阵,T是指第一骨骼与第二骨骼之间的平移误差,Δ是指第一骨骼与第二骨骼之间的非线性延展误差,W为权重参数。其中,W可以预先设置得到,也可以通过学习得到。
作为一个示例,在确定变换后的第一骨架和第二骨架之间的空间误差之后,可以判断空间误差是否小于误差阈值,若小于,则确定验证通过,若不小于,则确定验证不通过。
在一个实施例中,若验证通过,则执行下述步骤104。若验证未通过,则对第一骨架与第二骨架之间的空间映射关系进行调整,以使变换后的第一骨架和第二骨架之间的空间误差小于误差阈值。再根据调整后的空间映射关系,确定调整后的第一骨架和第二骨架之间骨架误差,以便根据调整后的第一骨架和第二骨架之间骨架误差,以及第一三维模型,生成第二骨架对应的第二三维模型。
步骤104:根据第二骨架与第一骨架之间的骨架误差,对第一三维网格进行调整,得到第二骨架对应的第二三维模型。
其中,第一三维网格是根据目标图像的全局图像特征生成的,在整体上的准确性和鲁棒性较高,而第一骨架是第一三维模型的骨架,因此同样在整体上的准确性和鲁棒性较高。第二骨架是根据目标图像的局部图像特征生成的,在局部细节和深度上的准确度较高。
本申请实施例中,通过根据第二骨架与第一骨架之间的骨架误差,对第一三维网格进行调整,可以结合目标图像的全局图像特征和局部图像特征,对在整体上的准确性和鲁棒性较高的第一三维模型进行调优,使得调优后得到的第二三维网格既具有一定的整体上的准确性和鲁棒性,也具有一定的局部细节和深度上的准确度,在整体上的准确性和鲁棒性、以及局部细节和深度上的准确度取得了一定的平衡,进一步完善了三维模型重建结果,提高了三维模型重建效果。
作为一个示例,可以先根据第二骨架与第一骨架之间的骨架误差,确定第二三维网格与第一三维网格之间的空间映射关系。然后,根据第二三维网格与第一三维网格之间的空间映射关系,对第一三维模型进行空间变换,得到第二三维模型。
在一个实施例中,第二三维网格与第一三维网格之间的空间映射关系可以包括第二三维网格上的网格顶点与第一三维网格上的网格顶点之间的空间映射关系。相应地,根据第二三维网格与第一三维网格之间的空间映射关系,对第一三维网格进行空间变换的操作可以包括:根据第二三维网格上的网格顶点与第一三维网格上的网格顶点之间的空间映射关系,对第一三维网格的网格顶点的空间位置进行变换,得到第二三维网格。
在一个实施例中,可以根据第二骨架与第一骨架之间的骨架误差,确定第二骨架与第一骨架之间的空间映射关系,然后基于线性混合蒙皮的扩展,将第二骨架与第一骨架之间的空间映射关系扩展为第二三维网格与第一三维网格之间的空间映射关系。
作为一个示例,根据第二骨架与第一骨架之间的骨架误差,确定第二三维网格与第一三维网格之间的空间映射关系的操作可以包括如下步骤:
1)根据第一骨架与第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,确定第二三维网格上的网格顶点在多对骨骼中每对骨骼上的分量与第一三维网格上的网格顶点之间的空间映射关系。
比如,可以根据第一骨架与第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,通过以下公式(7),确定第二三维网格上的网格顶点在多对骨骼中每对骨骼上的分量与第一三维网格上的网格顶点之间的空间映射关系:
$$V^{(i)}_{j}=W_{j,i}\left(\Psi^{(i)}\,\bar{V}_{j}+T^{(i)}+\Delta^{(i)}\right)\tag{7}$$

其中,$V^{(i)}_{j}$ 为第二三维网格上的第j个网格顶点在多对骨骼中第i对骨骼上的分量,$\bar{V}_{j}$ 为第一三维网格上的第j个网格顶点,$\Psi^{(i)}$ 为第i对骨骼之间的角度误差,$T^{(i)}$ 为第i对骨骼之间的平移误差,$\Delta^{(i)}$ 为第i对骨骼之间的非线性延展误差,$W_{j,i}$ 为第一三维网格上的第j个网格顶点和第i对骨骼对应的权重,i和j均为正整数。
2)根据第二三维网格上的网格顶点在多对骨骼中每对骨骼上的分量与第一三维网格上的网格顶点之间的空间映射关系,对第二三维网格上的网格顶点在所述多对骨骼上的分量进行加和处理,得到第二三维网格上的网格顶点与第一三维网格上的网格顶点之间的空间映射关系。
例如,可以根据第二三维网格上的网格顶点在多对骨骼中每对骨骼上的分量与第一三维网格上的网格顶点之间的空间映射关系,对第二三维网格上的网格顶点在多对骨骼上的分量进行相加,得到第二三维网格上的网格顶点与第一三维网格上的网格顶点之间的空间映射关系。
或者,也可以根据第二三维网格中的每个顶点在多对骨骼中每对骨骼上的分量与第一三维网格中的每个顶点之间的空间映射关系,以及第二三维网格中的每个顶点在多对骨骼中每对骨骼上的分量对应的权重,对第二三维网格中的每个顶点在多对骨骼上的分量进行加权求和,得到第二三维网格中的每个顶点与第一三维网格中的每个顶点之间的空间映射关系。
比如,可以根据第二三维网格上的网格顶点在多对骨骼中每对骨骼上的分量与第一三维网格上的网格顶点之间的空间映射关系,通过以下公式(8),对第二三维网格上的网格顶点在多对骨骼上的分量进行加和处理,得到第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系:
$$V_{j}=\sum_{i}A_{j,i}\,V^{(i)}_{j}\tag{8}$$

其中,$V_{j}$ 为第二三维网格上的第j个网格顶点,$V^{(i)}_{j}$ 为第二三维网格上的第j个网格顶点在多对骨骼中第i对骨骼上的分量,$A_{j,i}$ 为第二三维网格上的第j个网格顶点在多对骨骼中第i对骨骼上的分量对应的权重。$A_{j,i}$ 可以为预设参数,也可以为可学习参数,本申请实施例对此不做限定。
作为一个实施例,可以根据第一骨架、第二骨架以及第一三维网格,通过调优模型,生成第二骨架对应的第二三维网格。其中,该调优模型用于确定第二骨架与第一骨架之间的骨架误差,根据第二骨架与第一骨架之间的骨架误差, 对第一三维网格进行调整,得到第二骨架对应的第二三维网格。
其中,该调优模型可以预先根据第六样本数据进行训练得到。第六样本数据可以包括样本目标图像以及样本目标图像对应的样本三维网格。
需要说明的是,在一个实施例中,上述步骤101-步骤104中的每个步骤均可以通过对应的深度学习模型来实现,另外,各个步骤对应的深度学习模型中的部分或全部模型还可以集成在一个模型中,本申请实施例中对此不做限定。
作为一个示例,图2是本申请实施例提供的一种三维网格重建系统的逻辑结构示意图,如图2所示,该系统包括初始三维重建模型21、第一骨架模型22和调优模型23。三维网格重建过程中,可以先获取视频中的至少一个视频帧,然后将这至少一个视频帧分别作为初始三维重建模型21和第一骨架模型22的输入,通过初始三维重建模型21输出目标的第一三维网格以及第一三维网格的第一骨架,通过第一骨架模型22输出目标的第二骨架。然后,将第一三维网格、第一骨架以及第二骨架作为调优模型23的输入,通过调优模型23输出第二骨架对应的第二三维网格。
本申请实施例中,一方面可以根据至少一个第一目标图像的全局图像特征,生成目标的第一三维网格,以及提取第一三维网格的第一骨架。由于第一三维网格是根据目标图像的全局图像特征生成的,因此在整体上的准确性和鲁棒性较高,而第一骨架是第一三维网格的骨架,因此同样在整体上的准确性和鲁棒性较高。另一方面可以根据至少一个第二目标图像的局部图像特征,生成第二骨架。由于第二骨架是根据目标图像的局部图像特征生成的,因此在局部细节和深度上的准确度较高。之后,通过根据第二骨架与第一骨架之间的骨架误差,对第一三维网格进行调整,可以结合目标图像的全局图像特征和局部图像特征,对在整体上的准确性和鲁棒性较高的第一三维网格进行调优,使得调优后得到的第二三维网格既具有一定的整体上的准确性和鲁棒性,也具有一定的局部细节和深度上的准确度,在整体上的准确性和鲁棒性、以及局部细节和深度上的准确度取得了一定的平衡,进一步完善了三维网格重建结果,提高了三维网格重建效果。
为了便于理解,接下来将以基于视频进行人体三维模型重建的场景为例,对人体三维模型重建中需要的网络模型的训练过程以及人体三维模型重建过程 进行举例说明。
图3是本申请实施例提供的一种人体三维模型重建方法的流程图,如图3所示,该方法包括如下步骤:
步骤301:从人体视频中采集人体的运动捕捉(Motion Capture,MoCap)数据。
其中,人体的运动捕捉数据是指通过一定的技术手段的获取运动中的人体的位姿数据和外形信息。人体的位姿数据可以包括关节的角度、肢体的位置、肢体的宽度等。在本实施例中,采集的运动捕捉数据可以为下述模型信息提供人体位姿和三维网格的标定。
作为一个示例,可以采用深度图法、标记点法等算法中的一种或多种,来采集人体视频的运动捕捉数据。比如,采用深度图和标记点相结合的方法,来采集人体视频的运动捕捉数据。
例如,预先在采集空间中布置多个普通视频相机和多个深度相机。然后,选取人体特征不同的多个志愿者到采集空间中接受采集。在采集过程开始之前,为每个采集者穿戴上多个用作标记的标记球。这些标记球被固定在被采集者的多个不同的重要节点的内外侧。其中,人体特征可以包括性别、身高、体型等特征。重要节点可以包括人体的膝盖、髋、脚踝、脊柱、骨盆、头等节点。
比如,在采集空间中布置4个普通视频相机和4个深度相机,然后选取50个不同性别、身高、体型的志愿者到采集空间接受采集。在采集过程开始之前,为每个采集者穿戴上34个用作标记的标记球,这些标记球被固定在被采集者的重要节点(如如膝盖,髋,脚踝,脊柱,骨盆,头等)的内外侧。
开始采集以后,让每个志愿者做出不同的动作,与此同时,采集空间中的相机捕捉受试者身上的标记球的位置。这样,也就可以得到和视频中的每个视频帧对应的人体的运动捕捉数据。可以采集与视频中的多个视频帧对应的人体的运动捕捉数据作为训练数据,用于进行模型训练。比如,采集1280组数据用于模型训练。
步骤302:根据采集的人体的运动捕捉数据,对人体的三维网格和位姿进行标注,得到标注数据。
基于人体的运动捕捉数据,我们希望得到人体的位姿和三维网格的标注数据,用于后面的训练。本申请实施例中,可以将人体的位姿可以作为对应三维 网格的骨架,用于训练以下涉及骨架的网络模型。
作为一个示例,可以基于稀疏标记的形状位姿估计算法(Motion and Shape from Sparse Marker,MoSh)来估计人体的位姿和三维网格,以得到标注数据。这种方法可以估计得到准确的位姿和三维网格,估计的结果误差在毫米级。
步骤303:根据人体视频中的视频帧以及对应的标注数据,训练初始三维重建模型。
其中,初始三维重建模型用于根据视频中的视频帧生成人体的三维网格。可以根据视频中的视频帧以及对应的标注数据中的三维网格,对初始三维重建模型进行训练。
比如,初始三维重建模型可以包括第一特征提取网络和第一全连接网络。第一特征提取网络用于提取视频帧的全局图像特征,将提取的全局图像特征输入至第一全连接网络。第一全连接网络用于根据全局图像特征生成人体的三维网格。比如,第一特征提取网络可以为ResNet模型。
另外,还可以建立一个线性的、由三维网格到其骨架(位姿)的映射关系,该初始三维重建模型还可以包括该映射关系。初始三维重建模型用于根据视频中的视频帧生成人体的三维网格,以及从生成的三维网格中提取三维网格的骨架。
相应地,可以根据人体视频中的视频帧以及对应的标注数据中的人体的三维网格和位姿,对该初始三维重建模型进行训练。值得注意的是,这里得到的三维网格只考虑了图像的整体特征,对骨架结构的信息把握不够,在本实施例中,通过对局部人体关节信息的提取,可以对人体的三维网格的准确性有较大提升。
步骤304:根据人体视频中的视频帧以及对应的标注数据中的位姿,训练人体骨架模型。
其中,人体骨架模型用于根据视频中的视频帧,生成人体的三维骨架。比如,人体骨架模型可以包括第二特征提取网络和骨架生成网络,第二特征提取网络用于提取视频帧的局部图像特征,骨架生成网络用于根据提取的局部图像特征生成人体的骨架(位姿)。
进一步地,骨架生成网络还可以包括二维骨架生成网络和三维骨架生成网络,二维骨架生成网络用于根据每个视频帧的局部图像特征生成每个视频帧中人体的二维骨架。三维骨架生成网络用于根据一个或多个视频帧中人体的二维 骨架生成人体的三维骨架。比如,三维骨架生成网络可以对视频中每个视频帧中人体的二维骨架进行多视角融合,通过互补信息来生成每一个视频帧中人体的三维骨架。
值得注意的是,人体是一个非刚体结构,每一个视频帧的人体外形都是有差异的。在这个网络中,网络本身通过学习人体的不变量和每一个视频帧的差异来实现非刚体的多视角融合问题。尽管如此,由于刚体问题相比于非刚体是一个退化的问题,因此这个网络对于刚体也同样适用。在训练时,可以使用步骤302中得到的人体位姿作为标注数据,做端到端的全监督训练。
步骤305:根据标注数据训练调优模型。
完成人体骨架模型和初始三维重建模型的训练后,可以训练生成三维网络的调优模型。调优模型用于根据初始三维重建模型生成的骨架以及人体骨架模型生成的骨架之间的骨架误差,对初始三维重建模型生成的三维网格进行调整,得到调优后的三维网格。
其中,可以根据步骤302得到的标注数据来训练该调优模型。
在训练好初始三维重建模型、人体骨架模型和调优模型之后,即可获取人体视频中的至少一个视频帧,根据至少一个视频帧,通过这三个模型生成准确度较高的人体的三维网格。
步骤306:将人体视频中的至少一个视频帧作为训练好的初始三维重建模型的输入,输出人体的第一三维网格和第一骨架。
步骤307:将人体视频中的至少一个视频帧作为人体骨架模型的输入,输出人体的第二骨架。
步骤308:将人体的第一三维网格和第一骨架,以及人体的第二骨架作为调优模型的输入,输出第二骨架对应的第二三维网格。
需要说明的是,本申请实施例提供的方法可以应用于多种三维重建场景中。例如,在对普通物体的三维重建中,可以用中轴面提取算法提取物体的第一三维网格的第一骨架,然后通过图像的局部图像特征提取物体的骨架中关键点的位置,根据提取的关键点的位置,生成物体的第二骨架,再根据第一骨架和第二骨架,对第一三维网格进行调整,得到物体的三维网格重建结果。又例如,在场景的三维重建中,可以使用三维角点检测算法生成场景的第一三维网格的第一骨架,然后根据视频中视频帧的局部图像特征生成场景的第二骨架,再根据第一骨架和第二骨架,对第一三维网格进行调整,得到场景的三维网格重建结果。
图4是本申请实施例提供的一种三维网格重建装置的结构框图,该装置可以集成在计算机设备中,如图4所示,该装置包括:
第一生成模块401,用于根据至少一个第一目标图像的全局图像特征,生成目标的第一三维网格,所述第一目标图像为包含所述目标的图像;
提取模块402,用于提取所述第一三维网格的骨架,得到第一骨架,所述第一骨架用于表征所述第一三维网格的内部结构;
第二生成模块403,用于根据至少一个第二目标图像的局部图像特征,生成第二骨架,所述第二目标图像为包含所述目标的图像,所述第二骨架用于表征所述目标的内部结构;
调整模块404,用于根据所述第二骨架与所述第一骨架之间的骨架误差,对所述第一三维网格进行调整,得到所述第二骨架对应的第二三维网格。
可选地,第一生成模块401用于:
根据所述至少一个第一目标图像的全局图像特征,确定所述目标的空间占用信息,所述空间占用信息用于指示空间中每个点被所述目标占用的概率;
根据所述空间占用信息,生成所述目标的三维网格,将生成的三维网格作为所述第一三维网格。
可选地,第二生成模块403用于:
根据所述至少一个第二目标图像中每个第二目标图像的局部图像特征,生成每个第二目标图像对应的骨架节点热图,所述骨架节点热图用于指示所述目标的骨架中每个节点在对应第二目标热图中不同位置出现的概率;
根据每个第二目标图像对应的骨架节点热图,生成每个第二目标图像对应的二维骨架;
根据所述至少一个第二目标图像对应的二维骨架,生成所述目标的三维骨架,将生成的三维骨架作为所述第二骨架。
可选地,第二生成模块403用于:
根据每个第二目标图像对应的骨架节点热图,确定每个第二目标图像对应的骨架节点;
根据每个第二目标图像对应的骨架节点以及所述目标的预设拓扑结构,生成每个第二目标图像对应的二维骨架。
可选地,所述第二骨架与所述第一骨架之间的骨架误差包括所述第一骨架和所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,所述非线性延展误差是指每对骨骼的长度不同引起的误差。
可选地,所述装置还包括确定模块,确定模块用于:
对于所述第一骨架和所述第二骨架中对应的多对骨骼中的每对骨骼,确定所述每对骨骼之间的角度误差;
根据所述每对骨骼之间的角度误差,确定所述每对骨骼之间的平移误差;
根据所述每对骨骼之间的角度误差和平移误差,确定所述每对骨骼之间的非线性延展误差。
可选地,所述调整模块404包括:
确定单元,用于根据所述第二骨架与所述第一骨架之间的骨架误差,确定所述第二骨架对应的第二三维网格与所述第一三维网格之间的空间映射关系;
变换单元,用于根据所述第二三维网格与所述第一三维网格之间的空间映射关系,对所述第一三维网格进行空间变换,得到所述第二三维网格。
可选地,所述第二骨架与所述第一骨架之间的骨架误差包括所述第一骨架和所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差;确定单元用于:
根据所述第一骨架与所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,确定所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系;
根据所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,对所述第二三维网格上的网格顶点在所述多对骨骼上的分量进行加和处理,得到所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系。
可选地,所述确定单元用于:
根据所述第一骨架与所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,通过以下公式,确定所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系:

$$V^{(i)}_{j}=W_{j,i}\left(\Psi^{(i)}\,\bar{V}_{j}+T^{(i)}+\Delta^{(i)}\right)$$

其中,$V^{(i)}_{j}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量,$\bar{V}_{j}$ 为所述第一三维网格上的第j个网格顶点,$\Psi^{(i)}$ 为所述第i对骨骼之间的角度误差,$T^{(i)}$ 为所述第i对骨骼之间的平移误差,$\Delta^{(i)}$ 为所述第i对骨骼之间的非线性延展误差,$W_{j,i}$ 为所述第一三维网格上的第j个网格顶点和所述第i对骨骼对应的权重,i和j均为正整数。
可选地,所述确定单元用于:
根据所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,通过以下公式,对所述第二三维网格上的网格顶点在所述多对骨骼上的分量进行加和处理,得到所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系:

$$V_{j}=\sum_{i}A_{j,i}\,V^{(i)}_{j}$$

其中,$V_{j}$ 为所述第二三维网格上的第j个网格顶点,$V^{(i)}_{j}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量,所述 $A_{j,i}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量对应的权重。
可选地,所述第二三维网格与所述第一三维网格之间的空间映射关系包括所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系;所述调整模块404用于:
根据所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系,对所述第一三维网格的网格顶点的空间位置进行变换,得到所述第二三维网格。
本申请实施例中,一方面可以根据至少一个第一目标图像的全局图像特征,生成目标的第一三维网格,以及提取第一三维网格的第一骨架。由于第一三维网格是根据目标图像的全局图像特征生成的,因此在整体上的准确性和鲁棒性较高,而第一骨架是第一三维网格的骨架,因此同样在整体上的准确性和鲁棒性较高。另一方面可以根据至少一个第二目标图像的局部图像特征,生成第二骨架。由于第二骨架是根据目标图像的局部图像特征生成的,因此在局部细节和深度上的准确度较高。之后,通过根据第二骨架与第一骨架之间的骨架误差,对第一三维网格进行调整,可以结合目标图像的全局图像特征和局部图像特征,对在整体上的准确性和鲁棒性较高的第一三维网格进行调优,使得调优后得到的第二三维网格既具有一定的整体上的准确性和鲁棒性,也具有一定的局部细节和深度上的准确度,在整体上的准确性和鲁棒性、以及局部细节和深度上的准确度取得了一定的平衡,进一步完善了三维网格重建结果,提高了三维网格重建效果。
图5是本申请实施例提供的一种计算机设备500的结构框图。该计算机设备500可以是手机、平板电脑、台式计算机、服务器等电子设备。该计算机设备500可用于实施上述实施例中提供的三维网格重建方法。
通常,计算机设备500包括有:处理器501和存储器502。
处理器501可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器501可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器501也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器501可以在集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器501还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器502可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器502还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器502中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器501所执行以实现本申请中方法实施例提供的三维网格重建方法。
在一些实施例中,计算机设备500还可选包括有:外围设备接口503和至少一个外围设备。处理器501、存储器502和外围设备接口503之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口503相连。具体地,外围设备可以包括:显示屏504、音频电路505、通信接口506和电源507中的至少一种。
本领域技术人员可以理解,图5中示出的结构并不构成对计算机设备500的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
在示例性的实施例中,还提供了一种计算机可读存储介质,所述计算机可读存储介质上存储有指令,所述指令被处理器执行时实现上述三维网格重建方法。
在示例性实施例中,还提供了一种计算机程序产品,当该计算机程序产品被执行时,其用于实现上述三维网格重建方法。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (14)

  1. 一种三维网格重建方法,其特征在于,所述方法包括:
    根据至少一个第一目标图像的全局图像特征,生成目标的第一三维网格,所述第一目标图像为包含所述目标的图像;
    提取所述第一三维网格的骨架,得到第一骨架,所述第一骨架用于表征所述第一三维网格的内部结构;
    根据至少一个第二目标图像的局部图像特征,生成第二骨架,所述第二目标图像为包含所述目标的图像,所述第二骨架用于表征所述目标的内部结构;
    根据所述第二骨架与所述第一骨架之间的骨架误差,对所述第一三维网格进行调整,得到所述第二骨架对应的第二三维网格。
  2. 如权利要求1所述的方法,其特征在于,所述根据至少一个第一目标图像的全局图像特征,生成目标的第一三维网格,包括:
    根据所述至少一个第一目标图像的全局图像特征,确定所述目标的空间占用信息,所述空间占用信息用于指示空间中每个点被所述目标占用的概率;
    根据所述空间占用信息,生成所述目标的三维网格,将生成的三维网格作为所述第一三维网格。
  3. 如权利要求1所述的方法,其特征在于,所述根据至少一个第二目标图像的局部图像特征,生成第二骨架,包括:
    根据所述至少一个第二目标图像中每个第二目标图像的局部图像特征,生成每个第二目标图像对应的骨架节点热图,所述骨架节点热图用于指示所述目标的骨架中每个节点在对应第二目标热图中不同位置出现的概率;
    根据每个第二目标图像对应的骨架节点热图,生成每个第二目标图像对应的二维骨架;
    根据所述至少一个第二目标图像对应的二维骨架,生成所述目标的三维骨架,将生成的三维骨架作为所述第二骨架。
  4. 如权利要求3所述的方法,其特征在于,所述根据每个第二目标图像对应的骨架节点热图,生成每个第二目标图像对应的二维骨架,包括:
    根据每个第二目标图像对应的骨架节点热图,确定每个第二目标图像对应 的骨架节点;
    根据每个第二目标图像对应的骨架节点以及所述目标的预设拓扑结构,生成每个第二目标图像对应的二维骨架。
  5. 如权利要求1所述的方法,其特征在于,所述第二骨架与所述第一骨架之间的骨架误差包括所述第一骨架和所述第二骨架中对应骨骼之间的角度误差、平移误差和非线性延展误差,所述非线性延展误差是指对应骨骼的长度不同引起的误差。
  6. 如权利要求5所述的方法,其特征在于,所述根据所述第二骨架与所述第一骨架之间的骨架误差,对所述第一三维网格进行调整,得到所述第二骨架对应的第二三维网格之前,还包括:
    对于所述第一骨架和所述第二骨架中对应的多对骨骼中的每对骨骼,确定所述每对骨骼之间的角度误差;
    根据所述每对骨骼之间的角度误差,确定所述每对骨骼之间的平移误差;
    根据所述每对骨骼之间的角度误差和平移误差,确定所述每对骨骼之间的非线性延展误差。
  7. 如权利要求1-6任一所述的方法,其特征在于,所述根据所述第二骨架与所述第一骨架之间的骨架误差,对所述第一三维网格进行调整,得到所述第二骨架对应的第二三维网格,包括:
    根据所述第二骨架与所述第一骨架之间的骨架误差,确定所述第二骨架对应的第二三维网格与所述第一三维网格之间的空间映射关系;
    根据所述第二三维网格与所述第一三维网格之间的空间映射关系,对所述第一三维网格进行空间变换,得到所述第二三维网格。
  8. 如权利要求7所述的方法,其特征在于,所述第二骨架与所述第一骨架之间的骨架误差包括所述第一骨架和所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差;
    所述根据所述第二骨架与所述第一骨架之间的骨架误差,确定所述第二骨 架对应的第二三维网格与所述第一三维网格之间的空间映射关系,包括:
    根据所述第一骨架与所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,确定所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系;
    根据所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,对所述第二三维网格上的网格顶点在所述多对骨骼上的分量进行加和处理,得到所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系。
  9. 如权利要求8所述的方法,其特征在于,所述根据所述第一骨架与所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,确定所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,包括:
    根据所述第一骨架与所述第二骨架中对应的多对骨骼中每对骨骼之间的角度误差、平移误差和非线性延展误差,通过以下公式,确定所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系:
    $$V^{(i)}_{j}=W_{j,i}\left(\Psi^{(i)}\,\bar{V}_{j}+T^{(i)}+\Delta^{(i)}\right)$$
    其中,$V^{(i)}_{j}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量,$\bar{V}_{j}$ 为所述第一三维网格上的第j个网格顶点,$\Psi^{(i)}$ 为所述第i对骨骼之间的角度误差,$T^{(i)}$ 为所述第i对骨骼之间的平移误差,$\Delta^{(i)}$ 为所述第i对骨骼之间的非线性延展误差,$W_{j,i}$ 为所述第一三维网格上的第j个网格顶点和所述第i对骨骼对应的权重,i和j均为正整数。
  10. 如权利要求8所述的方法,其特征在于,所述根据所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,对所述第二三维网格上的网格顶点在所述多对骨骼上的分量进行加和处理,得到所述第二三维网格上的网格顶点与所述第一三 维网格上的网格顶点之间的空间映射关系,包括:
    根据所述第二三维网格上的网格顶点在所述多对骨骼中每对骨骼上的分量与所述第一三维网格上的网格顶点之间的空间映射关系,通过以下公式,对所述第二三维网格上的网格顶点在所述多对骨骼上的分量进行加和处理,得到所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系:
    $$V_{j}=\sum_{i}A_{j,i}\,V^{(i)}_{j}$$
    其中,$V_{j}$ 为所述第二三维网格上的第j个网格顶点,$V^{(i)}_{j}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量,所述 $A_{j,i}$ 为所述第二三维网格上的第j个网格顶点在所述多对骨骼中第i对骨骼上的分量对应的权重。
  11. 如权利要求7所述的方法,其特征在于,所述第二三维网格与所述第一三维网格之间的空间映射关系包括所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系;
    所述根据所述第二三维网格与所述第一三维网格之间的空间映射关系,对所述第一三维网格进行空间变换,得到所述第二三维网格,包括:
    根据所述第二三维网格上的网格顶点与所述第一三维网格上的网格顶点之间的空间映射关系,对所述第一三维网格的网格顶点的空间位置进行变换,得到所述第二三维网格。
  12. 一种三维网格重建装置,其特征在于,所述装置包括:
    第一生成模块,用于根据至少一个第一目标图像的全局图像特征,生成目标的第一三维网格,所述第一目标图像为包含所述目标的图像;
    提取模块,用于提取所述第一三维网格的骨架,得到第一骨架,所述第一骨架用于表征所述第一三维网格的内部结构;
    第二生成模块,用于根据至少一个第二目标图像的局部图像特征,生成第二骨架,所述第二目标图像为包含所述目标的图像,所述第二骨架用于表征所述目标的内部结构;
    调整模块,用于根据所述第二骨架与所述第一骨架之间的骨架误差,对所述第一三维网格进行调整,得到所述第二骨架对应的第二三维网格。
  13. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,其特征在于,所述处理器执行所述计算机程序时实现如权利要求1至11任一项所述的方法。
  14. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至11任一项所述的方法。
PCT/CN2021/137703 2021-05-07 2021-12-14 三维网格重建方法、装置、设备及存储介质 WO2022233137A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110493385.XA CN113298948B (zh) 2021-05-07 2021-05-07 三维网格重建方法、装置、设备及存储介质
CN202110493385.X 2021-05-07

Publications (1)

Publication Number Publication Date
WO2022233137A1 true WO2022233137A1 (zh) 2022-11-10

Family

ID=77320927

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137703 WO2022233137A1 (zh) 2021-05-07 2021-12-14 三维网格重建方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN113298948B (zh)
WO (1) WO2022233137A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298948B (zh) * 2021-05-07 2022-08-02 中国科学院深圳先进技术研究院 三维网格重建方法、装置、设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200184721A1 (en) * 2018-12-05 2020-06-11 Snap Inc. 3d hand shape and pose estimation
CN111882666A (zh) * 2020-07-20 2020-11-03 浙江商汤科技开发有限公司 三维网格模型的重建方法及其装置、设备、存储介质
CN112598735A (zh) * 2020-12-21 2021-04-02 西北工业大学 一种融合三维模型信息的单张图像物体位姿估计方法
CN113298948A (zh) * 2021-05-07 2021-08-24 中国科学院深圳先进技术研究院 三维网格重建方法、装置、设备及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10789754B2 (en) * 2018-07-27 2020-09-29 Adobe Inc. Generating target-character-animation sequences based on style-aware puppets patterned after source-character-animation sequences
CN111382618B (zh) * 2018-12-28 2021-02-05 广州市百果园信息技术有限公司 一种人脸图像的光照检测方法、装置、设备和存储介质
CN110111247B (zh) * 2019-05-15 2022-06-24 浙江商汤科技开发有限公司 人脸变形处理方法、装置及设备
CN110276768B (zh) * 2019-06-28 2022-04-05 京东方科技集团股份有限公司 图像分割方法、图像分割装置、图像分割设备及介质
CN110874864B (zh) * 2019-10-25 2022-01-14 奥比中光科技集团股份有限公司 获取对象三维模型的方法、装置、电子设备及系统
CN111862299A (zh) * 2020-06-15 2020-10-30 上海非夕机器人科技有限公司 人体三维模型构建方法、装置、机器人和存储介质


Also Published As

Publication number Publication date
CN113298948A (zh) 2021-08-24
CN113298948B (zh) 2022-08-02

Similar Documents

Publication Publication Date Title
CN108335353B (zh) 动态场景的三维重建方法、装置和系统、服务器、介质
CN111598998B (zh) 三维虚拟模型重建方法、装置、计算机设备和存储介质
CN113012282B (zh) 三维人体重建方法、装置、设备及存储介质
US20240046557A1 (en) Method, device, and non-transitory computer-readable storage medium for reconstructing a three-dimensional model
CN112927362A (zh) 地图重建方法及装置、计算机可读介质和电子设备
US11928778B2 (en) Method for human body model reconstruction and reconstruction system
CN113643366B (zh) 一种多视角三维对象姿态估计方法及装置
CN112927363A (zh) 体素地图构建方法及装置、计算机可读介质和电子设备
TW202309834A (zh) 模型重建方法及電子設備和電腦可讀儲存介質
WO2022233137A1 (zh) 三维网格重建方法、装置、设备及存储介质
CN117333637B (zh) 三维场景的建模及渲染方法、装置及设备
Xia et al. Cascaded 3d full-body pose regression from single depth image at 100 fps
CN114663983A (zh) 网状拓扑结构获取方法、装置、电子设备及存储介质
CN111178501B (zh) 双循环对抗网络架构的优化方法、系统、电子设备及装置
CN114187404A (zh) 一种用于近海域高分辨率的三维重建方法和系统
Shen et al. Structure Preserving Large Imagery Reconstruction
CN115049764B (zh) Smpl参数预测模型的训练方法、装置、设备及介质
CN115294295B (zh) 一种人体模型的构建方法、装置、电子设备及存储介质
CN112530004B (zh) 一种三维点云重建方法、装置及电子设备
Li SuperGlue-Based Deep Learning Method for Image Matching from Multiple Viewpoints
CN116228994B (zh) 三维模型获取方法、装置、设备及存储介质
Yin et al. MoFiM: A morphable fish modeling method for underwater binocular vision system
CN115908515B (zh) 影像配准方法、影像配准模型的训练方法及装置
Huang Research on Three-dimensional Reconstruction
WO2022193104A1 (zh) 一种光场预测模型的生成方法及相关装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21939781

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21939781

Country of ref document: EP

Kind code of ref document: A1