CN111598998A - Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium

Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium

Info

Publication number
CN111598998A
CN111598998A
Authority
CN
China
Prior art keywords
dimensional
image
target object
point cloud
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010400447.3A
Other languages
Chinese (zh)
Other versions
CN111598998B (en)
Inventor
葛志鹏
曹煊
葛彦昊
汪铖杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010400447.3A priority Critical patent/CN111598998B/en
Publication of CN111598998A publication Critical patent/CN111598998A/en
Application granted granted Critical
Publication of CN111598998B publication Critical patent/CN111598998B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application relates to a three-dimensional virtual model reconstruction method and apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring an image of a target object, the target object having movable limbs; extracting features of the image, and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales; generating three-dimensional parameters of the target object according to the point cloud coordinates of different scales; and reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object, the three-dimensional virtual model having a limb morphology that matches the target object in the image. By adopting the method, the accuracy of three-dimensional virtual model reconstruction can be improved.

Description

Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for reconstructing a three-dimensional virtual model, a computer device, and a storage medium.
Background
With the development of computer technology, Artificial Intelligence (AI) has emerged. AI refers to the theory, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. Artificial intelligence is currently studied and applied in many areas, for example, to reconstruct three-dimensional models. Reconstruction of three-dimensional models is often applied in virtual reality scenes, human body special effects, human body detection, and the like.
Conventional three-dimensional virtual model reconstruction methods match and align the irregular point cloud of a depth map with a regular three-dimensional human body mesh model. However, the result of such matching and alignment depends heavily on the quality of the depth map; if the resolution of the depth map is low, the reconstructed three-dimensional virtual model is inaccurate.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a three-dimensional virtual model reconstruction method, apparatus, computer device, and storage medium capable of improving reconstruction accuracy.
A method of three-dimensional virtual model reconstruction, the method comprising:
acquiring an image of a target object, the target object having movable limbs;
extracting the features of the image, and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales;
generating three-dimensional parameters of the target object according to the point cloud coordinates of different scales;
reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object; the three-dimensional virtual model has a limb morphology that matches the target object in the image.
A three-dimensional virtual model reconstruction apparatus, the apparatus comprising:
an image acquisition module for acquiring an image of a target object, the target object having movable limbs;
the characteristic extraction module is used for extracting the characteristics of the image and performing graph convolution processing on the extracted characteristics to obtain point cloud coordinates with different scales;
the generating module is used for generating three-dimensional parameters of the target object according to the point cloud coordinates of different scales;
a reconstruction module for reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object; the three-dimensional virtual model has a limb morphology that matches the target object in the image.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring an image of a target object, the target object having movable limbs;
extracting the features of the image, and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales;
generating three-dimensional parameters of the target object according to the point cloud coordinates of different scales;
reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object; the three-dimensional virtual model has a limb morphology that matches the target object in the image.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring an image of a target object, the target object having movable limbs;
extracting the features of the image, and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales;
generating three-dimensional parameters of the target object according to the point cloud coordinates of different scales;
reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object; the three-dimensional virtual model has a limb morphology that matches the target object in the image.
According to the three-dimensional virtual model reconstruction method and apparatus, the computer device and the storage medium, an image of a target object having movable limbs is obtained, features of the image are extracted, and graph convolution processing is performed on the extracted features to obtain point cloud coordinates of different scales. Three-dimensional parameters of the target object are then generated according to the point cloud coordinates of different scales, so that the three-dimensional parameters of the target object in the image can be generated accurately through graph convolution. A three-dimensional virtual model of the target object is reconstructed based on the three-dimensional parameters of the target object; because the three-dimensional virtual model has a limb morphology that matches the target object in the image, the reconstruction accuracy of the three-dimensional virtual model is improved.
A training method of a reconstructed network, the method comprising:
acquiring a first training image of a first object; the first object has movable limbs;
extracting features of the first training image through a reconstruction network to be trained, and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales;
generating a predicted three-dimensional parameter of the first object based on the point cloud coordinates of different scales;
constructing a target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameters;
training the reconstruction network to be trained based on the target loss function, and obtaining the trained reconstruction network when the training stopping condition is met; the trained reconstruction network is used for reconstructing an object with movable limbs in the image into a three-dimensional virtual model with matched limb forms with the object.
A training apparatus to reconstruct a network, the apparatus comprising:
a training image acquisition module for acquiring a first training image of a first object; the first object has movable limbs;
the input module is used for extracting the characteristics of the first training image through a reconstruction network to be trained and carrying out graph convolution processing on the extracted characteristics to obtain point cloud coordinates with different scales;
a prediction module for generating a predicted three-dimensional parameter of the first object based on the point cloud coordinates of different scales;
the construction module is used for constructing a target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameters;
the training module is used for training the reconstruction network to be trained based on the target loss function, and obtaining the trained reconstruction network when the training stopping condition is met; the trained reconstruction network is used for reconstructing an object with movable limbs in the image into a three-dimensional virtual model with matched limb forms with the object.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a first training image of a first object; the first object has movable limbs;
extracting features of the first training image through a reconstruction network to be trained, and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales;
generating a predicted three-dimensional parameter of the first object based on the point cloud coordinates of different scales;
constructing a target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameters;
training the reconstruction network to be trained based on the target loss function, and obtaining the trained reconstruction network when the training stopping condition is met; the trained reconstruction network is used for reconstructing an object with movable limbs in the image into a three-dimensional virtual model with matched limb forms with the object.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a first training image of a first object; the first object has movable limbs;
extracting features of the first training image through a reconstruction network to be trained, and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales;
generating a predicted three-dimensional parameter of the first object based on the point cloud coordinates of different scales;
constructing a target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameters;
training the reconstruction network to be trained based on the target loss function, and obtaining the trained reconstruction network when the training stopping condition is met; the trained reconstruction network is used for reconstructing an object with movable limbs in the image into a three-dimensional virtual model with matched limb forms with the object.
According to the training method and apparatus for the reconstruction network, the computer device and the storage medium, a first training image of a first object having movable limbs is obtained, features of the first training image are extracted through the reconstruction network to be trained, and graph convolution processing is performed on the extracted features to obtain point cloud coordinates of different scales. Predicted three-dimensional parameters of the first object are generated based on the point cloud coordinates of different scales, a target loss function is constructed according to the point cloud coordinates of different scales and the predicted three-dimensional parameters, and the reconstruction network to be trained is trained based on the target loss function until the training stop condition is met, yielding the trained reconstruction network. The trained reconstruction network can therefore predict the three-dimensional parameters of a target object in a two-dimensional image more accurately, so that the three-dimensional virtual model corresponding to the target object can be reconstructed accurately from those parameters.
Drawings
FIG. 1 is a diagram of an application environment of a method for reconstructing a three-dimensional virtual model according to an embodiment;
FIG. 2 is a schematic flow chart illustrating a method for reconstructing a three-dimensional virtual model according to an embodiment;
FIG. 3 is a schematic flow chart illustrating steps of extracting features of an image and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales in one embodiment;
FIG. 4 is a schematic diagram of graph convolution in one embodiment;
FIG. 5 is a flowchart illustrating the steps of obtaining an image of a target object in one embodiment;
FIG. 6 is a schematic flow chart of a three-dimensional virtual model reconstruction method according to an embodiment;
FIG. 7(a) is a block diagram of a reconstructed three-dimensional virtual model in an embodiment;
FIG. 7(b) is a flowchart of reconstructing a three-dimensional virtual model corresponding to a target human body in a video in real time according to an embodiment;
FIG. 8 is a schematic flow chart diagram illustrating a training method for reconstructing a network according to an embodiment;
FIG. 9 is a flowchart of the steps for constructing a target loss function based on point cloud coordinates and predicted three-dimensional parameters at different scales in one embodiment;
FIG. 10 is a block diagram of a three-dimensional virtual model reconstruction of a human body in a two-dimensional image according to an embodiment;
FIG. 11 is a block diagram showing an example of a three-dimensional virtual model reconstructing apparatus;
FIG. 12 is a block diagram of a training apparatus for reconstructing a network according to an embodiment;
FIG. 13 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The three-dimensional virtual model reconstruction method provided by the application can be applied to the application environment shown in fig. 1. The terminal 102 acquires an image of a target object having movable limbs. The terminal sends the image to the server 104; the server 104 extracts the features of the image through a reconstruction network, and performs graph convolution processing on the extracted features to obtain point cloud coordinates of different scales. The server 104 generates three-dimensional parameters of the target object according to the point cloud coordinates of different scales through the reconstruction network. Then, the server 104 returns the three-dimensional parameters of the target object to the terminal 102. The terminal 102 reconstructs a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object; the three-dimensional virtual model has a limb morphology that matches the target object in the image.
The server 104 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. The terminal 102 may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal 102 and the server 104 may be directly or indirectly connected through wired or wireless communication, and the application is not limited thereto.
In one embodiment, the three-dimensional virtual model reconstruction method can be applied to human body three-dimensional virtual model reconstruction, and comprises the following steps:
the terminal obtains a human body training image of a first object, and performs feature extraction and graph convolution layer processing on the human body training image to obtain point cloud coordinates of different scales in the human body training image. The terminal can generate three-dimensional parameters and camera parameters corresponding to the first object in the human body training image based on the point cloud coordinates of different scales. And the terminal takes the point cloud coordinates with different scales as point cloud labels with different scales and takes the three-dimensional parameters as a first three-dimensional human body label corresponding to the first object in the human body training image. The three-dimensional parameters comprise three-dimensional human body posture parameters and three-dimensional human body shape parameters, and the terminal takes the three-dimensional human body posture parameters as three-dimensional human body posture labels corresponding to the human body training images.
The terminal converts the three-dimensional human body posture parameters corresponding to the first object in the human body training image into two-dimensional human body posture parameters through the camera parameters, and the two-dimensional human body posture parameters are used as two-dimensional human body posture labels corresponding to the human body training image.
The terminal may acquire three-dimensional parameters of the first object through the motion capture device, the three-dimensional parameters also including three-dimensional human body posture parameters and three-dimensional human body shape parameters. And the terminal takes the three-dimensional parameters as a second three-dimensional human body label corresponding to the first object.
The terminal inputs the human body training image into a reconstruction network to be trained, and the training step of the reconstruction network comprises the following steps:
and the terminal extracts the characteristics of the human body training image through a characteristic extraction layer of the reconstruction network to be trained to obtain a corresponding characteristic diagram.
And the terminal carries out graph convolution processing on the characteristic graph through a graph convolution layer of the reconstructed network to obtain point cloud characteristics with different scales.
And the terminal performs regression processing on the point cloud characteristics of different scales through the graph convolution layer to obtain point cloud coordinates of different scales.
And the terminal generates a predicted three-dimensional parameter of a first object in the human body training image and a predicted camera parameter corresponding to the human body training image based on point cloud coordinates of different scales through the graph convolution layer. The predicted three-dimensional parameters comprise predicted three-dimensional human body posture parameters and predicted three-dimensional human body shape parameters.
And the terminal constructs a first loss function according to the point cloud coordinates of different scales and the point cloud labels of corresponding scales.
And the terminal constructs a second loss function according to the predicted three-dimensional parameters and the first three-dimensional human body label.
And the terminal constructs a third loss function according to the predicted three-dimensional parameters and the second three-dimensional human body label.
And then, the terminal converts the predicted three-dimensional human body posture parameters into predicted two-dimensional human body posture parameters through the predicted camera parameters.
And the terminal constructs a fourth loss function according to the predicted two-dimensional human body posture parameters and the corresponding two-dimensional posture labels.
And then, the terminal acquires a human body training image of the second object and acquires three-dimensional human body posture parameters corresponding to the second object in the human body training image. And taking the three-dimensional human body posture parameter corresponding to the second object in the human body training image as a three-dimensional human body posture label.
And the terminal inputs the human body training image of the second object into a reconstruction network to be trained to obtain the predicted three-dimensional human body posture parameter of the second object in the human body training image. The human body training image of the first object is an image acquired outdoors, the human body training image of the second object is a human body image acquired indoors, and the second object and the first object can be the same object or different objects.
And constructing a fifth loss function on the terminal according to the predicted three-dimensional human body posture parameters and the three-dimensional human body posture labels corresponding to the second object in the human body training image.
And constructing a target loss function on the terminal according to the first loss function, the second loss function, the third loss function, the fourth loss function and the fifth loss function.
And training the reconstruction network to be trained by the terminal based on the target loss function, and obtaining the trained reconstruction network when the training stopping condition is met.
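The five loss terms above can be combined into one target loss before training. The sketch below is illustrative only, assuming PyTorch tensors: the dictionary keys, the choice of L1/MSE distances and the weighting coefficients are assumptions, since the patent does not fix them.

```python
# Hedged sketch: combining the five loss terms described above into one target loss.
# All names and the weights w are illustrative assumptions, not values from the patent.
import torch
import torch.nn.functional as F

def target_loss(pred: dict, labels: dict, w=(1.0, 1.0, 1.0, 1.0, 1.0)) -> torch.Tensor:
    # first loss: multi-scale point cloud coordinates vs. point cloud labels of corresponding scales
    l1 = sum(F.l1_loss(p, t) for p, t in zip(pred["point_clouds"], labels["point_cloud_labels"]))
    # second loss: predicted 3D (SMPL) parameters vs. the first 3D human body label
    l2 = F.l1_loss(pred["smpl_params"], labels["smpl_label_first"])
    # third loss: predicted 3D (SMPL) parameters vs. the second 3D human body label (motion capture)
    l3 = F.l1_loss(pred["smpl_params"], labels["smpl_label_mocap"])
    # fourth loss: projected 2D pose vs. the 2D pose label
    l4 = F.mse_loss(pred["pose_2d"], labels["pose_2d_label"])
    # fifth loss: predicted 3D pose of the second object's (indoor) image vs. its 3D pose label
    l5 = F.mse_loss(pred["pose_3d_indoor"], labels["pose_3d_indoor_label"])
    return w[0] * l1 + w[1] * l2 + w[2] * l3 + w[3] * l4 + w[4] * l5
```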
Then, the terminal uses the trained reconstruction network to reconstruct a three-dimensional virtual model corresponding to the human body in the image, and the method comprises the following steps:
the terminal obtains an image containing a target human body, and cuts the image containing the target human body by taking the target human body as a center to obtain a human body image containing the target human body with a preset size.
And the terminal inputs the human body image into the trained reconstruction network, and performs characteristic extraction on the image through a characteristic extraction layer of the reconstruction network to obtain a corresponding characteristic diagram.
And the terminal carries out graph convolution processing on the characteristic graph through a graph convolution layer of the reconstructed network to obtain point cloud characteristics with different scales.
And the terminal performs regression processing on the point cloud characteristics of different scales through the graph convolution layer to obtain point cloud coordinates of different scales.
And the terminal generates three-dimensional parameters of the target human body according to the point cloud coordinates of different scales.
The terminal reconstructs a three-dimensional virtual model of the target human body based on the three-dimensional parameters of the target human body to obtain a human body three-dimensional virtual model; the three-dimensional virtual model of the human body has a limb shape matched with a target human body in the human body image.
Next, the terminal projects the three-dimensional virtual model onto the virtual-reality motion-sensing game, and displays the three-dimensional virtual model on the virtual-reality motion-sensing game. And the user executes each operation of the virtual reality motion sensing game by controlling the three-dimensional virtual model in the virtual reality motion sensing game.
The trained reconstruction network is used for carrying out graph convolution processing on the human body image, three-dimensional human body parameters corresponding to a target human body in the human body image are quickly and accurately obtained, accurate reconstruction of a three-dimensional virtual model is achieved through the three-dimensional human body parameters, and reconstruction accuracy and reconstruction efficiency of the human body three-dimensional virtual model are improved.
In an embodiment, as shown in fig. 2, a three-dimensional virtual model reconstruction method is provided, which is described by taking the method as an example applied to the terminal in fig. 1, and includes the following steps:
step 202, an image of a target object is acquired, the target object having movable limbs.
Here, the target object is a human body or an animal for which a three-dimensional virtual model needs to be reconstructed.
Specifically, the terminal may obtain an original image of the human or animal for which the three-dimensional virtual model needs to be reconstructed, and preprocess the original image to obtain the image of the target object. The preprocessing may include operations such as cropping, resolution adjustment, image size scaling, brightness adjustment, and/or contrast adjustment. Both the original image and the image of the target object are two-dimensional images.
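A minimal preprocessing sketch is given below, assuming a bounding box around the target object is already available; the 1.1 margin and the 224-pixel output size are illustrative choices, not values given in the patent.

```python
# Hedged sketch: crop a square region centered on the target object and resize it.
import cv2
import numpy as np

def crop_and_resize(image: np.ndarray, box, out_size: int = 224):
    """box = (x0, y0, x1, y1) is an assumed detection box around the target object."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half = max(x1 - x0, y1 - y0) / 2.0 * 1.1          # small margin around the box
    h, w = image.shape[:2]
    xa, ya = int(max(cx - half, 0)), int(max(cy - half, 0))
    xb, yb = int(min(cx + half, w)), int(min(cy + half, h))
    crop = image[ya:yb, xa:xb]
    resized = cv2.resize(crop, (out_size, out_size))
    # the crop box is returned so a rendered result can later be mapped back onto the
    # original image (the inverse transform mentioned in later embodiments)
    return resized, (xa, ya, xb, yb)
```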
In this embodiment, the image may be a color image. Compared with a depth map, a color image has higher resolution and richer detail, so the three-dimensional virtual model of the human body can be reconstructed more finely.
In this embodiment, the terminal may obtain the corresponding image by directly shooting the target object, or may obtain the image corresponding to the target object from a local device, a network device, or a third device. The acquired image includes the target object.
And 204, extracting the features of the image, and performing graph convolution processing on the extracted features to obtain point cloud coordinates with different scales.
Graph convolution processing refers to performing convolution on a graph, and can be implemented by a Graph Convolutional Network (GCN). A GCN is a neural network that operates on graphs. A point cloud is a large set of points expressing the spatial distribution and surface characteristics of a target in the same spatial reference system; once the spatial coordinates of each sampling point on the object surface are obtained, the resulting point set is called a point cloud. In this embodiment, the point cloud refers to the mesh points of the target object's surface.
Specifically, the terminal may perform feature extraction on an image of the target object to obtain features corresponding to the image, so as to obtain a feature map. And then, the terminal performs graph convolution processing on the feature map to obtain point cloud features with different scales. And the terminal performs convolution processing with the channel number being 3 on the point cloud characteristics with different scales to obtain point cloud coordinates with different scales. The point cloud coordinates are three-dimensional coordinates.
And step 206, generating three-dimensional parameters of the target object according to the point cloud coordinates of different scales.
The three-dimensional parameters are Skinned Multi-Person Linear model (SMPL) parameters. The SMPL model contains 6890 body surface vertices and 24 joint points.
Specifically, the terminal can perform downsampling and fully connected processing on the point cloud coordinates of different scales to obtain the three-dimensional parameters of the target object.
In this embodiment, when the target object is a human body, a three-dimensional model of the whole human body can be reconstructed from the skinned multi-person linear (SMPL) parameters.
Step 208, reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object; the three-dimensional virtual model has a limb morphology that matches the target object in the image.
Specifically, the three-dimensional parameters include three-dimensional pose parameters and three-dimensional body type parameters. The three-dimensional posture parameter is the joint point coordinate of the target object, and the three-dimensional body type parameter is the characteristic point coordinate of the surface of the target object. After the terminal obtains the three-dimensional posture parameters and the three-dimensional body type parameters corresponding to the target object in the image, a model is constructed in a three-dimensional space according to the three-dimensional coordinates corresponding to the three-dimensional posture parameters and the three-dimensional coordinates corresponding to the three-dimensional body type parameters, and therefore a three-dimensional virtual model is obtained.
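The reconstruction step can be pictured as evaluating an SMPL-style body model with the predicted pose and shape parameters, as in the sketch below. The `smpl_layer` callable is a placeholder standing in for whichever SMPL implementation is used; its interface is an assumption, not something specified by the patent.

```python
# Hedged sketch: rebuild the body mesh from the predicted 3D parameters.
import torch

def reconstruct_model(smpl_layer, pose: torch.Tensor, shape: torch.Tensor):
    """
    pose:  (1, 24, 3) rotations of the 24 joint points (3D posture parameters)
    shape: (1, 10)    body-shape coefficients (3D body type parameters)
    smpl_layer: assumed callable mapping (shape, pose) -> (vertices, joints)
    """
    vertices, joints = smpl_layer(shape, pose)   # (1, 6890, 3) surface points, (1, 24, 3) joints
    return vertices, joints
```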
In this embodiment, the three-dimensional virtual model can be applied to virtual reality motion sensing games, virtual fitting, virtual hair try and video special effect production, but is not limited thereto.
In the three-dimensional virtual model reconstruction method, the image of the target object is obtained, the target object has movable limbs, the image is subjected to feature extraction, the extracted features are subjected to graph convolution processing to obtain point cloud coordinates of different scales, the three-dimensional parameters of the target object are generated according to the point cloud coordinates of different scales, and the three-dimensional parameters of the target object in the image can be accurately generated through graph convolution. And reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object, wherein the three-dimensional virtual model has a limb shape matched with the target object in the image, so that the reconstruction accuracy of the three-dimensional virtual model is improved.
In one embodiment, as shown in fig. 3, the extracting features of the image and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales includes:
and step 302, performing feature extraction on the image through a feature extraction layer of the reconstruction network to obtain a corresponding feature map.
Specifically, the trained reconstruction network includes a feature extraction layer and a graph convolution layer. And the terminal inputs the image containing the target object into a feature extraction layer in the trained reconstruction network, and performs feature extraction on the image through the feature extraction layer to obtain a feature map corresponding to the image.
And 304, carrying out graph convolution processing on the feature graph through a graph convolution layer of the reconstructed network to obtain point cloud features with different scales.
Specifically, the terminal outputs the feature map corresponding to the image to a map convolution layer of a reconstruction network, and obtains feature expression output by each layer through processing of each layer of the map convolution layer. The feature expression output by each layer is the point cloud features with different scales.
In this embodiment, the feature extraction layer may be a ResNet50 network. The graph convolutional layer may be a GCN network.
The terminal acquires the feature points and edge sets in the feature map through the graph convolution layer, and constructs an undirected graph according to the neighbor set of each feature point and the set of connecting edges between each feature point and its neighbors. An undirected graph is composed of nodes and edges, where the nodes can be the feature points in the feature map. The terminal then acquires the representation data of each node in the undirected graph through the graph convolution layer, calculates the distance between any two nodes, and takes that distance as the weight of the edge between the two nodes. An adjacency matrix is generated from these weights. The graph convolution layer performs the graph convolution operation on the undirected graph through an activation function: the terminal calculates feature expressions of different scales from the activation function, the representation data of the nodes, and the weights. The feature expressions of different scales are the point cloud features of different scales.
For example, the terminal may calculate the point cloud features of different scales corresponding to the image of the target object by the following formula (1):

$$h_i^{(l+1)} = \sigma\Big(\sum_{j \in N_i} \frac{1}{c_{ij}}\, w_j\, h_j^{(l)}\Big) \tag{1}$$

where i and j represent nodes in the undirected graph; $h_i^{(l+1)}$ is the feature expression of node i at the current (next) layer; $h_j^{(l)}$ is the feature expression at the l-th layer; $c_{ij}$ is a normalization factor; $N_i$ is the neighbor set of node i; $w_j$ represents the weight of node j; and σ denotes the activation function, which may be sigmoid or tanh.
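A direct, node-by-node reading of formula (1) is sketched below in PyTorch. The looped form is for clarity only (a practical implementation would use sparse matrix operations), and the tensor layout is an assumption; the per-node weights w_j and normalization factors c_ij are passed in precomputed, e.g. from the distance-based edge weights described above.

```python
# Hedged sketch: one graph convolution step following formula (1).
import torch

def graph_conv_layer(h, neighbors, weights, c, act=torch.sigmoid):
    """
    h:         (N, C) node features at layer l
    neighbors: list of neighbor-index lists, neighbors[i] = N_i
    weights:   (N,) per-node weights w_j
    c:         (N, N) normalization factors c_ij
    returns    (N, C) node features at layer l+1
    """
    out = torch.zeros_like(h)
    for i, n_i in enumerate(neighbors):
        for j in n_i:
            out[i] += weights[j] * h[j] / c[i, j]
    return act(out)
```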
And step 306, performing regression processing on the point cloud features with different scales through the graph convolution layer to obtain point cloud coordinates with different scales.
Specifically, after the point cloud features of different scales are obtained, three-dimensional point cloud coordinates of different scales are obtained through a convolution in the graph convolution layer whose number of feature channels is 3, the feature channels being x, y and z.
For example, the terminal may calculate the point cloud coordinates of one scale by the following formula (2):

$$p = A\, h^{(l)}\, W \tag{2}$$

where p is the three-dimensional point cloud coordinate at one scale, A is the adjacency matrix of the nodes (a real symmetric N × N matrix), W is the weight matrix, and $h^{(l)}$ is the point cloud feature at that scale.
In this embodiment, after the point cloud coordinates of different scales are obtained, the terminal can perform downsampling and fully connected processing on them, and output the three-dimensional parameters of the target object through the fully connected layer.
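The sketch below illustrates such a regression head: multi-scale point cloud coordinates are flattened and passed through fully connected layers to produce pose, shape and (in later embodiments) camera parameters. The per-scale point counts, layer sizes and the 72/10/3 output split are illustrative assumptions rather than values stated in the patent.

```python
# Hedged sketch: fully connected head mapping multi-scale point clouds to 3D/camera parameters.
import torch
import torch.nn as nn

class ParamHead(nn.Module):
    def __init__(self, n_points=(1723, 431, 108), pose_dim=24 * 3, shape_dim=10, cam_dim=3):
        super().__init__()
        in_dim = sum(n * 3 for n in n_points)          # concatenated multi-scale xyz coordinates
        self.fc = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, pose_dim + shape_dim + cam_dim),
        )
        self.pose_dim, self.shape_dim = pose_dim, shape_dim

    def forward(self, clouds):
        # clouds: list of (B, n_i, 3) point cloud coordinates at the different scales
        x = torch.cat([c.flatten(1) for c in clouds], dim=1)
        out = self.fc(x)
        pose = out[:, :self.pose_dim]                                        # 3D posture parameters
        shape = out[:, self.pose_dim:self.pose_dim + self.shape_dim]         # 3D body type parameters
        cam = out[:, self.pose_dim + self.shape_dim:]                        # camera parameters
        return pose, shape, cam
```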
It will be appreciated that the reconstruction network may be deployed on a terminal or on a server. The server may be a cloud server. When the reconstruction network is deployed on a cloud server, the terminal sends the image of the target object to the cloud server, and the cloud server outputs the three-dimensional parameters of the target object through the reconstruction network and returns them to the terminal. Deploying the reconstruction network on a cloud server can save the storage space of the terminal.
Fig. 4 is a schematic diagram of graph convolution in an embodiment, and shows an undirected graph constructed from feature points and characterization data in a feature graph. And reconstructing a graph convolution layer in the network to obtain point cloud characteristics with different scales based on each node in the undirected graph and the characterization data of the nodes.
In this embodiment, the features of the image are extracted through the trained reconstruction network to obtain the key information of the image. The extracted key information is converted into point cloud features of different scales by graph convolution processing in the reconstruction network, and regression processing is performed on the point cloud features of different scales through the graph convolution layer to obtain point cloud coordinates of different scales, so that the coordinates of the key feature information at each scale are output accurately. Extracting features and outputting multi-scale point cloud coordinates through the reconstruction network is both efficient and accurate.
In one embodiment, the method further comprises: determining camera parameters corresponding to the image based on point cloud coordinates of different scales; and projecting the three-dimensional virtual model into a two-dimensional image according to the camera parameters corresponding to the image.
The camera parameters are the parameters used to establish the geometric model of camera imaging. Camera parameters are generally divided into extrinsic parameters (the camera extrinsic matrix) and intrinsic parameters (the camera intrinsic matrix). The extrinsic parameters determine the position and orientation of the camera in three-dimensional space; from them it can be determined how a point in world coordinates is rotated and translated into camera coordinates. The intrinsic parameters are parameters internal to the camera; from them it can be determined how, after the extrinsic parameters have been applied, a real-world point is converted into a pixel through the camera lens, pinhole imaging and electronic conversion. Taking a human body as an example, the camera parameters may include a rotation matrix R corresponding to the orientation of the human body, a translation matrix t that maps the human body to the two-dimensional image coordinates, and further a scaling factor. The scaling factor is an intrinsic parameter, while the rotation matrix R and the translation matrix t are extrinsic parameters.
Specifically, the terminal performs graph convolution processing on the feature map through the graph convolution layer of the reconstruction network to obtain point cloud features of different scales, and performs regression processing on them through the graph convolution layer to obtain point cloud coordinates of different scales. The terminal then generates the three-dimensional parameters of the target object and the camera parameters corresponding to the image from the point cloud coordinates of different scales through the graph convolution layer. Finally, the terminal projects the three-dimensional virtual model from three-dimensional space into two-dimensional space according to the camera parameters, obtaining a two-dimensional image.
In this embodiment, the terminal renders the three-dimensional virtual model into a two-dimensional image using the camera parameters. The terminal then applies the inverse transform to the two-dimensional image according to the cropping and scaling information of the image of the target object, obtaining a rendered two-dimensional image of the same size as the original image. The original image is the image before the image of the target object was cropped.
In this embodiment, the camera parameters corresponding to the image are determined based on the point cloud coordinates of different scales, and the three-dimensional virtual model is projected into a two-dimensional image according to those camera parameters, which gives a more intuitive display. Moreover, the degree of coincidence between the two-dimensional image projected from the three-dimensional virtual model and the image of the target object can be shown directly, making the three-dimensional virtual model easy to visualize.
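A minimal projection sketch following the camera-parameter description above (scaling factor s, rotation R, translation t) is given below. The weak-perspective form used here is an assumption; the patent only states that the model is projected with the predicted camera parameters.

```python
# Hedged sketch: project 3D model points into the 2D image with (s, R, t).
import numpy as np

def project_to_image(vertices: np.ndarray, s: float, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """
    vertices: (N, 3) 3D model points
    R:        (3, 3) rotation matrix (orientation of the human body)
    t:        (2,)   translation in image coordinates
    returns   (N, 2) projected 2D points
    """
    rotated = vertices @ R.T          # rotate into the camera orientation
    return s * rotated[:, :2] + t     # drop depth, then scale and translate
```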
In one embodiment, acquiring an image of a target object comprises: acquiring an image containing a target object in a video of the target object;
the method further comprises the following steps: projecting a three-dimensional virtual model into the image to replace the target object in the image; and generating a target video based on the image of each frame in the video after replacing the target object.
Specifically, the terminal acquires a video corresponding to a target object of which the three-dimensional virtual model needs to be reconstructed, and acquires an image of the target object included in the video. And the terminal inputs the image containing the target object into a reconstruction network and outputs the three-dimensional parameters corresponding to the target object in the image through the reconstruction network. And the terminal reconstructs a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object in the image. Then, the terminal projects the three-dimensional virtual model of the target object into the image to replace the target object in the image, so as to obtain an image after replacing the target object.
The same processing is performed on each frame of the video that contains the target object, so that for every frame an image with the target object replaced is obtained. The terminal then generates the target video from these frames with the target object replaced.
Further, the terminal acquires a video corresponding to a target object needing to reconstruct the three-dimensional virtual model, and acquires each frame of image containing the target object in the video. And the terminal inputs each frame of image containing the target object into a reconstruction network and outputs the three-dimensional parameters corresponding to the target object in each frame of image through the reconstruction network. Then, for the three-dimensional parameters of the target object in each frame of image, the terminal reconstructs the three-dimensional virtual model based on the three-dimensional parameters to obtain the three-dimensional virtual model corresponding to the target object in each frame of image. And the terminal projects the three-dimensional virtual model to the corresponding image, and the three-dimensional virtual model is used for replacing a target object in the corresponding image to obtain the image containing the three-dimensional virtual model in each frame. Then, the terminal can replace the image of the corresponding target object in the video with the image of each frame containing the three-dimensional virtual model to obtain the target video.
In this embodiment, after the terminal obtains the three-dimensional virtual model corresponding to the target object in each frame of image, the three-dimensional virtual model is projected to be a corresponding two-dimensional image through the camera parameters. And then, the terminal replaces each corresponding frame image in the video by each two-dimensional image to obtain the target video.
In this embodiment, each frame of the video of the target object contains an image of the target object, and the three-dimensional parameters corresponding to the target object in each frame are output in real time through the reconstruction network. The three-dimensional virtual model corresponding to the target object in each frame is reconstructed from these parameters, and the model for each frame is projected into that frame to replace the target object, obtaining the target video. The three-dimensional virtual model can thus be projected into human-computer-interaction motion sensing games or short-video applications, which enhances the realism of human-computer interaction in motion sensing games or of three-dimensional virtual reality special effects in short videos.
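The per-frame replacement flow described above can be sketched as a simple loop over the video. The `reconstruct` and `render_model` callables below are placeholders for the reconstruction network and the renderer, and the frame rate is an assumed value; none of these names come from the patent.

```python
# Hedged sketch: replace the target object with the rendered 3D model in every frame.
import cv2

def replace_object_in_video(video_path: str, out_path: str, reconstruct, render_model):
    cap = cv2.VideoCapture(video_path)
    writer = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        params, cam = reconstruct(frame)             # 3D (SMPL) parameters + camera parameters
        rendered = render_model(frame, params, cam)   # project the model and overwrite the target object
        if writer is None:
            h, w = rendered.shape[:2]
            writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), 25, (w, h))
        writer.write(rendered)
    cap.release()
    if writer is not None:
        writer.release()
```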
In one embodiment, as shown in FIG. 5, acquiring an image of a target object includes:
step 502, each frame of image containing the target object in the video of the target object is obtained.
Reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object; the three-dimensional virtual model has a limb shape matched with a target object in the image, and comprises:
step 504, a three-dimensional parameter sequence is generated based on the three-dimensional parameters of the target object in each frame of image.
Specifically, the terminal acquires a video corresponding to the target object for which the three-dimensional virtual model needs to be reconstructed, and acquires each frame of the video that contains the target object, obtaining an image sequence. The terminal inputs the image sequence into the reconstruction network, which performs feature extraction on each frame in the sequence. Graph convolution processing is performed on the extracted features through the reconstruction network to obtain the point cloud coordinates of different scales corresponding to each frame. Three-dimensional parameters corresponding to the target object in each frame of the image sequence are then generated from the per-frame point cloud coordinates of different scales, yielding the three-dimensional parameter sequence corresponding to the target object.
Step 506, generating a three-dimensional virtual model sequence corresponding to the target object according to the three-dimensional parameter sequence; the three-dimensional virtual models in the sequence of three-dimensional virtual models have limb morphology matching the target object in the corresponding image.
Specifically, the terminal reconstructs to obtain each three-dimensional virtual model based on each three-dimensional parameter in the three-dimensional parameter sequence, and generates a three-dimensional virtual model sequence according to the sequencing order of each three-dimensional parameter in the three-dimensional parameter sequence. The three-dimensional virtual models in the three-dimensional virtual model sequence have limb shapes matched with the target objects in the corresponding images in the video.
Further, for each three-dimensional parameter in the three-dimensional parameter sequence, reconstructing each three-dimensional virtual model of the target object according to each three-dimensional parameter, thereby obtaining a three-dimensional virtual model sequence. Each three-dimensional virtual model in the sequence of three-dimensional virtual models has a limb morphology that matches a corresponding target object in the sequence of images.
In this embodiment, each three-dimensional virtual model is sequentially generated according to each three-dimensional parameter in the three-dimensional parameter sequence, so as to obtain a three-dimensional virtual model sequence corresponding to the three-dimensional parameter sequence.
In this embodiment, each frame of image including the target object in the video of the target object is obtained, and the three-dimensional parameters corresponding to the target object in each frame of image are output in real time through the reconstruction network, so that the three-dimensional virtual model of the target object in each frame of image is reconstructed in real time, a three-dimensional virtual model sequence is obtained, and the efficiency of reconstructing the three-dimensional virtual model is improved.
In one embodiment, as shown in fig. 6, after generating the three-dimensional parameter sequence corresponding to the target object in each frame image, the method further includes:
step 602, acquiring a corresponding time of each frame of image in the video to obtain a time sequence.
Specifically, the reconstruction network further includes a filtering layer. And the terminal acquires corresponding moments of each frame of image containing the target object in the video through a filtering layer in the reconstruction network, and sequences the moments according to the sequence of the moments to obtain a time sequence.
And step 604, filtering the three-dimensional parameter sequence according to the time sequence to obtain a filtered three-dimensional parameter sequence.
Specifically, the terminal performs filtering processing on the three-dimensional parameter sequence through the filtering layer based on the time sequence, so as to realize smoothing of inter-frame transition and obtain the filtered three-dimensional parameter sequence.
Further, the filtering layer takes the three-dimensional parameters in the three-dimensional parameter sequence as the current three-dimensional parameters in turn, obtains the previous three-dimensional parameters, and obtains the time corresponding to the current three-dimensional parameters and the time corresponding to the previous three-dimensional parameters. The current three-dimensional parameters are then filtered based on the previous three-dimensional parameters and the two times, so that the filtered current parameters and the previous parameters transition smoothly over time. Processing the whole sequence in the same way yields the filtered three-dimensional parameter sequence. The filtering may be bilateral filtering, Gaussian filtering, conditional filtering, pass-through filtering, or random sample consensus filtering, but is not limited thereto.
In this embodiment, the filtering layer in the reconstruction network may be a denoising autoencoder, which implements a down-sampling and up-sampling process. The filtering layer uses a seq2seq RNN structure; during training, its input is a three-dimensional parameter time sequence with added noise, and its output is the denoised three-dimensional parameter time sequence. The trained filtering layer receives the three-dimensional parameter time sequence corresponding to several discontinuous frames and predicts and outputs a continuous three-dimensional parameter time sequence.
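A simple temporal-smoothing sketch is shown below. The patent lists several possible filters (bilateral, Gaussian, etc.); the Gaussian weighting over timestamps here is just one hedged example of smoothing a three-dimensional parameter sequence, with sigma chosen arbitrarily.

```python
# Hedged sketch: Gaussian smoothing of a 3D-parameter sequence over frame timestamps.
import numpy as np

def smooth_param_sequence(params: np.ndarray, times: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """
    params: (T, D) three-dimensional parameter sequence (one row per frame)
    times:  (T,)   timestamp of each frame in the video, in seconds
    returns (T, D) filtered parameter sequence
    """
    smoothed = np.empty_like(params)
    for i, t in enumerate(times):
        w = np.exp(-0.5 * ((times - t) / sigma) ** 2)   # weight frames by temporal distance
        w /= w.sum()
        smoothed[i] = w @ params
    return smoothed
```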
The generating of the three-dimensional virtual model sequence corresponding to the target object according to the three-dimensional parameter sequence includes:
step 606, generating a three-dimensional virtual model sequence corresponding to the target object according to the filtered three-dimensional parameter sequence.
Specifically, the terminal reconstructs a three-dimensional virtual model based on the filtered three-dimensional parameter sequence, and generates a three-dimensional virtual model sequence according to the sequence of the three-dimensional parameters in the three-dimensional parameter sequence. Further, for each three-dimensional parameter in the filtered three-dimensional parameter sequence, a corresponding three-dimensional virtual model is obtained according to each three-dimensional parameter through reconstruction, and therefore a three-dimensional virtual model sequence is obtained.
In this embodiment, the corresponding time of each frame of image in the video is obtained to obtain a time sequence, and the three-dimensional parameter sequence is filtered according to the time sequence to obtain a filtered three-dimensional parameter sequence, so that inter-frame smoothing of the three-dimensional parameters can be realized according to time correlation. And reconstructing the three-dimensional virtual model of the target object in each frame of image based on the filtered three-dimensional parameter sequence, so that the reconstruction network can output continuous three-dimensional virtual models corresponding to the target object, and smooth transition between every two three-dimensional virtual models is realized, thereby improving the reconstruction precision and the continuity of the three-dimensional virtual models.
In this embodiment, when the target object in the color image is not facing the camera, for example when the color image is a side or back view of the target object, the inter-frame temporal smoothing of the three-dimensional parameters can smooth unstable angle parameters. The output of the current frame then tends to be consistent with the outputs of the previous and following frames, which improves the stability of human body reconstruction.
Fig. 7(a) is a framework diagram of three-dimensional virtual model reconstruction in one embodiment. The terminal obtains the original image of each frame containing the target human body in the video, obtaining an original image sequence. The terminal inputs the image sequence into a reconstruction network, the reconstruction network performs human body detection on the image sequence, and the target human body in each image is marked with a detection box. Human body detection can be realized through the lightweight network ResNet-18. Then, the reconstruction network preprocesses each original image in the marked image sequence, namely, crops out an image of a preset size from each original image with the target human body as the center. Then, for each frame of cropped image, the feature extraction layer of the reconstruction network extracts features of the cropped image and outputs the extracted features to the graph convolution layer. The graph convolution layer performs graph convolution processing on the feature map to obtain point cloud coordinates of different scales. The reconstruction network performs downsampling and full-connection processing on the point cloud coordinates of different scales, so as to output, through the fully connected layer, the three-dimensional parameter sequence and camera parameters corresponding to the target human body in the image sequence. Next, the three-dimensional parameter sequence is input into the filter layer. The reconstruction network obtains the corresponding moment of each frame of image in the video to obtain a time sequence. The filter layer in the reconstruction network filters the three-dimensional parameter sequence based on the time sequence to obtain the filtered three-dimensional parameter sequence.
Then, the terminal reconstructs each three-dimensional virtual model corresponding to the target human body from the filtered three-dimensional parameter sequence to obtain a three-dimensional virtual model sequence. The terminal can then render the three-dimensional virtual model sequence into a two-dimensional human body sequence according to the camera parameters, and use the two-dimensional human body sequence to replace the target human body in each corresponding cropped image. Then, the terminal can perform inverse transformation on the images obtained after the replacement to generate a target image sequence with the same size as the original image sequence, and replace the original images of the original image sequence in the video with the target images in the target image sequence.
Fig. 7(b) is a flowchart of reconstructing, in real time, the three-dimensional virtual model corresponding to a target human body in a video in one embodiment. The terminal acquires the original image of each frame containing the target human body in the video and inputs the original images into the reconstruction network in sequence. The reconstruction network performs three-dimensional virtual model reconstruction on the target human body in the original images in sequence. Specifically, the reconstruction network takes the input original images as the current frame image in turn, performs human body detection on the current frame original image, and marks the target human body in the image with a detection box. Then, the labeled current frame original image is cropped and scaled with the target human body as the center. The reconstruction network performs feature extraction and graph convolution processing on the cropped and scaled image, and outputs SMPL parameters and camera parameters. Then, the reconstruction network performs time smoothing (i.e., filtering) on the SMPL parameters corresponding to the current frame original image based on the moment of the current frame original image in the video and the moment of the previous frame original image in the video, obtaining the filtered SMPL parameters. The terminal reconstructs the three-dimensional virtual model based on the SMPL parameters to obtain the three-dimensional virtual model corresponding to the target human body in the current frame image. Then, the terminal renders the three-dimensional virtual model into a two-dimensional human body based on the camera parameters, and replaces the target human body in the cropped and scaled image. Next, the terminal performs reverse crop-and-scale on the replaced image according to the crop-and-scale information to obtain an image with the same size as the current frame original image. Processing each frame in the same way yields the three-dimensional virtual models output in sequence and the reverse crop-and-scaled images output in sequence. The reconstruction network is lightweight, the single-frame time consumption of the whole reconstruction process is about 20 milliseconds, and real-time reconstruction of the three-dimensional virtual model can be realized.
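The per-frame pipeline of fig. 7(b) can be summarized with the following sketch, in which every step is supplied as a callable; all names are hypothetical placeholders for the operations described above, not an actual API.

```python
from typing import Any, Callable, Tuple

def process_frame(frame: Any, t: float, prev: Tuple[Any, float],
                  detect: Callable, crop: Callable, reconstruct: Callable,
                  smooth: Callable, build_mesh: Callable, render: Callable,
                  paste: Callable, uncrop: Callable):
    """One pass of the real-time per-frame pipeline; every step is supplied as a callable."""
    prev_params, prev_t = prev
    box = detect(frame)                             # mark the target human body with a detection box
    patch, info = crop(frame, box)                  # crop and scale around the target human body
    smpl_params, cam_params = reconstruct(patch)    # features + graph convolution -> SMPL and camera parameters
    smpl_params = smooth(smpl_params, prev_params, t, prev_t)   # inter-frame time smoothing
    mesh = build_mesh(smpl_params)                  # reconstruct the three-dimensional virtual model
    patch = paste(patch, render(mesh, cam_params))  # render to a 2D human body and replace the target
    return uncrop(patch, info), (smpl_params, t)    # reverse crop/scale back to the original frame size
```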
In an embodiment, as shown in fig. 8, a training method for reconstructing a network is provided, which is described by taking the method as an example applied to the terminal in fig. 1, and includes the following steps:
step 802, obtaining a first training image of a first object; the first object has a moving limb.
In particular, the terminal may acquire a first training image including a first object having a moving limb. For example the first object is a human or an animal. Further, the terminal may obtain the first training image by directly shooting the first object, or may obtain the first training image from a local or network or from a third device.
In this embodiment, the terminal may acquire arbitrary images and screen out the images in which a human body or an animal is present, thereby obtaining the first training image.
In this embodiment, the terminal may acquire the three-dimensional parameters corresponding to the first object in the first training image. The three-dimensional parameters include three-dimensional pose parameters and three-dimensional body type parameters. Then, the terminal may use the three-dimensional parameters as the three-dimensional label corresponding to the first training image, and use the three-dimensional pose parameters as the three-dimensional pose label. The terminal can also collect point cloud features of the first training image and determine point cloud coordinates of different scales. Then, the terminal takes the point cloud coordinates of different scales as the point cloud labels corresponding to the first training image.
In this embodiment, the terminal may acquire the three-dimensional parameters of the target object through the motion capture device. The terminal can set the three-dimensional parameters of the target object acquired by capturing as a label. Further, the terminal can set the corresponding three-dimensional parameters in the first training image as a first three-dimensional label, and set the three-dimensional parameters of the target object obtained by capturing and collecting as a second three-dimensional label.
And 804, extracting features of the first training image through a reconstruction network to be trained, and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales.
Specifically, the reconstruction network to be trained includes a feature extraction layer and a graph convolution layer. The terminal inputs the first training image into the reconstruction network to be trained. The feature extraction layer of the reconstruction network to be trained performs feature extraction on the input first training image to obtain a corresponding feature map. Then, the graph convolution layer of the reconstruction network to be trained performs graph convolution processing on the feature map to obtain point cloud coordinates of different scales.
In this embodiment, performing feature extraction on the first training image through a reconstruction network to be trained, and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales, including:
extracting the features of the first training image through a feature extraction layer of a reconstruction network to obtain a corresponding feature map; carrying out graph convolution processing on the characteristic graph through a graph convolution layer of the reconstruction network to obtain point cloud characteristics with different scales; and performing regression processing on the point cloud characteristics with different scales through the graph convolution layer to obtain point cloud coordinates with different scales.
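A minimal sketch of the graph convolution and coordinate regression described here is given below, assuming a fixed mesh adjacency and a fixed vertex count across scales (the actual network also down-samples between scales); class names and layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph convolution: mix each vertex feature with its neighbours via adjacency A."""

    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)        # (V, V) row-normalised mesh adjacency
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):                            # x: (batch, V, in_dim)
        return torch.relu(self.linear(self.A @ x))

class PointCloudRegressor(nn.Module):
    """Per-vertex image features -> point cloud features at several scales -> 3D coordinates."""

    def __init__(self, feat_dim, adjacencies, hidden=64):
        super().__init__()
        self.blocks = nn.ModuleList(
            GraphConv(feat_dim if i == 0 else hidden, hidden, A)
            for i, A in enumerate(adjacencies)
        )
        self.coord_heads = nn.ModuleList(nn.Linear(hidden, 3) for _ in adjacencies)

    def forward(self, vertex_feats):                 # (batch, V, feat_dim)
        coords, x = [], vertex_feats
        for block, head in zip(self.blocks, self.coord_heads):
            x = block(x)                             # point cloud features at this scale
            coords.append(head(x))                   # regressed point cloud coordinates (batch, V, 3)
        return coords                                # one coordinate tensor per scale
```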
Step 806, generating a predicted three-dimensional parameter of the first object based on the point cloud coordinates of different scales.
Specifically, the terminal can perform downsampling and full-connection processing on point cloud coordinates of different scales to obtain a predicted three-dimensional parameter corresponding to the first object.
And 808, constructing a target loss function according to the point cloud coordinates and the predicted three-dimensional parameters of different scales.
Specifically, the terminal acquires point cloud labels of different scales and determines the differences between the point cloud coordinates of different scales and the point cloud labels of the corresponding scales. The terminal can obtain the three-dimensional label and determine the difference between the three-dimensional label and the predicted three-dimensional parameters. The terminal can then construct a loss function according to the differences between the point cloud coordinates and the point cloud labels and the difference between the predicted three-dimensional parameters and the three-dimensional label.
Step 810, training the reconstruction network to be trained based on the target loss function, and obtaining the trained reconstruction network when the training stopping condition is met; the trained reconstruction network is used for reconstructing an object with movable limbs in an image into a three-dimensional virtual model with limb shapes matched with the object.
Specifically, the terminal trains the reconstruction network to be trained based on the target loss function. And adjusting parameters of the reconstructed network in the training process and continuing training until the reconstructed network meets the training stopping condition, so as to obtain the trained reconstructed network. The trained reconstruction network is used for reconstructing the object with the movable limb in the image into a three-dimensional virtual model with the limb shape matched with the object.
In this embodiment, the training stop condition may be that a loss error of the reconstructed network is less than or equal to a loss threshold, or that the number of iterations of the reconstructed network reaches a preset number of iterations.
For example, the loss error generated in each training is calculated through the target loss function, the parameters of the reconstruction network are adjusted based on the difference between the loss error and the loss threshold value, and the training is continued until the training is stopped under the training stopping condition, so that the trained reconstruction network is obtained.
Alternatively, the terminal counts the number of iterations of the reconstruction network during training, and stops training when the number of iterations reaches the preset number of iterations, obtaining the trained reconstruction network.
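The two training stop conditions can be combined in a simple training loop, sketched below under the assumption that the model, optimizer, data loader and target loss function are already constructed; the threshold and iteration values are placeholders.

```python
def train_reconstruction_network(model, optimizer, data_loader, target_loss_fn,
                                 loss_threshold=1e-3, max_iterations=100_000):
    """Train until the loss error is below the threshold or the preset iteration count is reached."""
    iteration = 0
    while iteration < max_iterations:
        for batch in data_loader:
            optimizer.zero_grad()
            loss = target_loss_fn(model(batch["image"]), batch)   # target loss from the individual terms
            loss.backward()
            optimizer.step()                                      # adjust the reconstruction network parameters
            iteration += 1
            if loss.item() <= loss_threshold or iteration >= max_iterations:
                return model                                      # a training stop condition is met
    return model
```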
In this embodiment, a first training image of a first object with a moving limb is obtained, feature extraction is performed on the first training image through a reconstruction network to be trained, graph convolution processing is performed on the extracted features, point cloud coordinates of different scales are obtained, and three-dimensional parameters corresponding to the object in the image can be accurately generated through graph convolution. And constructing a loss function by combining point cloud coordinates and three-dimensional parameters of different scales. The reconstruction network to be trained is trained based on the target loss function, loss caused by factors of all aspects to the network can be integrated in the training process, so that loss of all aspects is reduced to the minimum through training, the trained reconstruction network is higher in precision and higher in generalization capability, and the trained reconstruction network can predict three-dimensional parameters of the target object in the two-dimensional image more accurately. And accurately predicting the three-dimensional parameters of the target object in the two-dimensional image by using the trained reconstruction network, thereby accurately reconstructing the three-dimensional virtual model corresponding to the target object according to the three-dimensional parameters.
In one embodiment, as shown in fig. 9, constructing the target loss function according to the point cloud coordinates and the predicted three-dimensional parameters at different scales includes:
and 902, acquiring point cloud labels, and constructing a first loss function according to point cloud coordinates of different scales and the point cloud labels of corresponding scales.
The point cloud labels are point cloud coordinates with different scales corresponding to the preset first training image.
Specifically, the terminal obtains point cloud labels of different scales corresponding to the first training image, calculates L2 norms according to point cloud coordinates of different scales output by the reconstruction network and the point cloud labels of corresponding scales, and sums the L2 norms of different scales to obtain a first loss function.
For example, the first loss function constructed by the terminal is as follows:
$L_{graph} = \sum_i \left\| f_i(\Phi_i) - d_i(M(\theta, \beta)) \right\|_2 \quad (1)$

where $\Phi_i$ denotes the feature expression of the $i$-th layer, $f_i(\Phi_i)$ denotes the point cloud coordinates of the $i$-th layer after graph convolution processing, $M(\theta, \beta)$ denotes the point cloud label, and $d_i(M(\theta, \beta))$ denotes the point cloud label of the $i$-th layer obtained by down-sampling.
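A sketch of loss (1), assuming the predicted point clouds and the down-sampled point cloud labels are provided as lists of tensors ordered by scale:

```python
import torch

def graph_loss(pred_point_clouds, label_point_clouds):
    """Equation (1): sum over scales of the L2 distance between the predicted point cloud
    coordinates f_i(Phi_i) and the down-sampled point cloud labels d_i(M(theta, beta))."""
    return sum(torch.norm(pred - label, p=2)
               for pred, label in zip(pred_point_clouds, label_point_clouds))
```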
And 904, acquiring a first three-dimensional label corresponding to the first training image, and constructing a second loss function according to the predicted three-dimensional parameter and the first three-dimensional label.
The first three-dimensional label is a preset three-dimensional parameter corresponding to a first object in the first training image.
Specifically, the terminal obtains a first three-dimensional label corresponding to the first training image, determines an L2 norm between a predicted three-dimensional parameter output by the reconstruction network and the corresponding first three-dimensional label, and obtains a second loss function.
For example, the second loss function constructed by the terminal is as follows:
$L_{smpl} = \|\theta - \hat{\theta}\|_2 + \|\beta - \hat{\beta}\|_2 \quad (2)$

where $\theta$ denotes the three-dimensional pose parameters predicted by the reconstruction network, $\hat{\theta}$ denotes the three-dimensional pose label, i.e. the true pose parameters, $\beta$ denotes the three-dimensional body type parameters predicted by the reconstruction network, and $\hat{\beta}$ denotes the three-dimensional body type label, i.e. the true body type parameters.
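A sketch of loss (2), assuming the predicted and labeled SMPL parameters are given as a 72-dimensional pose tensor and a 10-dimensional body type tensor:

```python
import torch

def smpl_parameter_loss(pred_pose, pred_shape, gt_pose, gt_shape):
    """Equation (2): L2 distances between the predicted SMPL parameters (72-dim pose,
    10-dim body type) and the first three-dimensional label."""
    return torch.norm(pred_pose - gt_pose, p=2) + torch.norm(pred_shape - gt_shape, p=2)
```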
In this embodiment, supervised training on the 3D joint points (i.e. the three-dimensional pose parameters) requires eliminating the global translation. That is, the 3D joint point coordinates are originally coordinates in the world coordinate system, and eliminating the global translation means subtracting the world coordinates of the pelvic joint point from the world coordinates of the 3D joint points, generating coordinates in a pelvis-centered coordinate system. This makes the training of the network more stable. The training optimizer minimizes the loss function with the Adam algorithm until the reconstruction network converges.
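The pelvis-centering step can be sketched as follows; the pelvis joint index used here follows the common SMPL joint ordering and is an assumption.

```python
def remove_global_translation(joints_3d, pelvis_index=0):
    """Express 3D joint points in a pelvis-centred coordinate system by subtracting the
    world coordinates of the pelvic joint point, eliminating the global translation."""
    pelvis = joints_3d[..., pelvis_index:pelvis_index + 1, :]   # (..., 1, 3)
    return joints_3d - pelvis
```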
Step 906, constructing an objective loss function according to the first loss function and the second loss function.
Specifically, the terminal may obtain a weight corresponding to the first loss function, and a weight corresponding to the second loss function. And the terminal multiplies the first loss function by the corresponding weight, multiplies the second loss function by the corresponding weight, and sums up the multiplication results to obtain the target loss function.
In the embodiment, a first loss function is constructed according to point cloud coordinates of different scales and point cloud labels of corresponding scales, a first three-dimensional label corresponding to a first training image is obtained, a second loss function is constructed according to predicted three-dimensional parameters and the first three-dimensional label, a target loss function is constructed according to the first loss function and the second loss function, the constructed target loss function can be constructed on the basis of two aspects of point cloud coordinates of different scales and predicted three-dimensional parameters obtained through image prediction, the constructed target loss function is more accurate, and therefore a reconstructed network obtained through training is more accurate.
In one embodiment, the method further comprises: acquiring a second three-dimensional label, and constructing a third loss function according to the predicted three-dimensional parameter and the second three-dimensional label; the second three-dimensional label is a three-dimensional parameter acquired by capturing the motion of the first object;
constructing a target loss function from the first loss function and the second loss function, comprising: and constructing a target loss function according to the first loss function, the second loss function and the third loss function.
The second three-dimensional tag is a three-dimensional parameter acquired by motion capture of the first object through the motion capture device, and the three-dimensional parameter comprises a three-dimensional posture parameter and a three-dimensional body type parameter acquired by motion capture of the first object through the motion capture device.
Specifically, the reconstruction network in the training process further includes a generation countermeasure layer. The terminal obtains a second three-dimensional label and inputs the second three-dimensional label and the predicted three-dimensional parameters into the generation countermeasure layer. The generation countermeasure layer discriminates the input predicted three-dimensional parameters and the second three-dimensional label, and the discrimination result is true or false. For example, the generation countermeasure layer outputting 1 indicates that the predicted three-dimensional parameter is true, and outputting 0 indicates that it is false. This can also be set as required, with 1 representing that the predicted three-dimensional parameter is false and 0 representing that it is true.
In this embodiment, the discrimination result output by the generation countermeasure layer for the second three-dimensional label is true. It can be understood that the discrimination results output by the generation countermeasure layer for all preset labels are true.
And the terminal constructs a third loss function according to the judgment result corresponding to the predicted three-dimensional parameter output by the generated countermeasure network and the judgment result corresponding to the second three-dimensional label.
Further, the terminal calculates the negative logarithm of the discrimination result corresponding to the second three-dimensional label and calculates the expectation of this negative logarithm. The terminal also calculates the negative logarithm of the difference between 1 and the discrimination result corresponding to the predicted three-dimensional parameter, and calculates the expectation of that negative logarithm. The terminal sums the two expectations to obtain the third loss function.
For example, the terminal constructs a third loss function as follows:
The generation countermeasure layer adopts the log loss:

$L_{adv} = -\mathbb{E}_{x_r}\left[\log D(x_r)\right] - \mathbb{E}_{x_f}\left[\log\left(1 - D(x_f)\right)\right] \quad (3)$

where $x_r$ represents the second three-dimensional label, $D(x_r)$ represents the discrimination result of the discriminator in the generation countermeasure layer on the second three-dimensional label, and $-\mathbb{E}_{x_r}[\log D(x_r)]$ is the expected value of the negative logarithm of the discriminator output for the label. $x_f$ refers to the predicted three-dimensional parameter output by the reconstruction network, and $D(x_f)$ represents the discrimination result of the discriminator on the predicted three-dimensional parameter.
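A sketch of the log loss (3), assuming the discriminator outputs a value in (0, 1) with 1 representing true, and using a small epsilon to keep the logarithm finite:

```python
import torch

def adversarial_loss(discriminator, real_labels, fake_params):
    """Equation (3): log loss over the discriminator's outputs for the real motion-capture
    labels x_r and for the three-dimensional parameters x_f predicted by the network."""
    d_real = discriminator(real_labels)     # discrimination results for the second 3D label
    d_fake = discriminator(fake_params)     # discrimination results for the predicted parameters
    eps = 1e-8                              # keeps the logarithm finite
    return (-torch.log(d_real + eps)).mean() + (-torch.log(1.0 - d_fake + eps)).mean()
```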
Next, the terminal may obtain a weight corresponding to the first loss function, a weight corresponding to the second loss function, and a weight corresponding to the third loss function. And the terminal multiplies the first loss function by the corresponding weight, multiplies the second loss function by the corresponding weight, multiplies the third loss function by the corresponding weight, and sums up the 3 multiplication results to obtain the target loss function.
In this embodiment, the reconstruction network further includes a generation countermeasure layer to discriminate the predicted three-dimensional parameters and the second three-dimensional label, so that a loss function can be constructed between the three-dimensional parameters predicted from the image and the three-dimensional data directly acquired from the real human body, in order to determine whether the predicted three-dimensional parameters output by the network conform to the real situation. A loss function is constructed based on the difference between the discrimination result of the predicted three-dimensional parameters and the discrimination result of the second three-dimensional label, and three loss functions are constructed from three factors to obtain the target loss function. The target loss function integrates characteristics of multiple aspects, so that the accuracy of the trained reconstruction network is higher.
In one embodiment, the method further comprises: generating camera parameters corresponding to the first training image based on point cloud coordinates of different scales; converting the three-dimensional attitude parameters into predicted two-dimensional attitude parameters according to the camera parameters; constructing a fourth loss function according to the predicted two-dimensional attitude parameters and the corresponding two-dimensional attitude tags;
constructing a target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameters, wherein the method comprises the following steps: and constructing a target loss function according to the fourth loss function, the point cloud coordinates with different scales and the predicted three-dimensional parameters.
The three-dimensional posture parameter is a three-dimensional joint point coordinate corresponding to the first object.
Specifically, the reconstruction network generates a predicted three-dimensional parameter corresponding to a first object in a first training image based on point cloud coordinates of different scales, and generates a predicted camera parameter corresponding to the first training image. The predicted three-dimensional parameters comprise predicted three-dimensional attitude parameters. And then, the terminal can map the predicted three-dimensional attitude parameters from the three-dimensional space to the two-dimensional space according to the predicted camera parameters to obtain predicted two-dimensional attitude parameters corresponding to the predicted three-dimensional attitude parameters. The predicted two-dimensional pose parameters refer to predicted two-dimensional joint coordinates of the first object.
The terminal can determine an L2 norm between the predicted two-dimensional pose parameter and the corresponding two-dimensional pose label to obtain a fourth loss function.
For example, the fourth loss function constructed by the terminal is as follows:
The two-dimensional pose loss is the loss between the joint points projected through the camera parameters and the ground-truth annotations:

$L_{j2d} = \left\| \Pi_c(X_{3D}) - \hat{x}_{2D} \right\|_2 \quad (4)$

where $\Pi_c(X_{3D})$ denotes the predicted two-dimensional pose parameters obtained by projecting the predicted three-dimensional pose parameters output by the reconstruction network with the predicted camera parameters, that is, the predicted 2D joint points obtained by projecting the predicted 3D joint points, and $\hat{x}_{2D}$ is the two-dimensional pose label, that is, the 2D joint point ground truth.
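A sketch of the re-projection loss (4), assuming a weak-perspective camera (a scale and an image-plane translation) as the projection $\Pi_c$; the embodiment does not fix the camera model, so this parameterization is an assumption.

```python
import torch

def reprojection_loss(joints_3d, joints_2d_gt, cam_params):
    """Equation (4): project the predicted 3D joint points to 2D with the predicted camera
    parameters and compare with the 2D joint ground truth.
    cam_params: (batch, 3) tensor holding (scale, tx, ty) per sample."""
    s = cam_params[:, 0:1].unsqueeze(-1)           # (batch, 1, 1) scale
    t = cam_params[:, 1:3].unsqueeze(1)            # (batch, 1, 2) image-plane translation
    joints_2d_pred = s * joints_3d[..., :2] + t    # weak-perspective projection Pi_c
    return torch.norm(joints_2d_pred - joints_2d_gt, p=2)
```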
And then, the terminal acquires point cloud labels, and a first loss function is constructed according to the point cloud coordinates of different scales and the point cloud labels of corresponding scales. And the terminal acquires a first three-dimensional label corresponding to the first training image, and constructs a second loss function according to the predicted three-dimensional parameter and the first three-dimensional label.
And constructing a target loss function according to the first loss function and the second loss function.
Next, the terminal may obtain a weight corresponding to the first loss function, a weight corresponding to the second loss function, and a weight corresponding to the fourth loss function. And the terminal multiplies the first loss function by the corresponding weight, multiplies the second loss function by the corresponding weight, multiplies the fourth loss function by the corresponding weight, and sums up the 3 multiplication results to obtain the target loss function.
In the embodiment, camera parameters corresponding to the first training image are generated based on point cloud coordinates of different scales; converting the three-dimensional attitude parameters into predicted two-dimensional attitude parameters according to the camera parameters; the method comprises the steps of constructing a fourth loss function according to predicted two-dimensional attitude parameters and corresponding two-dimensional attitude labels, constructing a target loss function according to the fourth loss function, point cloud coordinates of different scales and predicted three-dimensional parameters, and constructing the target loss function based on characteristics of the predicted two-dimensional attitude parameters, the point cloud coordinates of different scales and the predicted three-dimensional parameters of images, so that the constructed target loss function is more accurate, and a reconstructed network obtained by training is more accurate.
In one embodiment, the method further comprises: inputting a second training image into a reconstruction network to be trained to obtain a predicted three-dimensional attitude parameter of a second object in the second training image; the first training image and the second training image are images collected in different environments; acquiring a three-dimensional attitude tag corresponding to the second object, and constructing a fifth loss function according to the predicted three-dimensional attitude parameter and the three-dimensional attitude tag;
constructing a target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameters, wherein the method comprises the following steps: and constructing a target loss function according to the fifth loss function, the point cloud coordinates with different scales and the predicted three-dimensional parameters.
The first training image is an image acquired outdoors, and the second training image is an image acquired indoors. The first object in the first training image and the second object in the second training image may be the same object or may be different objects.
Specifically, the terminal inputs the second training image into the reconstruction network to be trained, and performs feature extraction on the second training image through a feature extraction layer in the reconstruction network to be trained to obtain a feature map of the second training image. Carrying out graph convolution processing on the feature graph of the second training image through a graph convolution layer in the reconstruction network to be trained to obtain point cloud features of different scales; and carrying out graph convolution processing on the point cloud characteristics of different scales through the graph convolution layer to obtain point cloud coordinates of different scales. And the terminal generates a predicted three-dimensional attitude parameter corresponding to a second object in a second training image through the graph convolution layer based on point cloud coordinates of different scales.
Then, the terminal obtains a three-dimensional attitude tag corresponding to the second object, determines an L2 norm between the predicted three-dimensional attitude parameter and the three-dimensional attitude tag, and obtains a fifth loss function.
For example, the fifth loss function constructed by the terminal is as follows:
The fifth loss function is the loss between the predicted three-dimensional pose parameters output by the network and the three-dimensional pose labels, i.e. the loss between the joint points of the SMPL model and the ground-truth labels:

$L_{j3d} = \left\| R_\theta(\beta) - \hat{X}_{3D} \right\|_2 \quad (5)$

where $R_\theta(\beta)$ represents the predicted three-dimensional pose parameters, that is, the 3D joint point coordinates, among the predicted three-dimensional parameters output by the reconstruction network, and $\hat{X}_{3D}$ represents the three-dimensional pose label, that is, the real 3D joint point coordinates.
And then, the terminal acquires point cloud labels, and a first loss function is constructed according to the point cloud coordinates of different scales and the point cloud labels of corresponding scales. And the terminal acquires a first three-dimensional label corresponding to the first training image, and constructs a second loss function according to the predicted three-dimensional parameter and the first three-dimensional label.
And constructing a target loss function according to the fifth loss function, the first loss function and the second loss function.
Next, the terminal may obtain a weight corresponding to the first loss function, a weight corresponding to the second loss function, and a weight corresponding to the fifth loss function. And the terminal multiplies the first loss function by the corresponding weight, multiplies the second loss function by the corresponding weight, multiplies the fifth loss function by the corresponding weight, and sums up the 3 multiplication results to obtain the target loss function.
In this embodiment, a loss function is constructed based on three-dimensional attitude parameters obtained by predicting a reconstructed network and corresponding three-dimensional attitude tags, and a target loss function is constructed by combining three factors in the three aspects of predicting the three-dimensional parameters, point cloud coordinates of different scales and predicting the three-dimensional attitude parameters, so that losses generated by multiple factors in a training process of the reconstructed network can be integrated, the influence of the factors on predicting the reconstructed network is ensured to be minimum, and the reconstructed three-dimensional virtual model of the reconstructed network is more accurate.
In one embodiment, the method further comprises: inputting the third training image into a reconstruction network to be trained to obtain a corresponding predicted two-dimensional attitude parameter; constructing a fifth loss function according to the predicted two-dimensional attitude parameters and the corresponding two-dimensional attitude tags;
constructing a target loss function based on point cloud coordinates of different scales and predicted three-dimensional parameters, wherein the method comprises the following steps: and constructing a target loss function based on the fifth loss function, the point cloud coordinates with different scales and the predicted three-dimensional parameters.
Specifically, the third training image may be an image with a complex background, and the third training image includes a third object.
In one embodiment, inputting the third training image into a reconstruction network to be trained to obtain a corresponding predicted two-dimensional pose parameter includes:
inputting the third training image into the reconstruction network to be trained, and performing feature extraction on the third training image through the feature extraction layer in the reconstruction network to be trained to obtain a feature map of the third training image; performing graph convolution processing on the feature map of the third training image through the graph convolution layer in the reconstruction network to be trained to obtain point cloud features of different scales; performing regression processing on the point cloud features of different scales through the graph convolution layer to obtain point cloud coordinates of different scales; generating predicted three-dimensional pose parameters and camera parameters based on the point cloud coordinates of different scales through the graph convolution layer; and converting the predicted three-dimensional pose parameters into the corresponding predicted two-dimensional pose parameters through the camera parameters.
In one embodiment, the terminal may construct the target loss function according to the first loss function, the second loss function, the third loss function, the fourth loss function, the fifth loss function, and the weight parameters corresponding to the loss functions.
The terminal constructs an objective loss function as follows:
$L_{total} = \lambda_1 L_{j2d} + \lambda_2 L_{j3d} + \lambda_3 L_{graph} + \lambda_4 L_{smpl} + \lambda_5 L_{adv} \quad (6)$

where $\lambda_1$ to $\lambda_5$ are the weight parameters corresponding to the respective loss functions.
The target loss function is constructed from these 5 loss functions, integrating more of the factors that affect network performance, so that the influence of each aspect is reduced to a minimum and the reconstruction network is more accurate.
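A sketch of the weighted sum (6); the weight values themselves are not specified by this embodiment and must be chosen when training.

```python
def total_loss(losses, weights):
    """Equation (6): weighted sum of the five loss terms.
    losses:  dict with keys 'j2d', 'j3d', 'graph', 'smpl', 'adv'
    weights: dict with the corresponding lambda_1 ... lambda_5 values"""
    return sum(weights[k] * losses[k] for k in ("j2d", "j3d", "graph", "smpl", "adv"))
```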
In one embodiment, the training data sets used in the process of training the reconstruction network include the 3DPW human body parameter data set (3D Poses in the Wild dataset); public 2D pose data sets such as PoseTrack and PennAction; public 3D pose data sets (data sets collected indoors) such as Human3.6M and MPI_INF_3DHP; and an SMPL parameter data set collected by MoSh. The SMPL human body parameters include pose parameters and body type parameters, where the pose parameters have 72 dimensions and the body type parameters have 10 dimensions; the pose parameters are the rotation information of the 24 joint points, and the rotation information of each joint point is represented by a 3-dimensional axis-angle vector, giving 24 × 3 dimensions in total. SMPL is a skinning-based human body parametric representation, which can be expressed as a pose represented by a 72-dimensional vector and a body shape represented by a 10-dimensional vector. MoSh (Motion and Shape Capture) is a motion and shape capture method that relies on motion capture devices to capture data of human body surface points, resulting in an SMPL parameter data set.
Fig. 10 is a framework diagram of three-dimensional virtual model reconstruction for a human body in a two-dimensional image according to one embodiment. The terminal acquires a two-dimensional image containing a human body and inputs the two-dimensional image into the reconstruction network. The feature extraction layer in the reconstruction network performs feature extraction on the two-dimensional image and performs camera parameter regression based on the extracted features to obtain the camera parameters corresponding to the two-dimensional image. The extracted features are input into the graph convolution layer for graph convolution processing, and regression processing is performed on the point cloud features of each scale to obtain point cloud coordinates of different scales. The graph convolution layer performs human body parameter regression based on the point cloud coordinates of different scales to obtain the three-dimensional parameters corresponding to the human body in the two-dimensional image. The three-dimensional joint points (i.e. the three-dimensional pose parameters) in the three-dimensional parameters are re-projected to obtain the two-dimensional joint points (i.e. the two-dimensional pose parameters). The three-dimensional parameters and the corresponding three-dimensional labels are input into the generation countermeasure layer, and the discriminator in the generation countermeasure layer discriminates the three-dimensional parameters to obtain a discrimination result, which is true or false. A target loss function is constructed according to the difference between the three-dimensional parameters and the corresponding three-dimensional labels, the differences between the point cloud coordinates of different scales and the point cloud labels of corresponding scales, the difference between the three-dimensional joint points and the corresponding labels, and the difference between the two-dimensional joint points and the corresponding labels, and the reconstruction network is trained based on the target loss function.
In one embodiment, there is provided a three-dimensional virtual model reconstruction method, including:
the server obtains a first training image of a first object. The first object has a limb that is active.
And then, the server extracts the features of the first training image through a reconstruction network to be trained, and performs graph convolution processing on the extracted features to obtain point cloud coordinates of different scales.
Next, the server generates predicted three-dimensional parameters and corresponding predicted camera parameters of the first object based on the point cloud coordinates of different scales.
And then, the server acquires point cloud labels, and a first loss function is constructed according to the point cloud coordinates of different scales and the point cloud labels of corresponding scales.
Further, the server obtains a first three-dimensional label corresponding to the first training image, and a second loss function is constructed according to the predicted three-dimensional parameter and the first three-dimensional label.
And then, the server acquires the second three-dimensional label and constructs a third loss function according to the predicted three-dimensional parameter and the second three-dimensional label. The second three-dimensional tag is a three-dimensional parameter acquired by motion capture of the first object.
The server then converts the three-dimensional pose parameters to predicted two-dimensional pose parameters based on the camera parameters.
Further, the server constructs a fourth loss function according to the predicted two-dimensional attitude parameters and the corresponding two-dimensional attitude tags.
And then, the server inputs the second training image into a reconstruction network to be trained to obtain the predicted three-dimensional posture parameter of the second object in the second training image. The first training image and the second training image are images acquired in different environments.
And then, the server acquires a three-dimensional attitude tag corresponding to the second object, and constructs a fifth loss function according to the predicted three-dimensional attitude parameter and the three-dimensional attitude tag.
And the server constructs a target loss function according to the first loss function, the second loss function, the third loss function, the fourth loss function, the fifth loss function and the weight parameters corresponding to the loss functions.
Further, the server trains the reconstruction network to be trained based on the target loss function, and the trained reconstruction network is obtained when the training stopping condition is met. And the trained reconstruction network is used for reconstructing the object with the movable limb in the image into a three-dimensional virtual model with the matched limb shape with the object.
The trained reconstruction network is applied to a terminal. The terminal acquires each frame of image containing the target object in the video of the target object, where the target object has movable limbs.
And then, the terminal extracts the features of each frame of image through the trained feature extraction layer of the reconstruction network to obtain a feature map corresponding to each frame of image.
Further, the terminal carries out image convolution processing on the feature map of each frame of image through an image convolution layer of a reconstruction network to obtain point cloud features of different scales corresponding to each frame of image.
And then, the terminal carries out regression processing on the point cloud characteristics of different scales respectively corresponding to each frame of image through the image convolution layer to obtain point cloud coordinates of different scales respectively corresponding to each frame of image.
And then, the terminal generates three-dimensional parameters of the target object in each frame of image and corresponding camera parameters according to the point cloud coordinates of different scales corresponding to each frame of image.
Further, the terminal reconstructs a three-dimensional virtual model of the target object in each frame image based on the three-dimensional parameters of the target object, wherein the three-dimensional virtual model has a limb shape matched with the target object in the corresponding image.
Further, the terminal projects the three-dimensional virtual model into a two-dimensional image according to camera parameters corresponding to the image, and replaces a target object in an object image in the video according to the two-dimensional image to obtain a target video.
In this embodiment, the training process of reconstructing the network has a high requirement on the device graphics card, and can be completed on the server. For the three-dimensional parameter data set, supervised training can be performed by predicting three-dimensional attitude parameters and corresponding labels. In order to improve the generalization capability of the reconstruction network, a two-dimensional attitude data set and a three-dimensional attitude data set are added for semi-supervised training. In order to improve the quality of the output result of the reconstructed network, a structure factorization discriminator discriminates the data of the network prediction. In order to improve the accuracy of the network, multi-scale point cloud coordinate supervision is added into a training loss function. In order to improve the smoothness of the inter-frame result of the video image, the single-frame result is smoothed by time. In the training process, various factors are considered, so that the influence of various aspects is reduced to the minimum through training, and the accuracy of the reconstructed network is improved.
The two-dimensional image of the target object to be reconstructed is predicted through the trained reconstruction network, the three-dimensional parameters of the target object can be accurately obtained, and therefore the three-dimensional virtual model can be accurately constructed according to the three-dimensional parameters.
It should be understood that although the steps in the flowcharts of fig. 2-10 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, there is no strict order limitation on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2-10 may include multiple sub-steps or multiple stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 11, there is provided a three-dimensional virtual model reconstruction apparatus, which may be a part of a computer device using a software module or a hardware module, or a combination of the two modules, and specifically includes: an image acquisition module 1102, a feature extraction module 1104, a generation module 1106, and a reconstruction module 1108, wherein:
an image acquisition module 1102 for acquiring an image of a target object, the target object having a moving limb.
And the feature extraction module 1104 is configured to perform feature extraction on the image, and perform graph convolution processing on the extracted features to obtain point cloud coordinates of different scales.
A generating module 1106, configured to generate three-dimensional parameters of the target object according to the point cloud coordinates of different scales.
A reconstruction module 1108 for reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object; the three-dimensional virtual model has a limb morphology that matches the target object in the image.
In this embodiment, an image of a target object having a moving limb is obtained, feature extraction is performed on the image, graph convolution processing is performed on the extracted features, point cloud coordinates of different scales are obtained, three-dimensional parameters of the target object are generated according to the point cloud coordinates of different scales, and the three-dimensional parameters of the target object in the image can be accurately generated through graph convolution. And reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object, wherein the three-dimensional virtual model has a limb shape matched with the target object in the image, so that the reconstruction accuracy of the three-dimensional virtual model is improved.
In one embodiment, the feature extraction module 1104 is configured to: extracting the features of the image through a feature extraction layer of a reconstruction network to obtain a corresponding feature map; carrying out graph convolution processing on the characteristic graph through a graph convolution layer of a reconstruction network to obtain point cloud characteristics with different scales; and performing regression processing on the point cloud characteristics of different scales through the graph convolution layer to obtain point cloud coordinates of different scales.
In this embodiment, the features of the image are extracted through the trained reconstruction network to obtain the key information of the image. And carrying out graph convolution processing on the extracted key information through a reconstruction network, converting the key information into point cloud features with different scales, carrying out regression processing on the point cloud features with different scales through the graph convolution layer to obtain point cloud coordinates with different scales so as to accurately output the coordinates of the key feature information of each scale.
In one embodiment, the apparatus further comprises: a projection module to: determining camera parameters corresponding to the image based on point cloud coordinates of different scales; and projecting the three-dimensional virtual model into a two-dimensional image according to the camera parameters corresponding to the image.
In the embodiment, the camera parameters corresponding to the image are determined based on the point cloud coordinates of different scales, and the three-dimensional virtual model is projected into the two-dimensional image according to the camera parameters corresponding to the image, so that the display mode is more attractive and visual. Moreover, the degree of coincidence between the two-dimensional image projected by the three-dimensional virtual model and the image of the target object can be visually displayed, and the three-dimensional virtual model can be visualized.
In one embodiment, the image acquisition module 1102 is further configured to: acquiring an image containing a target object in a video of the target object;
the device also includes: a projection module, the projection module further configured to: projecting the three-dimensional virtual model into the image to replace the target object in the image; and generating a target video based on the image of each frame in the video after replacing the target object.
In this embodiment, each frame in the video of the target object includes an image of the target object, and the three-dimensional parameter corresponding to the target object in each frame of the image is output in real time through the reconstruction network. And reconstructing based on the three-dimensional parameters to obtain a three-dimensional virtual model corresponding to the target object in each frame of image, projecting the three-dimensional virtual model corresponding to each image into the corresponding image to replace the target object in the corresponding image to obtain a target video, so that the three-dimensional virtual model can be projected into the application of a human-computer interaction somatosensory game or a short video, and the reality of a human-computer interaction in the somatosensory game or a three-dimensional virtual reality special effect in the short video is enhanced.
In one embodiment, the image acquisition module 1102 is further configured to: acquiring each frame of image containing a target object in a video of the target object;
the generation module 1106 is further configured to: generating a three-dimensional parameter sequence based on the three-dimensional parameters of the target object in each frame of image;
the reconstruction module 1108 is further configured to: and generating a three-dimensional virtual model sequence corresponding to the target object according to the three-dimensional parameter sequence.
In this embodiment, each frame in the video of the target object includes an image of the target object, and a three-dimensional parameter sequence corresponding to the target object in each frame image is generated according to point cloud coordinates of different scales corresponding to each frame image, so that three-dimensional parameters corresponding to the target object in each frame image can be output in real time through a reconstruction network, a three-dimensional virtual model of the target object in each frame image is reconstructed in real time, and the efficiency of reconstructing the three-dimensional virtual model is improved.
In one embodiment, the generation module 1106 is further configured to: acquiring corresponding moments of each frame of image in a video to obtain a time sequence; filtering the three-dimensional parameter sequence according to the time sequence to obtain a filtered three-dimensional parameter sequence;
the reconstruction module 1108 is further configured to: generating a three-dimensional virtual model sequence corresponding to the target object according to the filtered three-dimensional parameter sequence; the three-dimensional virtual models in the sequence of three-dimensional virtual models have limb morphology matching the target object in the corresponding image.
In this embodiment, the corresponding time of each frame of image in the video is obtained to obtain a time sequence, and the three-dimensional parameter sequence is filtered according to the time sequence to obtain a filtered three-dimensional parameter sequence, so that inter-frame smoothing of the three-dimensional parameters can be realized according to time correlation. And reconstructing the three-dimensional virtual model of the target object in each frame of image based on the filtered three-dimensional parameter sequence, so that the reconstruction network can output continuous three-dimensional virtual models corresponding to the target object, and smooth transition between every two three-dimensional virtual models is realized, thereby improving the reconstruction precision and the continuity of the three-dimensional virtual models.
For specific limitations of the three-dimensional virtual model reconstruction device, reference may be made to the above limitations of the three-dimensional virtual model reconstruction method, which are not described herein again. The modules in the three-dimensional virtual model reconstruction device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, as shown in fig. 12, there is provided a training apparatus for reconstructing a network, which may be a part of a computer device using a software module or a hardware module, or a combination of the two modules, and specifically includes: a training image acquisition module 1202, an input module 1204, a prediction module 1206, a construction module 1208, and a training module 1210, wherein:
a training image acquisition module 1202 for acquiring a first training image of a first object; the first object has a moving limb.
An input module 1204, configured to perform feature extraction on the first training image through a reconstruction network to be trained, and perform graph convolution processing on the extracted features to obtain point cloud coordinates of different scales.
A prediction module 1206 for generating a predicted three-dimensional parameter of the first object based on the point cloud coordinates of the different scales.
A constructing module 1208, configured to construct a target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameter.
A training module 1210, configured to train the reconstructed network to be trained based on the target loss function, and obtain a trained reconstructed network when a training stop condition is met; the trained reconstruction network is used for reconstructing an object with movable limbs in the image into a three-dimensional virtual model with limb shapes matched with the object.
In this embodiment, a first training image of a first object with a moving limb is obtained, feature extraction is performed on the first training image through a reconstruction network to be trained, graph convolution processing is performed on the extracted features, point cloud coordinates of different scales are obtained, and three-dimensional parameters corresponding to the object in the image can be accurately generated through graph convolution. And constructing a loss function by combining point cloud coordinates and three-dimensional parameters of different scales. The reconstruction network to be trained is trained based on the target loss function, loss caused by factors of all aspects to the network can be integrated in the training process, so that loss of all aspects is reduced to the minimum through training, the trained reconstruction network is higher in precision and higher in generalization capability, and the trained reconstruction network can predict three-dimensional parameters of the target object in the two-dimensional image more accurately. And accurately predicting the three-dimensional parameters of the target object in the two-dimensional image by using the trained reconstruction network, thereby accurately reconstructing the three-dimensional virtual model corresponding to the target object according to the three-dimensional parameters.
In one embodiment, the building module 1208 is further configured to: acquiring point cloud labels, and constructing a first loss function according to point cloud coordinates of different scales and the point cloud labels of corresponding scales; acquiring a first three-dimensional label corresponding to the first training image, and constructing a second loss function according to the predicted three-dimensional parameter and the first three-dimensional label; and constructing a target loss function according to the first loss function and the second loss function.
In this embodiment, a first loss function is constructed from the point cloud coordinates of different scales and the point cloud labels of the corresponding scales, a first three-dimensional label corresponding to the first training image is acquired, and a second loss function is constructed from the predicted three-dimensional parameters and the first three-dimensional label. The target loss function is then built from the first and second loss functions, so that it reflects both the point cloud coordinates predicted at different scales and the three-dimensional parameters predicted from the image. A more accurate target loss function in turn yields a more accurate trained reconstruction network.
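A minimal sketch of how the first and second loss functions could be expressed, assuming L1 distance for the point clouds, mean squared error for the parameters, and simple scalar weights; all three choices are illustrative rather than prescribed by the application.

    import torch.nn.functional as F

    def point_cloud_loss(pred_clouds, label_clouds):
        # First loss: compare the predicted point cloud with its label at every scale.
        return sum(F.l1_loss(pred, label) for pred, label in zip(pred_clouds, label_clouds))

    def parameter_loss(pred_params, label_params):
        # Second loss: compare the predicted three-dimensional parameters with the
        # first three-dimensional label of the training image.
        return F.mse_loss(pred_params, label_params)

    def target_loss(pred_clouds, label_clouds, pred_params, label_params,
                    w_cloud=1.0, w_param=1.0):
        return (w_cloud * point_cloud_loss(pred_clouds, label_clouds)
                + w_param * parameter_loss(pred_params, label_params))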
In one embodiment, the building module 1208 is further configured to: acquiring a second three-dimensional label, and constructing a third loss function according to the predicted three-dimensional parameter and the second three-dimensional label; the second three-dimensional label is a three-dimensional parameter acquired by capturing the motion of the first object; and constructing a target loss function according to the first loss function, the second loss function and the third loss function.
In this embodiment, the reconstruction network further includes a generative adversarial layer (a discriminator) that discriminates between the predicted three-dimensional parameters and the second three-dimensional label, so as to judge whether the three-dimensional parameters output by the network conform to the distribution of real motion-capture data. A third loss function is constructed based on the difference between the discrimination results for the predicted three-dimensional parameters and for the second three-dimensional label, and the target loss function is obtained from the three loss functions built from these three factors. Because the target loss function integrates characteristics from multiple aspects, the reconstruction network obtained through training is more accurate.
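One possible form of this adversarial term is sketched below, using a small fully connected discriminator and a least-squares objective; both the discriminator architecture and the loss formulation are assumptions made for illustration.

    import torch.nn as nn

    class ParamDiscriminator(nn.Module):
        # Judges whether a set of three-dimensional parameters resembles real
        # motion-capture data (the second three-dimensional label).
        def __init__(self, param_dim=82):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(param_dim, 128), nn.ReLU(), nn.Linear(128, 1))

        def forward(self, params):
            return self.net(params)

    def adversarial_losses(disc, pred_params, mocap_params):
        # Discriminator loss: score motion-capture labels as real (1), predictions as fake (0).
        d_loss = ((disc(mocap_params) - 1) ** 2).mean() + (disc(pred_params.detach()) ** 2).mean()
        # Third loss for the reconstruction network: push predictions toward the real distribution.
        g_loss = ((disc(pred_params) - 1) ** 2).mean()
        return d_loss, g_loss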
In one embodiment, the building module 1208 is further configured to: generate camera parameters corresponding to the first training image based on the point cloud coordinates of different scales; convert the three-dimensional pose parameters into predicted two-dimensional pose parameters through the camera parameters; construct a fourth loss function according to the predicted two-dimensional pose parameters and the corresponding two-dimensional pose labels; and construct a target loss function according to the fourth loss function, the point cloud coordinates of different scales, and the predicted three-dimensional parameters.
In this embodiment, camera parameters corresponding to the first training image are generated based on the point cloud coordinates of different scales, the three-dimensional pose parameters are converted into predicted two-dimensional pose parameters according to those camera parameters, and a fourth loss function is constructed from the predicted two-dimensional pose parameters and the corresponding two-dimensional pose labels. The target loss function is then constructed from the fourth loss function, the point cloud coordinates of different scales, and the predicted three-dimensional parameters, so that it combines the predicted two-dimensional pose, the multi-scale point cloud coordinates, and the predicted three-dimensional parameters of the image. This makes the constructed target loss function more accurate and, in turn, the trained reconstruction network more accurate.
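One way to realize the fourth loss is sketched below, assuming a weak-perspective camera (a scale and a two-dimensional translation estimated from the point clouds) and per-keypoint visibility flags in the two-dimensional pose labels; the camera model and the visibility masking are illustrative assumptions. All arguments are torch tensors.

    def project_weak_perspective(joints_3d, scale, trans_2d):
        # joints_3d: (batch, J, 3); scale: (batch, 1); trans_2d: (batch, 2) -> (batch, J, 2)
        return scale.unsqueeze(1) * joints_3d[..., :2] + trans_2d.unsqueeze(1)

    def pose_2d_loss(joints_3d, scale, trans_2d, joints_2d_label, visibility):
        # Fourth loss: compare projected joints with the two-dimensional pose labels,
        # counting only the keypoints marked visible in the label.
        pred_2d = project_weak_perspective(joints_3d, scale, trans_2d)
        per_joint = (pred_2d - joints_2d_label).abs().sum(dim=-1)   # (batch, J)
        return (per_joint * visibility).sum() / visibility.sum().clamp(min=1)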
In one embodiment, the building module 1208 is further configured to: input a second training image into the reconstruction network to be trained to obtain predicted three-dimensional pose parameters of a second object in the second training image, the first training image and the second training image being images collected in different environments; acquire a three-dimensional pose label corresponding to the second object, and construct a fifth loss function according to the predicted three-dimensional pose parameters and the three-dimensional pose label; and construct a target loss function according to the fifth loss function, the point cloud coordinates of different scales, and the predicted three-dimensional parameters.
In this embodiment, a loss function is constructed from the three-dimensional pose parameters predicted by the reconstruction network and the corresponding three-dimensional pose labels, and the target loss function combines the three factors of predicted three-dimensional parameters, point cloud coordinates of different scales, and predicted three-dimensional pose parameters. The losses produced by multiple factors during training can therefore be integrated and jointly minimized, so that the three-dimensional virtual models reconstructed by the trained network are more accurate.
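The sketch below shows one way the terms from the different embodiments could be combined into a single target loss; the unit default weights and the particular set of active terms are assumptions, since each embodiment above combines a different subset of the first to fifth loss functions.

    def build_target_loss(losses, weights=None):
        # losses: mapping from term name to a scalar loss tensor, for example
        # {"point_cloud": l1, "params": l2, "adversarial": l3, "pose_2d": l4, "pose_3d": l5}
        weights = weights or {}
        return sum(weights.get(name, 1.0) * value for name, value in losses.items())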
For the specific definition of the training apparatus for the reconstruction network, reference may be made to the definition of the training method for the reconstruction network above, which is not repeated here. Each module in the training apparatus may be implemented in whole or in part by software, by hardware, or by a combination of the two. The modules may be embedded, in hardware form, in or independent of a processor of the computer device, or stored, in software form, in a memory of the computer device, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 13. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store training data of the reconstruction network and reconstruction data of three-dimensional virtual models. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements the training method for the reconstruction network and the three-dimensional virtual model reconstruction method.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 13. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be realized through Wi-Fi, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements the training method for the reconstruction network and the three-dimensional virtual model reconstruction method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the structure shown in fig. 13 is merely a block diagram of part of the structure related to the disclosed aspects and does not limit the computer devices to which those aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, including a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the steps of the above method embodiments.
In one embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when executed by a processor, implements the steps of the above method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, and the like. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above embodiments only express several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A method of reconstructing a three-dimensional virtual model, the method comprising:
acquiring an image of a target object, the target object having a moving limb;
extracting the features of the image, and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales;
generating three-dimensional parameters of the target object according to the point cloud coordinates of different scales;
reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object; the three-dimensional virtual model has a limb morphology that matches the target object in the image.
2. The method of claim 1, wherein the extracting features from the image and performing a graph convolution process on the extracted features to obtain point cloud coordinates of different scales comprises:
extracting the features of the image through a feature extraction layer of a reconstruction network to obtain a corresponding feature map;
performing graph convolution processing on the feature map through a graph convolution layer of the reconstruction network to obtain point cloud features of different scales;
and performing regression processing on the point cloud features of different scales through the graph convolution layer to obtain point cloud coordinates of different scales.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
determining camera parameters corresponding to the image based on the point cloud coordinates of different scales;
and projecting the three-dimensional virtual model into a two-dimensional image according to the camera parameters corresponding to the image.
4. The method of claim 1, wherein the acquiring an image of a target object comprises: acquiring an image containing the target object from a video of the target object;
the method further comprises the following steps:
projecting the three-dimensional virtual model into the image to replace the target object in the image;
and generating a target video based on each frame of image in the video after the target object has been replaced.
5. The method of claim 1, wherein the acquiring an image of a target object comprises:
acquiring each frame of image containing the target object from a video of the target object;
the reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object, the three-dimensional virtual model having a limb morphology matching the target object in the image, comprises:
generating a three-dimensional parameter sequence based on the three-dimensional parameters of the target object in each frame of image;
generating a three-dimensional virtual model sequence corresponding to the target object according to the three-dimensional parameter sequence; the three-dimensional virtual models in the three-dimensional virtual model sequence have limb morphology matched with the target object in the corresponding image.
6. The method of claim 5, wherein after the generating of the three-dimensional parameter sequence corresponding to the target object in each frame of the image, the method further comprises:
acquiring the corresponding time of each frame of the image in the video to obtain a time sequence;
filtering the three-dimensional parameter sequence according to the time sequence to obtain a filtered three-dimensional parameter sequence;
the generating a three-dimensional virtual model sequence corresponding to the target object according to the three-dimensional parameter sequence comprises:
and generating a three-dimensional virtual model sequence corresponding to the target object according to the filtered three-dimensional parameter sequence.
7. A training method for reconstructing a network, the method comprising:
acquiring a first training image of a first object; the first object has a moving limb;
extracting features of the first training image through a reconstruction network to be trained, and performing graph convolution processing on the extracted features to obtain point cloud coordinates of different scales;
generating a predicted three-dimensional parameter of the first object based on the point cloud coordinates of different scales;
constructing a target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameters;
training the reconstruction network to be trained based on the target loss function, and obtaining the trained reconstruction network when the training stop condition is met; the trained reconstruction network is used for reconstructing an object with movable limbs in the image into a three-dimensional virtual model having a limb morphology matching the object.
8. The method of claim 7, wherein the constructing a target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameters comprises:
acquiring point cloud labels, and constructing a first loss function according to the point cloud coordinates of different scales and the point cloud labels of corresponding scales;
acquiring a first three-dimensional label corresponding to the first training image, and constructing a second loss function according to the predicted three-dimensional parameter and the first three-dimensional label;
and constructing a target loss function according to the first loss function and the second loss function.
9. The method of claim 8, further comprising:
acquiring a second three-dimensional label, and constructing a third loss function according to the predicted three-dimensional parameter and the second three-dimensional label; the second three-dimensional label is a three-dimensional parameter acquired by performing motion capture on the first object;
the constructing a target loss function according to the first loss function and the second loss function includes:
and constructing a target loss function according to the first loss function, the second loss function and the third loss function.
10. The method of claim 7, further comprising:
generating camera parameters corresponding to the first training image based on the point cloud coordinates of different scales;
converting the three-dimensional pose parameters into predicted two-dimensional pose parameters according to the camera parameters;
constructing a fourth loss function according to the predicted two-dimensional pose parameters and the corresponding two-dimensional pose labels;
the constructing of the target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameters comprises the following steps:
and constructing a target loss function according to the fourth loss function, the point cloud coordinates with different scales and the predicted three-dimensional parameters.
11. The method of claim 7, further comprising:
inputting a second training image into the reconstruction network to be trained to obtain predicted three-dimensional pose parameters of a second object in the second training image; the first training image and the second training image are images acquired in different environments;
acquiring a three-dimensional pose label corresponding to the second object, and constructing a fifth loss function according to the predicted three-dimensional pose parameters and the three-dimensional pose label;
the constructing of the target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameters comprises the following steps:
and constructing a target loss function according to the fifth loss function, the point cloud coordinates of different scales and the predicted three-dimensional parameters.
12. An apparatus for reconstructing a three-dimensional virtual model, the apparatus comprising:
an image acquisition module for acquiring an image of a target object, the target object having a moving limb;
the characteristic extraction module is used for extracting the characteristics of the image and performing graph convolution processing on the extracted characteristics to obtain point cloud coordinates with different scales;
the generating module is used for generating three-dimensional parameters of the target object according to the point cloud coordinates of different scales;
a reconstruction module for reconstructing a three-dimensional virtual model of the target object based on the three-dimensional parameters of the target object; the three-dimensional virtual model has a limb morphology that matches the target object in the image.
13. A training apparatus for reconstructing a network, the apparatus comprising:
a training image acquisition module for acquiring a first training image of a first object; the first object has a moving limb;
the input module is used for extracting the characteristics of the first training image through a reconstruction network to be trained and carrying out graph convolution processing on the extracted characteristics to obtain point cloud coordinates with different scales;
a prediction module for generating a predicted three-dimensional parameter of the first object based on the point cloud coordinates of different scales;
the construction module is used for constructing a target loss function according to the point cloud coordinates of different scales and the predicted three-dimensional parameters;
the training module is used for training the reconstruction network to be trained based on the target loss function, and obtaining the trained reconstruction network when the training stop condition is met; the trained reconstruction network is used for reconstructing an object with movable limbs in the image into a three-dimensional virtual model having a limb morphology matching the object.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 11 when executing the computer program.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 11.
CN202010400447.3A 2020-05-13 2020-05-13 Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium Active CN111598998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010400447.3A CN111598998B (en) 2020-05-13 2020-05-13 Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010400447.3A CN111598998B (en) 2020-05-13 2020-05-13 Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111598998A true CN111598998A (en) 2020-08-28
CN111598998B CN111598998B (en) 2023-11-07

Family

ID=72191269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010400447.3A Active CN111598998B (en) 2020-05-13 2020-05-13 Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111598998B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017131672A1 (en) * 2016-01-27 2017-08-03 Hewlett Packard Enterprise Development Lp Generating pose frontalized images of objects
US20180322623A1 (en) * 2017-05-08 2018-11-08 Aquifi, Inc. Systems and methods for inspection and defect detection using 3-d scanning
US20190147245A1 (en) * 2017-11-14 2019-05-16 Nuro, Inc. Three-dimensional object detection for autonomous robotic systems using image proposals
CN110798718A (en) * 2019-09-02 2020-02-14 腾讯科技(深圳)有限公司 Video recommendation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NANYANG WANG et al.: "Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images", HTTPS://ARXIV.ORG/PDF/1804.01654.PDF, pages 1-16 *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112150608A (en) * 2020-09-07 2020-12-29 鹏城实验室 Three-dimensional face reconstruction method based on graph convolution neural network
CN112150608B (en) * 2020-09-07 2024-07-23 鹏城实验室 Three-dimensional face reconstruction method based on graph convolution neural network
CN111815768A (en) * 2020-09-14 2020-10-23 腾讯科技(深圳)有限公司 Three-dimensional face reconstruction method and device
CN111815768B (en) * 2020-09-14 2020-12-18 腾讯科技(深圳)有限公司 Three-dimensional face reconstruction method and device
CN112233223A (en) * 2020-09-29 2021-01-15 深圳市易尚展示股份有限公司 Automatic human body parametric model deformation method and device based on three-dimensional point cloud
CN112287820A (en) * 2020-10-28 2021-01-29 广州虎牙科技有限公司 Face detection neural network, face detection neural network training method, face detection method and storage medium
CN112365589A (en) * 2020-12-01 2021-02-12 东方梦幻虚拟现实科技有限公司 Virtual three-dimensional scene display method, device and system
CN112365589B (en) * 2020-12-01 2024-04-26 东方梦幻虚拟现实科技有限公司 Virtual three-dimensional scene display method, device and system
CN112581597A (en) * 2020-12-04 2021-03-30 上海眼控科技股份有限公司 Three-dimensional reconstruction method and device, computer equipment and storage medium
WO2022142702A1 (en) * 2020-12-31 2022-07-07 北京达佳互联信息技术有限公司 Video image processing method and apparatus
CN112598790A (en) * 2021-01-08 2021-04-02 中国科学院深圳先进技术研究院 Brain structure three-dimensional reconstruction method and device and terminal equipment
CN112598790B (en) * 2021-01-08 2024-07-05 中国科学院深圳先进技术研究院 Brain structure three-dimensional reconstruction method and device and terminal equipment
WO2022147783A1 (en) * 2021-01-08 2022-07-14 中国科学院深圳先进技术研究院 Three-dimensional reconstruction method and apparatus for brain structure, and terminal device
CN112967397A (en) * 2021-02-05 2021-06-15 北京奇艺世纪科技有限公司 Three-dimensional limb modeling method and device, virtual reality equipment and augmented reality equipment
CN113079136A (en) * 2021-03-22 2021-07-06 广州虎牙科技有限公司 Motion capture method, motion capture device, electronic equipment and computer-readable storage medium
CN113079136B (en) * 2021-03-22 2022-11-15 广州虎牙科技有限公司 Motion capture method, motion capture device, electronic equipment and computer-readable storage medium
CN113763532A (en) * 2021-04-19 2021-12-07 腾讯科技(深圳)有限公司 Human-computer interaction method, device, equipment and medium based on three-dimensional virtual object
CN113763532B (en) * 2021-04-19 2024-01-19 腾讯科技(深圳)有限公司 Man-machine interaction method, device, equipment and medium based on three-dimensional virtual object
CN113628322B (en) * 2021-07-26 2023-12-05 阿里巴巴(中国)有限公司 Image processing, AR display and live broadcast method, device and storage medium
CN113628322A (en) * 2021-07-26 2021-11-09 阿里巴巴(中国)有限公司 Image processing method, AR display live broadcast method, AR display equipment, AR display live broadcast equipment and storage medium
US20240202871A1 (en) * 2021-08-26 2024-06-20 Shanghai Jiao Tong University Three-dimensional point cloud upsampling method, system and device, and medium
CN113706699A (en) * 2021-10-27 2021-11-26 腾讯科技(深圳)有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN114170379A (en) * 2021-11-30 2022-03-11 聚好看科技股份有限公司 Three-dimensional model reconstruction method, device and equipment
CN113952731A (en) * 2021-12-21 2022-01-21 广州优刻谷科技有限公司 Motion sensing game action recognition method and system based on multi-stage joint training
WO2023133675A1 (en) * 2022-01-11 2023-07-20 深圳先进技术研究院 Method and apparatus for reconstructing 3d image on the basis of 2d image, device, and storage medium
CN114913287A (en) * 2022-04-07 2022-08-16 北京拙河科技有限公司 Three-dimensional human body model reconstruction method and system
CN114913287B (en) * 2022-04-07 2023-08-22 北京拙河科技有限公司 Three-dimensional human body model reconstruction method and system
CN114881846A (en) * 2022-05-30 2022-08-09 北京奇艺世纪科技有限公司 Virtual trial assembly system, method, device and computer readable medium
WO2024031882A1 (en) * 2022-08-08 2024-02-15 珠海普罗米修斯视觉技术有限公司 Video processing method and apparatus, and computer readable storage medium
CN117132645A (en) * 2023-09-12 2023-11-28 深圳市木愚科技有限公司 Virtual digital person driving method, device, computer equipment and storage medium
CN117132645B (en) * 2023-09-12 2024-10-11 深圳市木愚科技有限公司 Virtual digital person driving method, device, computer equipment and storage medium
CN117726746A (en) * 2023-09-23 2024-03-19 书行科技(北京)有限公司 Three-dimensional human body reconstruction method, device, equipment, storage medium and program product
CN116991298A (en) * 2023-09-27 2023-11-03 子亥科技(成都)有限公司 Virtual lens control method based on antagonistic neural network
CN116991298B (en) * 2023-09-27 2023-11-28 子亥科技(成都)有限公司 Virtual lens control method based on antagonistic neural network
CN118542728A (en) * 2024-07-29 2024-08-27 天津市鹰泰利安康医疗科技有限责任公司 Method and system for irreversible electroporation ablation in vessel

Also Published As

Publication number Publication date
CN111598998B (en) 2023-11-07

Similar Documents

Publication Publication Date Title
CN111598998B (en) Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium
CN109859296B (en) Training method of SMPL parameter prediction model, server and storage medium
US10679046B1 (en) Machine learning systems and methods of estimating body shape from images
CN109285215B (en) Human body three-dimensional model reconstruction method and device and storage medium
WO2022001236A1 (en) Three-dimensional model generation method and apparatus, and computer device and storage medium
JP7526412B2 (en) Method for training a parameter estimation model, apparatus for training a parameter estimation model, device and storage medium
CN113496507B (en) Human body three-dimensional model reconstruction method
US11508107B2 (en) Additional developments to the automatic rig creation process
CN109684969B (en) Gaze position estimation method, computer device, and storage medium
WO2022143645A1 (en) Three-dimensional face reconstruction method and apparatus, device, and storage medium
CN111488865A (en) Image optimization method and device, computer storage medium and electronic equipment
US11928778B2 (en) Method for human body model reconstruction and reconstruction system
CN109685873B (en) Face reconstruction method, device, equipment and storage medium
CN114339409B (en) Video processing method, device, computer equipment and storage medium
CN113570684A (en) Image processing method, image processing device, computer equipment and storage medium
CN113593001A (en) Target object three-dimensional reconstruction method and device, computer equipment and storage medium
CN113808277B (en) Image processing method and related device
CN111862278A (en) Animation obtaining method and device, electronic equipment and storage medium
CN116563493A (en) Model training method based on three-dimensional reconstruction, three-dimensional reconstruction method and device
CN111582120A (en) Method and terminal device for capturing eyeball activity characteristics
CN111275610A (en) Method and system for processing face aging image
CN114913287B (en) Three-dimensional human body model reconstruction method and system
US20220180548A1 (en) Method and apparatus with object pose estimation
CN115880766A (en) Method and device for training posture migration and posture migration models and storage medium
CN118553001A (en) Texture-controllable three-dimensional fine face reconstruction method and device based on sketch input

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40027305

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant