WO2022208440A1 - Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture - Google Patents
Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
- Publication number
- WO2022208440A1 (PCT/IB2022/053034)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- human
- images
- neural network
- image
- mesh
- Prior art date
Links
- 230000001537 neural effect Effects 0.000 title claims abstract description 25
- 230000008921 facial expression Effects 0.000 title claims description 15
- 238000006073 displacement reaction Methods 0.000 claims abstract description 36
- 239000013598 vector Substances 0.000 claims abstract description 20
- 238000011084 recovery Methods 0.000 claims abstract description 6
- 238000013528 artificial neural network Methods 0.000 claims description 35
- 238000000034 method Methods 0.000 claims description 21
- 238000012545 processing Methods 0.000 claims description 15
- 239000000284 extract Substances 0.000 claims description 9
- 238000013527 convolutional neural network Methods 0.000 claims description 6
- 230000015654 memory Effects 0.000 claims description 6
- 230000037237 body shape Effects 0.000 abstract description 6
- 238000009877 rendering Methods 0.000 abstract description 3
- 238000012549 training Methods 0.000 description 17
- 238000005457 optimization Methods 0.000 description 9
- 239000003086 colorant Substances 0.000 description 5
- 238000013459 approach Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 2
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000001815 facial effect Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000012417 linear regression Methods 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
- G06T17/205—Re-meshing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06V10/422—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation for representing the structure of the pattern or shape of an object therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/467—Encoded features or binary features, e.g. local binary patterns [LBP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/12—Bounding box
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/16—Cloth
Definitions
- the present invention relates to three dimensional computer vision and graphics for the entertainment industry. More specifically, the present invention relates to acquiring and processing three dimensional computer vision and graphics for film, TV, music and game content creation.
- Previous systems, e.g., Facebook FrankMocap, predict only the naked body shape and pose from a single image. Such systems cannot predict the clothes surface. They take a 2D image-translation approach and cannot handle multiview input.
- Implicit Part Network predicts both body and clothes from a scanned or reconstructed point cloud, but it requires 3D scans and cannot take RGB images as input, nor does it handle facial expression and appearance. Also, Implicit Part Network only predicts a label identifying each voxel as body or clothes and then fits the human prior model explicitly, which is slow.
- Neural Body and Animatable NeRF predict a clothed human body without facial expression using a neural radiance field (NeRF). However, they require creating a dense latent-code volume, which is limited to a low resolution and results in coarse human geometry. They can also only recover a volumetric human model without mesh vertex correspondences.
- Multiview neural human prediction includes predicting a 3D human model including skeleton, body shape and clothes displacement and appearance from a set of multiview images given camera calibration.
- a neural network takes an input set of images from different views, which is able to be a single image or multiple images, and predicts a layered 3D human model.
- the set of images comprises a 4D tensor of size N x w x h x c, where N is a number of views, w is width of an image, h is height of the image, and c is a channel of the image.
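- For illustration, such a tensor is able to be assembled as in the following minimal sketch (function and variable names are illustrative, not part of the disclosure):

```python
# Minimal sketch (illustrative): assembling the N x w x h x c input tensor
# from a list of same-sized view images.
import numpy as np

def build_input_tensor(view_images):
    """Stack N images of shape (w, h, c) into a 4D tensor of shape (N, w, h, c)."""
    views = [np.asarray(img, dtype=np.float32) / 255.0 for img in view_images]
    assert all(v.shape == views[0].shape for v in views), "all views must share w, h, c"
    return np.stack(views, axis=0)

# Example: 4 views of 256 x 256 RGB images -> tensor of shape (4, 256, 256, 3)
I = build_input_tensor([np.zeros((256, 256, 3), np.uint8)] * 4)
```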
- Camera information for the set of images is known.
- the output model contains three layers from inner to outer: a skeleton at a predicted pose; a naked 3D body of a predicted shape with facial expression (e.g., SMPL-X model parameterized by blendshapes and joint rotations); and a 3D field of clothes displacement and the appearance RGB color inferred from the input images.
- a clothed body mesh is obtained by deforming the naked 3D body mesh according to the clothes displacement field.
- the neural network is comprised of three sub-networks: a multiview stereo 3D convolutional neural network (MVS-3DCNN), which encodes the input image set to features, a human mesh recovery multilayer perceptron (HMR MLP), which regresses the features to human parameters, and a neural radiance field multilayer perceptron (NeRF MLP), which fine-tunes the MVS-3DCNN and decodes a query 3D ray (3D location and direction) to an RGB color and a clothes-to-body displacement.
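- The division of labor among the three sub-networks is able to be sketched as a module skeleton; all layer sizes, channel counts, and the parameter dimension below are assumptions for illustration, not the exact design:

```python
# Illustrative PyTorch skeleton of the three sub-networks; every layer size,
# channel count, and the parameter dimension are assumptions.
import torch
import torch.nn as nn

class MVSPERF(nn.Module):
    def __init__(self, feat_ch=32, param_dim=85):  # param_dim: pose+shape+expression (assumed)
        super().__init__()
        # MVS-3DCNN: encodes the image set's cost volume to a feature volume.
        self.mvs_3dcnn = nn.Sequential(
            nn.Conv3d(feat_ch, 16, 3, padding=1), nn.BatchNorm3d(16), nn.ReLU(),
            nn.Conv3d(16, 8, 3, padding=1),
        )
        # HMR MLP: regresses flattened features to human parameters.
        self.hmr_mlp = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(1024), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(1024, param_dim),
        )
        # NeRF MLP: decodes a 3D ray (position + direction) plus a feature
        # to RGB color (3) and a clothes-to-body displacement vector (3).
        self.nerf_mlp = nn.Sequential(
            nn.Linear(3 + 3 + 8, 256), nn.ReLU(),
            nn.Linear(256, 3 + 3),
        )

    def forward(self, cost_volume, ray):
        feat = self.mvs_3dcnn(cost_volume)          # (B, 8, D, H, W)
        params = self.hmr_mlp(feat)                 # human parameters
        # per-ray feature (global average here, purely for illustration)
        ray_feat = feat.mean(dim=(2, 3, 4))         # (B, 8)
        rgb_disp = self.nerf_mlp(torch.cat([ray, ray_feat], dim=-1))
        return params, rgb_disp
```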
- MVS-3DCNN takes the multiview image set as input, chooses the frontal view as the reference view and extracts a feature volume.
- HMR MLP regresses all the feature volumes to the human pose, shape, and facial expression parameters.
- The SMPL-X model generates the naked human body mesh according to these parameters, and the naked body mesh is then converted into an occupancy field in its bounding box.
- For any 3D point near the body mesh, associated with ray directions from each view's camera center, the trained NeRF MLP generates an RGB color and a 3D displacement vector pointing to the surface of the naked body.
- the appearance of the clothed human body is able to be rendered as an RGB image.
- the clothed body mesh, e.g., SMPL-X+D, has the same vertex correspondence as the SMPL-X model.
- training the neural network includes two cases: supervision and self-supervision.
- In the supervision case, a labeled dataset with known human parameters is given, e.g., the H36M dataset.
- the ground truth (GT) parameters and shapes are compared with the CNN-regressed parameters and shapes. The difference is computed as a shape loss.
- rays are cast from sampled pixels in the input image set, and the NeRF MLP renders the rays, regressing them to colors and densities, where the density is a function of the density of the naked body and the 3D clothes displacement.
- a color loss is computed as the sum of differences between the sampled pixel colors and the rendered colors.
- Figure 1 illustrates a flowchart of neural human prediction according to some embodiments.
- Figure 2 illustrates the workflow of a forward prediction represented by the tensor notations, in which the weight of all the networks MVS 3DCNN, HMR MLP and NeRF MLP are known, according to some embodiments.
- Figure 3 illustrates the workflow of training the network using supervision according to some embodiments.
- Figure 4 illustrates the workflow of training the network in a self-improving strategy according to some embodiments.
- Figure 5 illustrates the alignment of the MVS 3DCNN of each view to the NeRF MLP according to some embodiments.
- Neural human prediction includes predicting a 3D human model including a pose of a skeleton, body shape and clothes displacement and appearance from a set of images (a single image or multiview images).
- Embodiments of the neural human prediction describe methods for using a neural network.
- Multiview neural human prediction outperforms single-image-based mocap and human lifting in quality and robustness; it simplifies the architecture of body-and-clothes prediction networks such as Implicit Part Network, which takes a sparse point cloud as input with heavy memory cost and runs slowly, and it avoids the resolution limitation of latent-code-based networks such as Neural Body, which encode the entire 3D volume.
- Figure 1 illustrates a flowchart of neural human prediction according to some embodiments.
- an input set I of images (a single image or multiview images, e.g., a set of pictures taken around a subject) is acquired as input.
- the input I is denoted as a 4D tensor of size N x w x h x c, N for number of views, w, h, c for image width, height and channel, respectively.
- the cameras are already calibrated, so all of the camera information (e.g., camera parameters) is known.
- An image preprocess extracts the subject's bounding box and foreground mask using existing approaches such as Detectron2 and GrabCut. Images are cropped by the bounding box and zoomed to size w x h, preserving the aspect ratio; the remaining image borders are filled in black.
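- A minimal sketch of this crop-and-pad step, assuming the bounding box has already been produced by a detector (the detector API itself is not shown):

```python
# Illustrative crop/zoom/pad step; the bounding box (x0, y0, x1, y1) is
# assumed to come from a detector such as Detectron2 (detector API not shown).
import cv2
import numpy as np

def crop_and_pad(image, bbox, out_w=256, out_h=256):
    x0, y0, x1, y1 = bbox
    crop = image[y0:y1, x0:x1]
    # zoom preserving the aspect ratio
    scale = min(out_w / crop.shape[1], out_h / crop.shape[0])
    new_w, new_h = int(crop.shape[1] * scale), int(crop.shape[0] * scale)
    resized = cv2.resize(crop, (new_w, new_h))
    # fill the remaining borders in black
    canvas = np.zeros((out_h, out_w, image.shape[2]), dtype=image.dtype)
    top, left = (out_h - new_h) // 2, (out_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```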
- the neural network (MVS-PERF) 102 is comprised of three components: a multiview stereo 3D convolutional neural network (MVS-3DCNN) 104, which encodes an input set of images to features; a human mesh recovery multilayer perceptron (HMR MLP) 106, which regresses the features to human parameters; and a neural radiance field multilayer perceptron (NeRF MLP) 108, which fine-tunes the MVS-3DCNN and decodes a query 3D ray (3D location and direction) to an RGB color and a clothes-to-body displacement.
- a deep 2D CNN extracts image features from each view.
- Each convolutional layer is followed by a batch-normalization (BN) layer and a rectified linear unit (ReLU) except for the last layer.
- Two downsampling layers are also placed.
- the output of the 2D CNN is a feature map of size w/4 x h/4 x 32.
- a view is first chosen as the reference view, and its view frustum is set according to the perspective projection and the near and far planes to cover the entire working space of the subject. From near to far, the frustum is sampled by d depth planes which are parallel to both the near and far planes. All the feature maps are transformed and blended onto each depth plane by a plane-induced homography, in standard form H(z) = K_src (R_rel + t_rel n^T / z) K_ref^(-1), where K and [R, t] stand for the camera intrinsic and extrinsic parameters (R_rel, t_rel being the pose of a source view relative to the reference view), z is the distance from a depth plane to the camera center of the reference view, and n is the normal direction of the depth plane, as sketched below.
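- A sketch of the plane-sweep warping, assuming the convention x_cam = R·X + t and fronto-parallel sweep planes (conventions chosen here, since none is fixed in the text):

```python
# Illustrative plane-sweep warping under the convention x_cam = R @ X + t,
# with fronto-parallel sweep planes n^T x = z in the reference camera frame.
import cv2
import numpy as np

def plane_homography(K_src, R_src, t_src, K_ref, R_ref, t_ref, z,
                     n=np.array([0.0, 0.0, 1.0])):
    """H maps reference-view pixels to source-view pixels for the plane at depth z."""
    R_rel = R_src @ R_ref.T
    t_rel = t_src - R_rel @ t_ref
    return K_src @ (R_rel + np.outer(t_rel, n) / z) @ np.linalg.inv(K_ref)

def warp_to_ref_plane(feat_src, K_src, R_src, t_src, K_ref, R_ref, t_ref, z):
    """Resample a source-view feature map (<= 4 channels; loop channels otherwise)
    onto the reference depth plane at distance z."""
    H = plane_homography(K_src, R_src, t_src, K_ref, R_ref, t_ref, z)
    h, w = feat_src.shape[:2]
    # H maps reference pixels into the source image, so warp with the inverse map
    return cv2.warpPerspective(feat_src, H.astype(np.float32), (w, h),
                               flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
```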
- the human mesh recovery multilayer perceptron includes three linear regression layers separated by flatten and dropout layers. It regresses the feature volume from the MVS-3DCNN to the human body parameter vector.
- The human body parameter vector is able to drive a human parametric model, e.g., SMPL-X, to produce a 3D naked body mesh 202.
- A SMPL-X representation contains the skeletal poses (the 3D rotation angles of each joint), the body blendshape parameters controlling the body shape (e.g., height, weight, and others), and the facial blendshape parameters controlling the expression of the face. It builds a T-pose mesh using the blendshape parameters and deforms it to a posed mesh by the skeletal pose of a linear skinning model.
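- A toy sketch of this blendshape-then-skinning pipeline; the real SMPL-X model adds pose correctives, a learned joint regressor, and hand articulation, so this is illustrative only:

```python
# Toy blendshape + linear blend skinning sketch (illustrative; real SMPL-X
# adds pose correctives, a learned joint regressor, and hand articulation).
import numpy as np

def blend_shapes(template, shape_dirs, expr_dirs, beta, psi):
    """T-pose mesh: template (V, 3) plus body blendshapes (beta) and
    facial-expression blendshapes (psi); dirs have shape (V, 3, num_params)."""
    return template + shape_dirs @ beta + expr_dirs @ psi

def linear_blend_skinning(verts, joint_transforms, skin_weights):
    """Pose a T-pose mesh: per-joint 4x4 transforms blended per vertex."""
    V = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)  # homogeneous
    T = np.einsum('vj,jab->vab', skin_weights, joint_transforms)   # (V, 4, 4)
    return np.einsum('vab,vb->va', T, V)[:, :3]
```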
- the cost volume is sent to a differentiable rendering MLP, such as a neural radiance field (NeRF).
- the NeRF MLP is formalized as a functional M that maps a query ray, represented by a 3D position x and a direction φ, together with the feature f(x) sampled from the cost volume of the frustum (aligning the MVS-3DCNN 104 to the NeRF volume), to a 4-channel output (c, σ) = M(x, φ, f(x); G), where G is the weight of the NeRF MLP network and σ denotes the occupancy density, i.e., the probability that the 3D point is inside a mesh.
- the occupancy density field of a naked body can be directly obtained by converting the mesh 202 (Fig. 2) inside the frustum 104. The density field σ of the clothed body can then be represented as a function of a 3D displacement vector field D and the feature map.
- the 3D displacement vector field D 116 represents how a point on the clothed body surface 204 relates to a point on the naked body surface. When the NeRF MLP is trained, the displacement vector field D is optimized along with it.
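- One plausible reading, sketched below as an assumption (the exact functional form is not spelled out here): the clothed-body density at x is the naked-body occupancy sampled at the displaced point x + D(x):

```python
# Sketch of one plausible reading (an assumption): the clothed-body density
# at x is the naked-body occupancy sampled at the displaced point x + D(x).
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def clothed_density(x, naked_occupancy, displacement, grid_axes):
    """x: (M, 3) query points; naked_occupancy: (X, Y, Z) occupancy grid;
    displacement: (X, Y, Z, 3) field D; grid_axes: (xs, ys, zs) grid coordinates."""
    interp_D = RegularGridInterpolator(grid_axes, displacement,
                                       bounds_error=False, fill_value=0.0)
    interp_occ = RegularGridInterpolator(grid_axes, naked_occupancy,
                                         bounds_error=False, fill_value=0.0)
    return interp_occ(x + interp_D(x))  # sigma evaluated at the displaced points
```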
- Figure 2 illustrates the workflow of a forward prediction represented by the tensor notations, in which the weight of all the networks MVS 3DCNN, HMR MLP and NeRF MLP are trained and fixed, according to some embodiments.
- the appearance image 112 is rendered.
- 3D human prediction 110 is implemented.
- the displacement field D 116 is obtained.
- the naked body mesh V_b 202 can be deformed to a clothed body mesh V_c 204 by adding an interpolated displacement vector to each vertex.
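- A minimal sketch of this per-vertex deformation (grid layout and names are assumptions):

```python
# Illustrative per-vertex deformation: V_c = V_b + D(V_b), offsetting each
# naked-mesh vertex by the displacement vector interpolated at that vertex.
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def clothe_mesh(V_b, displacement, grid_axes):
    """V_b: (V, 3) naked-mesh vertices; displacement: (X, Y, Z, 3) field D."""
    interp_D = RegularGridInterpolator(grid_axes, displacement,
                                       bounds_error=False, fill_value=0.0)
    return V_b + interp_D(V_b)
```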
- Figure 3 illustrates the workflow of training the network using supervision according to some embodiments.
- a supervised training dataset, e.g., Human3.6M, is used.
- a shape loss 304 is directly obtained by summing the differences between the predicted naked body and the ground truth, where J are the joints of the naked body and P denotes the perspective projection of a 3D point for each camera view.
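- A sketch of such a shape loss, assuming it combines a parameter term with a projected-joint term (the exact terms and weights are not given):

```python
# Illustrative shape loss: a parameter-difference term plus a projected-joint
# term summed over camera views. Terms and weighting are assumptions.
import numpy as np

def project(P, X):
    """Perspective projection of 3D points X (J, 3) by a 3x4 camera matrix P."""
    Xh = np.concatenate([X, np.ones((len(X), 1))], axis=1)
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]

def shape_loss(theta_pred, theta_gt, joints_pred, joints_gt, cams, w_joint=1.0):
    loss = np.sum((theta_pred - theta_gt) ** 2)
    for P in cams:  # sum the joint reprojection error over all views
        loss += w_joint * np.sum((project(P, joints_pred) - project(P, joints_gt)) ** 2)
    return loss
```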
- rays 306 are sampled from the input image set 100, typically using an uneven sampling strategy proportional to image saliency: more rays are sampled in highly salient regions and fewer from plain or background regions. These rays are sent, together with the feature map from the MVS-3DCNN 104, into the NeRF MLP 108, which renders the samples' appearance RGB colors 308. A color loss 310 is computed by summing all the differences between the sampled colors in the input images and the rendered colors 308.
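- A sketch of saliency-proportional pixel sampling and the color loss (the saliency computation itself is not shown):

```python
# Illustrative saliency-proportional pixel (ray) sampling plus the color loss.
import numpy as np

def sample_pixels(saliency, n_rays, rng=np.random.default_rng(0)):
    """Sample pixel indices with probability proportional to a saliency map."""
    p = saliency.ravel() / saliency.sum()
    idx = rng.choice(saliency.size, size=n_rays, replace=False, p=p)
    return np.unravel_index(idx, saliency.shape)  # (rows, cols)

def color_loss(image, rendered, rows, cols):
    """Sum of squared differences between sampled and rendered pixel colors."""
    return np.sum((image[rows, cols] - rendered[rows, cols]) ** 2)
```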
- a parallelized stochastic optimization algorithm, e.g., Adam, is applied to train the weights of all networks (MVS-3DCNN, HMR MLP, NeRF MLP) by minimizing both the shape and color losses.
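- A minimal Adam training step over the combined losses; the `losses` helper is an assumed stand-in for the shape and color losses above, not an API from the disclosure:

```python
# Minimal joint training step with Adam over shape + color losses; the
# `losses` helper is an assumed stand-in, not an API from the disclosure.
import torch

def train_step(model, optimizer, batch, w_color=1.0):
    optimizer.zero_grad()
    shape_l, color_l = model.losses(batch)  # assumed helper returning both losses
    loss = shape_l + w_color * color_l
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```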
- Figure 4 illustrates the workflow of training the network in a self-improving strategy according to some embodiments.
- the training dataset only provides human images without any annotation or human ground truth parameters.
- an optimization-based prediction 400, e.g., the SMPLify-X algorithm, provides the supervision signal instead.
- the optimization-based prediction first detects human 2D key points on each image and then applies a nonlinear optimization to fit the 3D human.
- the fitting minimizes a reprojection objective of the form Σ ||P(J) − K||², where K denotes the detected 2D location of a key point and the sum is taken over all the corresponding key points and all the views.
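- A sketch of this keypoint-reprojection objective in the spirit of SMPLify-X; the joints function and any pose or shape priors are assumed stand-ins:

```python
# Illustrative keypoint-reprojection objective in the spirit of SMPLify-X;
# the joints function and any priors are assumed stand-ins.
import numpy as np

def keypoint_fit_loss(theta, joints_fn, cams, keypoints_2d):
    """cams: list of 3x4 projection matrices; keypoints_2d: list of (J, 2) detections."""
    J3d = joints_fn(theta)  # posed 3D joints from the human parameters
    loss = 0.0
    for P, K2d in zip(cams, keypoints_2d):
        Xh = np.concatenate([J3d, np.ones((len(J3d), 1))], axis=1)
        proj = Xh @ P.T
        proj = proj[:, :2] / proj[:, 2:3]
        loss += np.sum((proj - K2d) ** 2)
    return loss
```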
- Figure 5 illustrates the alignment of the MVS 3DCNN of each view to the NeRF MLP according to some embodiments.
- the neural human prediction is able to be directly applied in both commercial and/or personal markerless performance capture applications, for example, markerless motion capture in a game studio, or human 3D surface reconstruction with an RGB camera setup.
- Other applications of embodiments of the multiview neural human prediction include serving as a real-time backbone technique to be combined with any extension, for example, adding depth-sensing input or 3D modeling, or using the output to create novel animation.
- Multiview neural human prediction is also able to be applied in gaming, VR/AR and any real-time human interactive applications.
- the multiview neural human prediction runs in real time when processing sparser views for prediction; for more views (e.g., 20), near-real-time processing and prediction is able to be implemented.
- suitable computing devices include a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart appliance, a gaming console, a digital camera, a digital camcorder, a camera phone, a smart phone, a portable music player, a tablet computer, a mobile device, a video player, a video disc writer/player (e.g., DVD writer/player, high definition disc writer/player, ultra high definition disc writer/player), a television, a home entertainment system, an augmented reality device, a virtual reality device, smart jewelry (e.g., smart watch), a vehicle (e.g., a self-driving vehicle) or any other suitable computing device.
- a method programmed in a non-transitory memory of a device comprising: acquiring a set of images as input; and processing, with a neural network, the set of images, wherein processing includes: encoding the set of images to one or more features; regressing the features to human parameters; fine-tuning the neural network; and decoding a query 3D ray to an RGB color and a clothes-to-body displacement, wherein the RGB color is based on the set of images.
- the set of images comprises a 4D tensor of size N x w x h x c, where N is a number of views, w is width of an image, h is height of the image, and c is a channel of the image.
- An apparatus comprising: a non-transitory memory configured for storing an application, the application configured for: acquiring a set of images as input; and processing, with a neural network, the set of images, wherein processing includes: encoding the set of images to one or more features; regressing the features to human parameters; fine-tuning the neural network; and decoding a query 3D ray to an RGB color and a clothes-to-body displacement, wherein the RGB color is based on the set of images; and a processor configured for processing the application.
- the set of images comprises a 4D tensor of size N x w x h x c, where N is a number of views, w is width of an image, h is height of the image, and c is a channel of the image.
- the neural network chooses a frontal view as a reference view from the set of images and extracts a feature volume.
- An apparatus comprising: a non-transitory memory configured for storing an application, the application comprising: a multiview stereo 3D convolutional neural network (MVS-3DCNN) configured for encoding an input image set to features; a human mesh recovery multilayer perceptron (HMR MLP) configured for regressing the features to human parameters; and a neural radiance field multilayer perceptron (NeRF MLP) configured for fine-tuning the MVS-3DCNN and decoding a query 3D ray (3D location and direction) to an RGB color and a clothes-to-body displacement; and a processor configured for processing the application.
- the set of images comprises a 4D tensor of size N x w x h x c, where N is a number of views, w is width of an image, h is height of the image, and c is a channel of the image.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Graphics (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Geometry (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Image Analysis (AREA)
- Processing Or Creating Images (AREA)
- Image Processing (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020237033483A KR20230150867A (ko) | 2021-03-31 | 2022-03-31 | Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture |
JP2023556536A JP2024510230A (ja) | 2021-03-31 | 2022-03-31 | Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture |
EP22715732.8A EP4292059A1 (en) | 2021-03-31 | 2022-03-31 | Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture |
CN202280006134.7A CN116134491A (zh) | 2021-03-31 | 2022-03-31 | 用于面部表情、身体姿态形态和衣服表演捕捉的使用隐式可微分渲染器的多视图神经人体预测 |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163168467P | 2021-03-31 | 2021-03-31 | |
US63/168,467 | 2021-03-31 | ||
US202163279916P | 2021-11-16 | 2021-11-16 | |
US63/279,916 | 2021-11-16 | ||
US17/701,991 US11961266B2 (en) | 2021-03-31 | 2022-03-23 | Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture |
US17/701,991 | 2022-03-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022208440A1 (en) | 2022-10-06 |
Family
ID=81328451
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2022/053034 WO2022208440A1 (en) | 2021-03-31 | 2022-03-31 | Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP4292059A1 (ja) |
JP (1) | JP2024510230A (ja) |
KR (1) | KR20230150867A (ja) |
CN (1) | CN116134491A (ja) |
WO (1) | WO2022208440A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116824092A (zh) * | 2023-08-28 | 2023-09-29 | Shenzhen Xingfang Technology Co., Ltd. | Three-dimensional model generation method, apparatus, computer device and storage medium |
CN117238420A (zh) * | 2023-11-14 | 2023-12-15 | Taiyuan University of Technology | Method and device for predicting the mechanical properties of ultra-thin strips |
WO2024187847A1 (zh) * | 2023-03-14 | 2024-09-19 | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences | Human hand image synthesis method and apparatus, electronic device and storage medium |
2022
- 2022-03-31 JP JP2023556536A patent/JP2024510230A/ja active Pending
- 2022-03-31 WO PCT/IB2022/053034 patent/WO2022208440A1/en active Application Filing
- 2022-03-31 KR KR1020237033483A patent/KR20230150867A/ko unknown
- 2022-03-31 EP EP22715732.8A patent/EP4292059A1/en active Pending
- 2022-03-31 CN CN202280006134.7A patent/CN116134491A/zh active Pending
Non-Patent Citations (4)
- Amit Raj et al., "PVA: Pixel-aligned Volumetric Avatars", arXiv.org, Cornell University Library, 7 January 2021, XP081854400 *
- Zhongguo Li et al., "Learning to Implicitly Represent 3D Human Body From Multi-scale Features and Multi-view Images", 2020 25th International Conference on Pattern Recognition (ICPR), IEEE, 10 January 2021, pages 8968-8975, XP033909971, DOI: 10.1109/ICPR48806.2021.9412556 *
- Shih-Yang Su et al., "A-NeRF: Surface-free Human 3D Pose Refinement via Neural Rendering", arXiv.org, Cornell University Library, 11 February 2021, XP081881269 *
- Sida Peng et al., "Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans", arXiv.org, Cornell University Library, 29 March 2021, XP081901285 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024187847A1 (zh) * | 2023-03-14 | 2024-09-19 | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences | Human hand image synthesis method and apparatus, electronic device and storage medium |
CN116824092A (zh) * | 2023-08-28 | 2023-09-29 | Shenzhen Xingfang Technology Co., Ltd. | Three-dimensional model generation method, apparatus, computer device and storage medium |
CN116824092B (zh) * | 2023-08-28 | 2023-12-19 | Shenzhen Xingfang Technology Co., Ltd. | Three-dimensional model generation method, apparatus, computer device and storage medium |
CN117238420A (zh) * | 2023-11-14 | 2023-12-15 | Taiyuan University of Technology | Method and device for predicting the mechanical properties of ultra-thin strips |
Also Published As
Publication number | Publication date |
---|---|
EP4292059A1 (en) | 2023-12-20 |
KR20230150867A (ko) | 2023-10-31 |
JP2024510230A (ja) | 2024-03-06 |
CN116134491A (zh) | 2023-05-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11961266B2 (en) | Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture | |
Li et al. | Monocular real-time volumetric performance capture | |
US11941831B2 (en) | Depth estimation | |
US20240005590A1 (en) | Deformable neural radiance fields | |
KR20210042942A (ko) | Object instance mapping using video data | |
WO2022208440A1 (en) | Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture | |
KR102141319B1 (ko) | Super-resolution method for multi-view 360-degree video and image processing apparatus | |
CN113689578B (zh) | Human body dataset generation method and apparatus | |
US20230130281A1 (en) | Figure-Ground Neural Radiance Fields For Three-Dimensional Object Category Modelling | |
US20210374986A1 (en) | Image processing to determine object thickness | |
CN116958492B (zh) | VR editing method based on NeRF-reconstructed 3D base scene rendering | |
GB2567245A (en) | Methods and apparatuses for depth rectification processing | |
CN117542122B (zh) | Human pose estimation and 3D reconstruction method, network training method and apparatus | |
CN113850900A (zh) | Method and system for recovering depth maps from image and geometric cues in 3D reconstruction | |
CN118505878A (zh) | 3D reconstruction method and system for single-view repeated-object scenes | |
GB2571307A (en) | 3D skeleton reconstruction from images using volumic probability data | |
CN116310228A (zh) | Surface reconstruction and novel view synthesis method for remote sensing scenes | |
CN111783497A (zh) | Method, apparatus and computer-readable storage medium for determining features of objects in video | |
CN116797713A (zh) | 3D reconstruction method and terminal device | |
CN113570673B (zh) | Rendering method for 3D humans and objects and application method thereof | |
CN116433852B (zh) | Data processing method, apparatus, device and storage medium | |
WO2023132261A1 (ja) | Information processing system, information processing method and information processing program | |
CN116681818B (zh) | Novel view reconstruction method, and training method and apparatus for a novel view reconstruction network | |
US20230177722A1 (en) | Apparatus and method with object posture estimating | |
Johnston | Single View 3D Reconstruction using Deep Learning |
Legal Events
Code | Title | Description
---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22715732; Country of ref document: EP; Kind code of ref document: A1
WWE | Wipo information: entry into national phase | Ref document number: 2022715732; Country of ref document: EP
WWE | Wipo information: entry into national phase | Ref document number: 2023556536; Country of ref document: JP
ENP | Entry into the national phase | Ref document number: 2022715732; Country of ref document: EP; Effective date: 20230912
ENP | Entry into the national phase | Ref document number: 20237033483; Country of ref document: KR; Kind code of ref document: A
NENP | Non-entry into the national phase | Ref country code: DE