CN116529766A - Automatic blending of human facial expressions and full-body poses for dynamic digital human model creation using integrated photo-video volumetric capture system and mesh tracking

Info

Publication number
CN116529766A
Authority
CN
China
Prior art keywords
scanning
mesh
tracking
capture system
extreme
Prior art date
Legal status
Pending
Application number
CN202280007210.6A
Other languages
Chinese (zh)
Inventor
S. D. Rajasekaran
Hiroyuki Takeda
Kenji Tashiro
Current Assignee
Sony Group Corp
Sony Optical Archive Inc
Original Assignee
Sony Group Corp
Optical Archive Inc
Priority date
Filing date
Publication date
Priority claimed from US 17/706,996 (published as US 2022/0319114 A1)
Application filed by Sony Group Corp and Optical Archive Inc
Priority claimed from PCT/IB2022/053036 (published as WO 2022/208442 A1)
Publication of CN116529766A


Abstract

An integrated photo-video volumetric capture system for 3D/4D scanning acquires 3D and 4D scans by capturing images and video simultaneously. The volumetric capture system, used for high-quality 4D scanning together with mesh tracking, establishes topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes to be used for shape interpolation and skeleton-driven deformation. The volumetric capture system facilitates mesh tracking to maintain mesh registration (topological consistency) and facilitates extreme pose modeling. The primary upper- and lower-body joints can be identified; these joints are important for generating deformations and for capturing the same deformations using a wide range of motion of all movement types across all joint classes. Topology changes are tracked through the use of the volumetric capture system and mesh tracking. Each captured pose will have the same topology, which makes blending between multiple poses easier and more accurate.

Description

Automatic blending of human facial expressions and full-body poses for dynamic digital human model creation using integrated photo-video volumetric capture system and mesh tracking
Cross Reference to Related Applications
Under 35 U.S.C. § 119(e), the present application claims priority from U.S. Provisional Patent Application Serial No. 63/169,323, entitled "AUTOMATIC BLENDING OF HUMAN FACIAL EXPRESSION AND FULL-BODY POSES FOR DYNAMIC DIGITAL HUMAN MODEL CREATION USING INTEGRATED PHOTO-VIDEO VOLUMETRIC CAPTURE SYSTEM AND MESH-TRACKING," filed April 1, 2021, which is incorporated herein by reference in its entirety for all purposes.
Technical Field
The present invention relates to three-dimensional computer vision and graphics for the entertainment industry. More particularly, the present invention relates to acquiring and processing three-dimensional computer vision and graphics for movie, TV, music and game content creation.
Background
Virtual human creation is highly manual, time consuming, and expensive. A recent trend is to efficiently create realistic digital human models with multi-view camera 3D/4D scanners rather than manually creating Computer Graphics (CG) artwork from scratch. There are various 3D scanner studios (3Lateral, Avatta, TEN24, Pixel Light Effect, Eisko) and 4D scanner studios (4DViews, Microsoft, 8i, DGene) around the world for camera-based capture of human body digitization.
A photo-based 3D scanner studio includes multiple arrays of high-resolution photographic cameras. State-of-the-art 3D scanning is typically used for rigged model creation and requires manual animation, because it does not capture deformation. A video-based 4D scanner (4D = 3D + time) studio includes multiple arrays of high-frame-rate machine vision cameras. It captures natural surface dynamics, but it cannot create novel facial expressions or body movements, because the captured video and movements are fixed. The actor behind the virtual human would need to perform many sequences of actions, which means a huge workload for the actor.
Disclosure of Invention
An integrated photo-video volumetric capture system for 3D/4D scanning acquires 3D and 4D scans by capturing images and video simultaneously. A volumetric capture system for high-quality 4D scanning, together with mesh tracking, is used to establish topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes to be used for shape interpolation and skeleton-driven deformation. The volumetric capture system facilitates mesh tracking to maintain mesh registration (topological consistency) and facilitates extreme pose modeling. The primary upper- and lower-body joints can be identified; these joints are important for generating deformations and for capturing the same deformations using a wide range of motion of all movement types across all joint classes. Topology changes are tracked through the use of the volumetric capture system and mesh tracking. Each captured pose will have the same topology, which makes blending between multiple poses easier and more accurate.
In one aspect, a method programmed in a non-transitory memory of a device includes using a volumetric capture system configured for 3D scanning and 4D scanning, which includes capturing photographs and video simultaneously, wherein the 3D scanning and 4D scanning include detecting muscle deformations of an actor, and implementing mesh generation based on the 3D scanning and 4D scanning. The 3D scanning and 4D scanning include: a 3D scan to be used for generating automatic high-fidelity extreme poses, and a 4D scan including a high temporal resolution that enables mesh tracking to automatically register the extreme pose meshes for blending. Generating the automatic high-fidelity extreme poses includes using the 3D scan of the actor and the muscle deformations of the actor. The 4D scan and mesh tracking are used to establish topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes for shape interpolation and skeleton-driven deformation. The method further includes identifying and targeting joints and muscles of the actor by the volumetric capture system for the 3D scanning and 4D scanning. Mesh generation includes muscle estimation or projection based on the 3D and 4D scans and machine learning. Implementing mesh generation includes using the 3D and 4D scans to generate a mesh in extreme poses including muscle deformation. The method further includes implementing mesh tracking for tracking topology changes to enable each captured pose to have the same topology for blending between poses.
In another aspect, an apparatus includes a non-transitory memory for storing an application, the application for: using a volumetric capture system configured for 3D scanning and 4D scanning, which includes capturing photographs and video simultaneously, wherein the 3D scanning and 4D scanning include detecting muscle deformations of an actor, and implementing mesh generation based on the 3D scanning and 4D scanning; and a processor coupled to the memory, the processor configured to process the application. The 3D scanning and 4D scanning include: a 3D scan to be used for generating automatic high-fidelity extreme poses, and a 4D scan including a high temporal resolution that enables mesh tracking to automatically register the extreme pose meshes for blending. Generating the automatic high-fidelity extreme poses includes using the 3D scan of the actor and the muscle deformations of the actor. The 4D scan and mesh tracking are used to establish topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes for shape interpolation and skeleton-driven deformation. The application is also configured to identify and target joints and muscles of the actor by the volumetric capture system for the 3D scanning and 4D scanning. Mesh generation includes muscle estimation or projection based on the 3D and 4D scans and machine learning. Implementing mesh generation includes using the 3D and 4D scans to generate a mesh in extreme poses including muscle deformation. The application is further configured to implement mesh tracking for tracking topology changes to enable each captured pose to have the same topology for blending between poses.
In another aspect, a system includes a volumetric capture system for 3D scanning and 4D scanning, which includes capturing photographs and video simultaneously, wherein the 3D scanning and 4D scanning include detecting muscle deformations of an actor, and a computing device configured to: receive the captured photographs and video from the volumetric capture system, and implement mesh generation based on the 3D scanning and 4D scanning. The 3D scanning and 4D scanning include: a 3D scan to be used for generating automatic high-fidelity extreme poses, and a 4D scan including a high temporal resolution that enables mesh tracking to automatically register the extreme pose meshes for blending. Generating the automatic high-fidelity extreme poses includes using the 3D scan of the actor and the muscle deformations of the actor. The 4D scan and mesh tracking are used to establish topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes for shape interpolation and skeleton-driven deformation. The volumetric capture system is also configured to identify and target joints and muscles of the actor for the 3D scanning and 4D scanning. Mesh generation includes muscle estimation or projection based on the 3D and 4D scans and machine learning. Implementing mesh generation includes using the 3D and 4D scans to generate a mesh in extreme poses including muscle deformation. The volumetric capture system is further configured to implement mesh tracking for tracking topology changes to enable each captured pose to have the same topology for blending between poses.
Drawings
FIG. 1 illustrates a flowchart of a method of animating a subject using a photo-video volumetric capture system, according to some embodiments.
FIG. 2 illustrates a diagram of a mesh generated by combining neutral and extreme poses, according to some embodiments.
FIG. 3 illustrates a diagram of the correlation between human anatomy and computer graphics, according to some embodiments.
FIGS. 4A-4B illustrate diagrams of muscle movements, according to some embodiments.
FIG. 5 illustrates examples of primary muscle groups, according to some embodiments.
FIG. 6 illustrates a diagram of joint-based movement types for mesh capture, according to some embodiments.
FIG. 7 illustrates a diagram of joint-based movement types for mesh capture, according to some embodiments.
FIG. 8 illustrates examples of extreme poses, according to some embodiments.
FIG. 9 illustrates a diagram of automatic blendshape extraction, according to some embodiments.
FIG. 10 illustrates a flowchart for implementing mesh generation, according to some embodiments.
FIG. 11 illustrates a block diagram of an exemplary computing device configured to implement the automatic blending method, according to some embodiments.
Detailed Description
An automatic blending system utilizes an integrated photo-video volumetric capture system for 3D/4D scanning to acquire 3D and 4D scans by capturing images and video simultaneously. The 3D scans can be used to generate automatic high-fidelity extreme poses, while the 4D scans include a high temporal resolution that enables mesh tracking to automatically register the extreme pose meshes for blending.
A volumetric capture system (photo-video based) for high-quality 4D scanning, together with mesh tracking, can be used to establish topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes that will be used for shape interpolation and skeleton-driven deformation. Unlike hand-made shape modeling, which facilitates registration but involves manual shape generation, and 3D scan-based approaches, which facilitate shape generation rather than registration, the photo-video system facilitates mesh tracking to maintain mesh registration (topological consistency) and facilitates extreme pose modeling.
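Because mesh tracking keeps every pose on one shared topology, blending registered extreme-pose meshes reduces to per-vertex arithmetic. Below is a minimal numpy sketch of that idea, assuming a delta-blendshape formulation; the function name and data layout are illustrative, not the system's prescribed implementation.

```python
# Minimal sketch: blending registered extreme-pose meshes that share one
# topology (illustrative names and layout; assumes a delta-blendshape model).
import numpy as np

def blend_poses(neutral, extreme_poses, weights):
    """neutral: (V, 3) vertices; extreme_poses: list of (V, 3) registered
    meshes; weights: blend weights, typically in [0, 1]."""
    blended = neutral.copy()
    for pose, w in zip(extreme_poses, weights):
        blended += w * (pose - neutral)  # add each extreme pose as a delta
    return blended
```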
The methods described herein are based on photo-video capture from a "photo-video volumetric capture system." Photo-video based capture is described in PCT Patent Application PCT/US2019/068151, entitled "PHOTO-VIDEO BASED SPATIAL-TEMPORAL VOLUMETRIC CAPTURE SYSTEM FOR DYNAMIC 4D HUMAN FACE AND BODY DIGITIZATION," filed December 20, 2019, which is incorporated herein by reference in its entirety for all purposes. As described there, a photo-video capture system captures high-fidelity textures at sparse times; between the photo captures, video is captured, and the video can be used to establish correspondences (e.g., transitions) between the sparse photos. The correspondence information can be used to implement mesh tracking.
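As a rough illustration of how such correspondences let one template topology be carried through a 4D sequence, the sketch below substitutes naive closest-point matching for the dense photo-to-photo correspondences recovered from video; real mesh tracking uses regularized non-rigid registration, and all names here are assumptions.

```python
# Crude stand-in for mesh tracking: carry a template mesh through a 4D
# sequence by per-frame closest-point correspondence (illustrative only).
import numpy as np
from scipy.spatial import cKDTree

def track_template(template_vertices, frame_point_clouds):
    """template_vertices: (V, 3); frame_point_clouds: list of (N, 3) scans.
    Returns one (V, 3) array per frame, all sharing the template topology."""
    tracked = [template_vertices]
    current = template_vertices
    for cloud in frame_point_clouds:
        tree = cKDTree(cloud)
        _, idx = tree.query(current)  # nearest scan point per template vertex
        current = cloud[idx]          # snap vertices; real systems regularize
        tracked.append(current)
    return tracked
```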
The primary upper- and lower-body joints can be identified; these joints are important for generating deformations and for capturing the same deformations using a wide range of motion of all movement types across all joint classes. The joints can be used for muscle deformation. For example, by knowing how a joint moves and how the muscles near that joint deform, the skeleton/joint information can be used for muscle deformation, which in turn can be used for mesh generation. Extending the example, the acquired video of the deforming muscles can also be used, so that the mesh of the deforming muscles is generated more accurately.
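For example, a joint angle computed from the fitted skeleton can drive a muscle deformation weight. A minimal sketch, assuming a hypothetical linear mapping from elbow flexion to a biceps-bulge corrective shape (the angle bounds and function names are illustrative):

```python
import numpy as np

def flexion_angle(parent, joint, child):
    """Interior angle (degrees) at a joint, e.g., the elbow, from three
    3D joint positions."""
    u, v = parent - joint, child - joint
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def bulge_weight(angle_deg, lo=40.0, hi=180.0):
    """Hypothetical mapping: full extension (180 deg) -> no bulge; deep
    flexion (~40 deg) -> full weight on a biceps-bulge corrective shape."""
    return float(np.clip((hi - angle_deg) / (hi - lo), 0.0, 1.0))
```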
Topology changes can be tracked through the use of the photo-video system and mesh tracking. Thus, each captured pose will have the same topology, which makes blending between multiple poses easier and more accurate.
FIG. 1 illustrates a flowchart of a method of animating a subject using a photo-video volumetric capture system, according to some embodiments. In step 100, mesh creation/generation is implemented using the integrated photo-video system. Mesh generation includes extreme pose modeling and registration for blending. As described, the integrated photo-video volumetric capture system for 3D/4D scanning acquires 3D and 4D scans by simultaneously capturing images and video of a subject/actor. The 3D scans can be used to generate automatic high-fidelity extreme poses, while the 4D scans include a high temporal resolution that enables mesh tracking to automatically register the extreme pose meshes for blending. In step 102, skeleton fitting is implemented. Skeleton fitting can be implemented in any manner, such as based on relative marker trajectories. In step 104, skin weight painting is performed. Skin weighting can be implemented in any manner, such as by determining the weight of each piece of skin and painting it accordingly. In step 106, animation is performed. The animation can be performed in any manner. Each step can be performed manually, semi-automatically, or automatically, depending on the implementation. In some embodiments, fewer or additional steps are implemented. In some embodiments, the order of the steps is modified.
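To make the skin-weighting step concrete, here is a toy sketch assuming a simple distance-based scheme; the Gaussian falloff and names are illustrative choices, and production rigs typically use heat-diffusion or geodesic weighting instead.

```python
# Toy skin-weighting sketch (illustrative; not the patent's method).
import numpy as np

def distance_skin_weights(vertices, joint_positions, sigma=0.1):
    """vertices: (V, 3); joint_positions: (J, 3). Returns (V, J) weights,
    each row normalized to sum to one."""
    d = np.linalg.norm(vertices[:, None, :] - joint_positions[None, :, :],
                       axis=2)                 # vertex-to-joint distances
    w = np.exp(-(d ** 2) / (2 * sigma ** 2))   # Gaussian falloff
    return w / w.sum(axis=1, keepdims=True)
```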
FIG. 2 illustrates a diagram of a mesh generated by combining neutral and extreme poses, according to some embodiments. The neutral pose can be any standard pose, such as standing with the arms down, arms up, or arms extended out to the sides. An extreme pose is a pose between standard poses, such as when a subject moves between standard poses. Capturing extreme poses by targeting specific parts of the human musculature enables the generation of extreme shapes for a game development pipeline. The photo-video system and mesh tracking can be used to target all muscle groups of the human body for capture and to solve the problem of maintaining mesh registration in the graphics game development pipeline.
When a new video game is developed, a model is captured for the game. An actor typically enters a studio once to be recorded performing particular movements and/or actions. The studio uses the photo-video volumetric capture system to comprehensively capture all of the actor's muscle deformations. Furthermore, by using knowledge of the existing types of ergonomic movements and deformations that occur in the human body, the corresponding mesh can be given similar deformations. Using the previously captured neutral pose and the additional captured poses, the system is able to deform the model to resemble human movements/deformations. In addition, the ergonomic movements, deformations, and/or other knowledge and data can be used to train the system.
FIG. 3 illustrates a diagram of the correlation between human anatomy and computer graphics, according to some embodiments. In human anatomy, musculoskeletal actuation begins with signals from a person's motor cortex. Muscle deformation then occurs, which produces joint/bone movement as the muscles pull on the bones. In addition, there is skin/fat movement. In a computer graphics mesh, a motion driver triggers movement of the animated character, specifically by performing joint/bone movements. Mesh deformation via skeleton subspace deformation (SSD) then occurs, followed by mesh deformation via pose space deformation (PSD). A clear correlation can be seen between human anatomy and a mesh generated using computer graphics.
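The motion driver → SSD → PSD chain can be sketched directly. Below is a minimal numpy illustration of linear blend skinning followed by an additive pose-space corrective offset; the array layouts and the purely additive PSD term are assumptions for illustration.

```python
# Minimal SSD (linear blend skinning) + PSD sketch (illustrative layouts).
import numpy as np

def skin_and_correct(rest_vertices, weights, joint_rotations,
                     joint_translations, psd_offsets=None):
    """rest_vertices: (V, 3); weights: (V, J), rows sum to 1;
    joint_rotations: (J, 3, 3); joint_translations: (J, 3);
    psd_offsets: optional (V, 3) corrective deltas for this pose."""
    # Transform every vertex by every joint: result is (J, V, 3).
    per_joint = np.einsum('jab,vb->jva', joint_rotations, rest_vertices)
    per_joint = per_joint + joint_translations[:, None, :]
    skinned = np.einsum('vj,jva->va', weights, per_joint)  # SSD blend
    if psd_offsets is not None:
        skinned = skinned + psd_offsets                    # PSD correction
    return skinned
```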
FIGS. 4A-4B illustrate diagrams of muscle movements, according to some embodiments. As shown, body parts bend at joints, such as the head at the neck, the hand at the wrist, the fingers at the knuckles, the legs at the knees, and the feet at the ankles. In some embodiments, all joint movements fall into 12 categories. In some embodiments, by classifying joint movements into categories, the correct muscle deformation can be generated based on the classified movement. For example, when a person bends at the knees, certain muscles in the legs deform, and using machine learning, the correct muscles can be deformed at the appropriate time. The muscle movements are the types of movement the actor will perform, including the range of motion, and the muscle movements are targeted for capture. [FIGS. 4A-4B: DeSaix, Peter, et al., "Anatomy & Physiology (OpenStax)," 2013 (retrieved from https://openlibrary-repo.ecampusontario.ca/jspui/handle/123456789/331)]
FIG. 5 illustrates examples of primary muscle groups, according to some embodiments. The upper and lower body each have four primary joints (excluding the finger/toe joints). The joints in the upper body include the shoulder, elbow, neck, and hand, while the joints in the lower body include the torso, hip, knee, and ankle. Each joint has a corresponding muscle group. As mentioned, these corresponding muscle groups deform when the character is in motion. While the actor is moving, the lower- and upper-body muscles are the primary targets for capture.
FIG. 6 illustrates a diagram of joint-based movement types for mesh capture, according to some embodiments. For each of the primary upper and lower joints there are many different types of movement, with different angular ranges of motion (0 to 180 degrees). By including the various movement types, the desired muscles can be captured and later utilized in generating the mesh.
FIG. 7 illustrates a diagram of joint-based movement types for mesh capture, according to some embodiments. Two of the 12 movement types (flexion/extension and pronation/supination) are shown. In some embodiments, the angular range of motion is selected from 0, 90, and 180 degrees, while in some embodiments finer adjustment of the angular range of motion reaches a specific degree or even a fraction of a degree.
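A capture session along these lines can be planned by enumerating combinations of joint, movement type, and target angle. A small illustrative sketch; the joint list follows FIG. 5, and only the two movement types named above are spelled out, so the lists are abbreviated assumptions:

```python
# Illustrative capture-plan enumeration (abbreviated lists, assumed names).
from itertools import product

JOINTS = ["shoulder", "elbow", "neck", "hand",      # upper body (FIG. 5)
          "torso", "hip", "knee", "ankle"]          # lower body (FIG. 5)
MOVEMENTS = ["flexion/extension", "pronation/supination"]  # 2 of 12 types
COARSE_ANGLES = [0, 90, 180]  # degrees; finer schedules subdivide these

def capture_plan(joints=JOINTS, movements=MOVEMENTS, angles=COARSE_ANGLES):
    """Enumerate (joint, movement type, target angle) shots for a session."""
    return list(product(joints, movements, angles))
```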
FIG. 8 illustrates examples of extreme poses, according to some embodiments. Image 800 shows six types of movement, such as raising the arms out to the sides, raising the arms overhead from below the hips, and extending the arms to the front. Image 802 shows the four joints and the targeted muscles.
FIG. 9 illustrates a diagram of automatic blendshape extraction, according to some embodiments. Pose parameters 900 combined with facial action units 902 result in a 4D-tracked mesh 904. The automatic blendshape extraction method uses a 4D scan of the moving face, which speeds up the character creation process and reduces production costs. A 4D face scanning method can be used, such as that of U.S. Patent Application No. 17/411,432, entitled "PRESERVING GEOMETRY DETAILS IN A SEQUENCE OF TRACKED MESHES," filed August 25, 2021, which is incorporated herein by reference in its entirety for all purposes. As shown at 904, it provides a high-quality 4D-tracked mesh of the moving face, and the pose parameters 900 can also be obtained from the tracked 4D mesh. The user may use control points or bones for the pose representation. [FIG. 9, center view from P. Ekman, Wallace V. Friesen, Joseph C. Hager, "Facial Action Coding System: A Technique for the Measurement of Facial Movement," 1978; 2002. ISBN 0-931835-01-1]
The facial action units are of interest. Given a mesh with 4D tracking of various different expressions, a set of person-specific facial action units can be generated automatically. This can be seen as a decomposition of the 4D mesh into dynamic pose parameters and static action units, where only the action units are unknown. Machine learning techniques for decomposition problems can be used.
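Since the pose parameters are known per frame and only the static action units are unknown, the decomposition can be illustrated as an ordinary least-squares solve over a linear delta-blendshape model. A minimal numpy sketch under that assumption (the actual system may use more sophisticated machine learning):

```python
# Least-squares sketch of action-unit extraction (linear model assumed).
import numpy as np

def solve_action_units(tracked_meshes, pose_params, neutral):
    """tracked_meshes: (F, V, 3) 4D-tracked vertices on one topology;
    pose_params: (F, K) known per-frame action-unit activations;
    neutral: (V, 3) neutral-expression mesh.
    Solves deltas ~= pose_params @ units for the static units."""
    F, V, _ = tracked_meshes.shape
    deltas = (tracked_meshes - neutral).reshape(F, V * 3)
    units, *_ = np.linalg.lstsq(pose_params, deltas, rcond=None)
    return units.reshape(-1, V, 3)  # (K, V, 3) person-specific action units
```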
FIG. 10 illustrates a flowchart for implementing mesh generation, according to some embodiments. In step 1000, the volumetric capture system is used for high-quality 3D/4D scanning. As described in PCT Patent Application PCT/US2019/068151, the volumetric capture system is capable of taking photographs and video simultaneously for high-quality 3D/4D scanning. The high-quality 3D/4D scanning includes denser camera views for high-quality modeling. In some embodiments, instead of a volumetric capture system, another system for acquiring 3D content and time information is used. For example, at least two separate 3D scans are acquired. As a further example, individual 3D scans can be captured and/or downloaded.
During the capture time, joint and muscle movements and deformations are acquired. For example, a specific muscle and that muscle's specific deformation over time are captured. During the capture time, particular joints and the muscles corresponding to the actor's joints can be targeted. For example, a target subject/actor movement can be requested, and the muscles will deform. The deformations of the muscles can be captured both statically and in motion. The information obtained from the movements and deformations can be used to train the system so that the system can use the joint and muscle information to perform any movement of the character. For very complex cases, this is very difficult for an animator to do by hand. Any complex muscle deformations are learned during the modeling phase, which enables synthesis in the animation phase.
In step 1002, mesh generation is implemented. Once the high-quality information is captured by the scans, mesh generation is implemented, including extreme pose modeling and registration for blending. The 3D scan information can be used to generate automatic high-fidelity extreme poses. For example, frames between key frames can be appropriately generated using the 4D scan information, which includes frame information between the key frames. The high temporal resolution of the 4D scan information enables mesh tracking to automatically register the extreme pose meshes for blending. In another example, the 4D scans enable mesh generation of muscles that deform over time. Similarly, using machine learning involving the joint information and the corresponding muscle and muscle deformation information, a mesh including muscle deformation information can be generated for movements not acquired by the capture system. For example, although the actor was asked to perform standing vertical jumps and running for capture, the capture system did not capture the actor performing a running jump. However, based on the acquired information for standing vertical jumps and running, where the acquired information includes the muscle deformations during these actions, and using machine learning with knowledge of the joints and other physiological information, a mesh for a running jump including detailed muscle deformation can be generated. In some embodiments, mesh generation includes muscle estimation or projection based on the 3D and 4D scans and machine learning.
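One way to picture such muscle estimation is a ridge regression standing in for whatever learned model is actually used: deformation offsets observed in the captured takes are regressed against joint-angle features, and the fitted map is then evaluated at the novel pose. All names and the linear form are assumptions.

```python
# Linear stand-in for learned muscle deformation (illustrative assumption).
import numpy as np

def fit_deformation_model(joint_angles, vertex_offsets, reg=1e-3):
    """joint_angles: (F, D) pose features from the captured takes (e.g.,
    standing vertical jumps, running); vertex_offsets: (F, 3V) per-frame
    offsets from the skinned-only mesh. Returns a (D, 3V) weight matrix."""
    X, Y = joint_angles, vertex_offsets
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Y)

def predict_offsets(W, novel_angles):
    """Estimate deformation for an uncaptured pose, e.g., a running jump."""
    return novel_angles @ W
```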
The primary upper- and lower-body joints can be identified; these joints are important for generating deformations and for capturing the deformations using a wide range of motion of all movement types across all joint classes.
By using the volumetric capture system and mesh tracking, topology changes can be tracked. Thus, each captured pose will have the same topology, which makes blending between multiple poses easier and more accurate. The targeted joints and muscles can be utilized when generating the mesh.
In some embodiments, mesh generation includes generating a static mesh based on the 3D scan information, and the mesh can be modified/animated using the 4D scan information. For example, as the mesh moves in time, additional mesh information can be created/generated from the video content of the 4D scan information and/or the machine learning information. As described, the transitions between each frame of the animated mesh can maintain the topology, so that the mesh tracking and blending are smooth. In other words, topological correspondence is established across the mesh sequence of the 4D scan to generate corrective shapes to be used for shape interpolation and skeleton-driven deformation.
In some embodiments, fewer or additional steps are implemented. In some embodiments, the order of the steps is modified.
FIG. 11 illustrates a block diagram of an exemplary computing device configured to implement the automatic blending method, according to some embodiments. The computing device 1100 can be used to acquire, store, compute, process, communicate, and/or display information such as images and video. The computing device 1100 can implement any of the automatic blending aspects. In general, a hardware structure suitable for implementing the computing device 1100 includes a network interface 1102, a memory 1104, a processor 1106, I/O device(s) 1108, a bus 1110, and a storage device 1112. The choice of processor is not critical as long as a suitable processor with sufficient speed is chosen. The memory 1104 can be any conventional computer memory known in the art. The storage device 1112 can include a hard drive, CDROM, CDRW, DVD, DVDRW, High Definition disc/drive, ultra-HD drive, flash memory card, or any other storage device. The computing device 1100 can include one or more network interfaces 1102. An example of a network interface includes a network card connected to an Ethernet or other type of LAN. The I/O device(s) 1108 can include one or more of the following: keyboard, mouse, monitor, screen, printer, modem, touchscreen, button interface, and other devices. The automatic blending application(s) 1130 used to implement the automatic blending method are likely to be stored in the storage device 1112 and memory 1104 and processed as applications are typically processed. More or fewer of the components shown in FIG. 11 can be included in the computing device 1100. In some embodiments, automatic blending hardware 1120 is included. Although the computing device 1100 in FIG. 11 includes applications 1130 and hardware 1120 for the automatic blending method, the automatic blending method can be implemented on a computing device in hardware, firmware, software, or any combination thereof. For example, in some embodiments, the automatic blending applications 1130 are programmed in a memory and executed using a processor. In another example, in some embodiments, the automatic blending hardware 1120 is programmed hardware logic including gates specifically designed to implement the automatic blending method.
In some embodiments, the automatic blending application(s) 1130 include several applications and/or modules. In some embodiments, a module includes one or more sub-modules as well. In some embodiments, fewer or additional modules can be included.
Examples of suitable computing devices include personal computers, laptop computers, computer workstations, servers, mainframe computers, handheld computers, personal digital assistants, cellular/mobile telephones, smart appliances, gaming consoles, digital cameras, digital camcorders, camera phones, smart phones, portable music players, tablet computers, mobile devices, video players, video disc writers/players (e.g., DVD writers/players, high definition disc writers/players, ultra high definition disc writers/players), televisions, home entertainment systems, augmented reality devices, virtual reality devices, smart jewelry (e.g., smart watches), vehicles (e.g., autonomous vehicles), or any other suitable computing device.
To utilize the automatic blending method described herein, a device such as a digital camera/camcorder/computer is used to acquire content, and the content is then analyzed by the same device or one or more additional devices. The automatic blending method can be implemented with user assistance or automatically without user involvement.
In operation, the automatic blending method provides more accurate and more efficient blending and animation. Unlike hand-made shape modeling, which facilitates registration but involves manual shape generation, and 3D scan-based approaches, which facilitate shape generation rather than registration, the automatic blending approach utilizes a photo-video system that facilitates mesh tracking to maintain mesh registration (topological consistency) and facilitates extreme pose modeling. Topology changes can be tracked through the use of the photo-video system and mesh tracking. Thus, each captured pose will have the same topology, which makes blending between multiple poses easier and more accurate.
Some embodiments of automatic blending of human facial expressions and full-body poses for dynamic digital human model creation using an integrated photo-video volumetric capture system and mesh tracking:
1. A method programmed in a non-transitory memory of a device, comprising:
using a volumetric capture system configured for 3D scanning and 4D scanning including capturing photographs and video simultaneously, wherein the 3D scanning and 4D scanning include detecting muscle deformations of an actor; and
implementing mesh generation based on the 3D scanning and 4D scanning.
2. The method of clause 1, wherein the 3D scanning and the 4D scanning comprise:
a 3D scan to be used for generating automatic high-fidelity extreme poses, and
a 4D scan including a high temporal resolution enabling mesh tracking to automatically register the extreme pose meshes for blending.
3. The method of clause 2, wherein generating the automatic high-fidelity extreme poses comprises generating the automatic high-fidelity extreme poses using the 3D scan of the actor and the muscle deformations of the actor.
4. The method of clause 2, wherein the 4D scan and mesh tracking are used to establish topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes for shape interpolation and skeleton-driven deformation.
5. The method of clause 1, further comprising identifying and targeting joints and muscles of the actor by the volumetric capture system for the 3D scanning and the 4D scanning.
6. The method of clause 1, wherein mesh generation includes muscle estimation or projection based on the 3D and 4D scans and machine learning.
7. The method of clause 1, wherein implementing mesh generation includes using the 3D scan and the 4D scan to generate a mesh in extreme poses including muscle deformation.
8. The method of clause 1, further comprising implementing mesh tracking for tracking topology changes to enable each captured pose to have the same topology for blending between poses.
9. An apparatus, comprising:
a non-transitory memory for storing an application, the application for:
using a volumetric capture system configured for 3D scanning and 4D scanning including capturing photographs and video simultaneously, wherein the 3D scanning and 4D scanning include detecting muscle deformations of an actor; and
implementing mesh generation based on the 3D scanning and 4D scanning; and
a processor coupled to the memory, the processor configured to process the application.
10. The apparatus of clause 9, wherein the 3D scanning and the 4D scanning comprise:
a 3D scan to be used for generating automatic high-fidelity extreme poses, and
a 4D scan including a high temporal resolution enabling mesh tracking to automatically register the extreme pose meshes for blending.
11. The apparatus of clause 10, wherein generating the automatic high-fidelity extreme poses comprises generating the automatic high-fidelity extreme poses using the 3D scan of the actor and the muscle deformations of the actor.
12. The apparatus of clause 10, wherein the 4D scan and mesh tracking are used to establish topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes for shape interpolation and skeleton-driven deformation.
13. The apparatus of clause 9, wherein the application is further configured to identify and target joints and muscles of the actor by the volumetric capture system for the 3D scanning and the 4D scanning.
14. The apparatus of clause 9, wherein mesh generation includes muscle estimation or projection based on the 3D and 4D scans and machine learning.
15. The apparatus of clause 9, wherein implementing mesh generation includes using the 3D scan and the 4D scan to generate a mesh in extreme poses including muscle deformation.
16. The apparatus of clause 9, wherein the application is further configured to implement mesh tracking for tracking topology changes to enable each captured pose to have the same topology for blending between poses.
17. A system, comprising:
a volumetric capture system for 3D scanning and 4D scanning including capturing photographs and video simultaneously, wherein the 3D scanning and 4D scanning include detecting muscle deformations of an actor; and
a computing device configured to:
receive the captured photographs and video from the volumetric capture system; and
implement mesh generation based on the 3D scanning and 4D scanning.
18. The system of clause 17, wherein the 3D scanning and the 4D scanning comprise:
a 3D scan to be used for generating automatic high-fidelity extreme poses, and
a 4D scan including a high temporal resolution enabling mesh tracking to automatically register the extreme pose meshes for blending.
19. The system of clause 18, wherein generating the automatic high-fidelity extreme poses comprises generating the automatic high-fidelity extreme poses using the 3D scan of the actor and the muscle deformations of the actor.
20. The system of clause 18, wherein the 4D scan and mesh tracking are used to establish topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes for shape interpolation and skeleton-driven deformation.
21. The system of clause 17, wherein the volumetric capture system is further configured to identify and target joints and muscles of the actor for the 3D scanning and the 4D scanning.
22. The system of clause 17, wherein mesh generation includes muscle estimation or projection based on the 3D and 4D scans and machine learning.
23. The system of clause 17, wherein implementing mesh generation includes using the 3D scan and the 4D scan to generate a mesh in extreme poses including muscle deformation.
24. The system of clause 17, wherein the volumetric capture system is further configured to implement mesh tracking for tracking topology changes to enable each captured pose to have the same topology for blending between poses.
The invention has been described in terms of specific embodiments incorporating details to facilitate understanding of the principles of construction and operation of the invention. Such references herein to specific embodiments and details thereof are not intended to limit the scope of the claims appended hereto. It will be apparent to those skilled in the art that various other modifications can be made in the embodiments chosen for illustration without departing from the spirit and scope of the invention as defined by the claims.

Claims (24)

1. A method programmed in a non-transitory memory of a device, comprising:
using a volumetric capture system configured for 3D scanning and 4D scanning including capturing photographs and video simultaneously, wherein the 3D scanning and 4D scanning include detecting muscle deformations of an actor; and
implementing mesh generation based on the 3D scanning and 4D scanning.
2. The method of claim 1, wherein the 3D scanning and the 4D scanning comprise:
a 3D scan to be used for generating automatic high-fidelity extreme poses, and
a 4D scan including a high temporal resolution enabling mesh tracking to automatically register the extreme pose meshes for blending.
3. The method of claim 2, wherein generating the automatic high-fidelity extreme poses comprises generating the automatic high-fidelity extreme poses using the 3D scan of the actor and the muscle deformations of the actor.
4. The method of claim 2, wherein the 4D scan and mesh tracking are used to establish topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes for shape interpolation and skeleton-driven deformation.
5. The method of claim 1, further comprising identifying and targeting joints and muscles of the actor by the volumetric capture system for the 3D scanning and the 4D scanning.
6. The method of claim 1, wherein mesh generation includes muscle estimation or projection based on the 3D and 4D scans and machine learning.
7. The method of claim 1, wherein implementing mesh generation includes using the 3D scan and the 4D scan to generate a mesh in extreme poses including muscle deformation.
8. The method of claim 1, further comprising implementing mesh tracking for tracking topology changes to enable each captured pose to have the same topology for blending between poses.
9. An apparatus, comprising:
a non-transitory memory for storing an application, the application for:
using a volumetric capture system configured for 3D scanning and 4D scanning including capturing photographs and video simultaneously, wherein the 3D scanning and 4D scanning include detecting muscle deformations of an actor; and
implementing mesh generation based on the 3D scanning and 4D scanning; and
a processor coupled to the memory, the processor configured to process the application.
10. The apparatus of claim 9, wherein the 3D scanning and the 4D scanning comprise:
a 3D scan to be used for generating automatic high-fidelity extreme poses, and
a 4D scan including a high temporal resolution enabling mesh tracking to automatically register the extreme pose meshes for blending.
11. The apparatus of claim 10, wherein generating the automatic high-fidelity extreme poses comprises generating the automatic high-fidelity extreme poses using the 3D scan of the actor and the muscle deformations of the actor.
12. The apparatus of claim 10, wherein the 4D scan and mesh tracking are used to establish topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes for shape interpolation and skeleton-driven deformation.
13. The apparatus of claim 9, wherein the application is further configured to identify and target joints and muscles of the actor by the volumetric capture system for the 3D scanning and the 4D scanning.
14. The apparatus of claim 9, wherein mesh generation includes muscle estimation or projection based on the 3D and 4D scans and machine learning.
15. The apparatus of claim 9, wherein implementing mesh generation includes using the 3D scan and the 4D scan to generate a mesh in extreme poses including muscle deformation.
16. The apparatus of claim 9, wherein the application is further configured to implement mesh tracking for tracking topology changes to enable each captured pose to have the same topology for blending between poses.
17. A system, comprising:
a volumetric capture system for 3D scanning and 4D scanning including capturing photographs and video simultaneously, wherein the 3D scanning and 4D scanning include detecting muscle deformations of an actor; and
a computing device configured to:
receive the captured photographs and video from the volumetric capture system; and
implement mesh generation based on the 3D scanning and 4D scanning.
18. The system of claim 17, wherein the 3D scanning and the 4D scanning comprise:
a 3D scan to be used for generating automatic high-fidelity extreme poses, and
a 4D scan including a high temporal resolution enabling mesh tracking to automatically register the extreme pose meshes for blending.
19. The system of claim 18, wherein generating the automatic high-fidelity extreme poses comprises generating the automatic high-fidelity extreme poses using the 3D scan of the actor and the muscle deformations of the actor.
20. The system of claim 18, wherein the 4D scan and mesh tracking are used to establish topological correspondence across the mesh sequence of the 4D scan to generate corrective shapes for shape interpolation and skeleton-driven deformation.
21. The system of claim 17, wherein the volumetric capture system is further configured to identify and target joints and muscles of the actor for the 3D scanning and the 4D scanning.
22. The system of claim 17, wherein mesh generation includes muscle estimation or projection based on the 3D and 4D scans and machine learning.
23. The system of claim 17, wherein implementing mesh generation includes using the 3D scan and the 4D scan to generate a mesh in extreme poses including muscle deformation.
24. The system of claim 17, wherein the volumetric capture system is further configured to implement mesh tracking for tracking topology changes to enable each captured pose to have the same topology for blending between poses.
CN202280007210.6A 2021-04-01 2022-03-31 Automatic blending of human facial expressions and full-body poses for dynamic digital human model creation using integrated photo-video volumetric capture system and mesh tracking Pending CN116529766A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/169,323 2021-04-01
US17/706,996 2022-03-29
US17/706,996 US20220319114A1 (en) 2021-04-01 2022-03-29 Automatic blending of human facial expression and full-body poses for dynamic digital human model creation using integrated photo-video volumetric capture system and mesh-tracking
PCT/IB2022/053036 WO2022208442A1 (en) 2021-04-01 2022-03-31 Automatic blending of human facial expression and full-body poses for dynamic digital human model creation using integrated photo-video volumetric capture system and mesh-tracking

Publications (1)

Publication Number Publication Date
CN116529766A (en) 2023-08-01

Family

ID=87406820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280007210.6A Pending CN116529766A (en) 2021-04-01 2022-03-31 Automatic blending of human facial expressions and full-body poses for dynamic digital human model creation using integrated photo-video volumetric capture system and mesh tracking

Country Status (1)

Country Link
CN (1) CN116529766A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination