CN114998520A - Three-dimensional interactive hand reconstruction method and system based on implicit expression - Google Patents
- Publication number
- CN114998520A CN114998520A CN202210619894.7A CN202210619894A CN114998520A CN 114998520 A CN114998520 A CN 114998520A CN 202210619894 A CN202210619894 A CN 202210619894A CN 114998520 A CN114998520 A CN 114998520A
- Authority
- CN
- China
- Prior art keywords
- implicit
- query
- reconstruction
- region
- global
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Graphics (AREA)
- Medical Informatics (AREA)
- Geometry (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a three-dimensional interactive hand reconstruction method and system based on implicit expression, comprising the following steps: 1. constructing a feature extraction neural network, acquiring global and regional features from a single input color image, and further acquiring instance and joint features from the input image and the regional features; 2. selecting query points according to a query strategy using the acquired regional, instance and joint features, and constructing query conditions; 3. constructing a parameterized implicit neural network and performing implicit reconstruction based on the query points and query conditions; 4. physically optimizing the reconstructed model, penalizing unphysical penetration, and adjusting and updating the reconstructed model; 5. iteratively optimizing the reconstructed model until the maximum penetration depth is less than 2 mm, and taking the optimized result as the final reconstruction result of the interacting hands. The method requires only a single color image, can reconstruct hands of any chirality and number in the image, realizes an end-to-end modeling mode, and improves the quality of three-dimensional pose and shape reconstruction.
Description
Technical Field
The invention relates to the field of computer vision and computer graphics, in particular to a three-dimensional interactive hand reconstruction method and system based on implicit expression.
Background
In recent years, three-dimensional pose and shape reconstruction has become a hot research direction, with broad application prospects in virtual reality, robot control, motion-sensing games and other fields. In daily life, interacting hands are everywhere and play an important role in the exchange of information between people, for example in individual emotional expression and multi-person collaboration. As with human activities in real life, interactions between hands are also important content in virtual reality. Studying the reconstruction of interacting hands is therefore of real significance. Compared with a single hand, interacting hands are often heavily occluded, in close contact with each other, and share similar texture features; in addition, the degrees of freedom of the solution space for interacting hands are greater.
Early methods tended to take additional depth information or more viewpoints as input. However, they always fit the input data to a specific hand template without personalized variation, and they cannot be adapted to existing learning-based methods. Owing to the ubiquity of monocular color images, methods using monocular color images are preferable to those requiring multiple cameras or depth cameras. The construction of the InterHand2.6M dataset brought data support to this problem and subsequently prompted the emergence of a series of methods for reconstructing interacting hands from monocular color images. However, these interactive hand reconstruction efforts still suffer from several problems:
1) They typically rely on the assumption that there is exactly one left hand and one right hand in the image, recasting the problem as regression of the MANO parameters of the two hands. This limits the chirality and number of hands in the image. 2) The reconstructed interacting hands exhibit physically implausible artifacts such as spatial entanglement and interpenetration, so the reconstruction quality is not high.
Disclosure of Invention
The invention aims to solve the technical problem that the shape and the posture of a hand which are physically reasonable can be reconstructed from a single color image without limiting the chirality and the number of hands.
To solve this technical problem, the invention provides a three-dimensional interactive hand reconstruction method based on implicit surface representation. The method constructs a parameterized implicit neural network that classifies 3D query points in the canonical space of the interacting hands, with image features serving as the query strategy and query conditions. Based on image cues, a new query strategy and new query conditions are designed, which improves query efficiency and reduces the query space. In addition, an effective physical optimization scheme is provided to solve the problem of interpenetration between instances under the implicit expression, effectively ensuring the physical plausibility of the reconstruction result.
The three-dimensional interactive hand reconstruction method based on the implicit expression provided by the invention comprises the following steps:
step 1, building a feature extraction neural network, acquiring global features from a single input color image, dividing the global features into several regional features based on connected components, and further acquiring instance features and joint features from the input image and the regional features;
step 2, selecting query points according to a query strategy using the regional features, instance features and joint features obtained in step 1, and constructing query conditions;
step 3, building a parameterized implicit neural network, and performing implicit reconstruction based on the query points and the query conditions;
step 4, physically optimizing the reconstructed model, penalizing unphysical penetration, and adjusting and updating the reconstructed model;
and step 5, iteratively optimizing the reconstructed model until the maximum penetration depth is less than 2 mm, and taking the optimized result as the final reconstruction result of the interacting hands.
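The five steps above can be sketched as a control loop in code. The following is a minimal, hypothetical stand-in: every function name and data structure here is an illustrative assumption, not the patent's actual implementation; the stubs exist only to exercise the step-5 termination criterion (maximum penetration depth below 2 mm).

```python
# Hypothetical sketch of the five-step pipeline; the real neural networks are
# replaced by stubs so only the control flow is demonstrated.

def extract_features(image):
    # Step 1 stand-in: global/regional/instance/joint features.
    return {"regions": [{"id": 0}], "instances": [0, 1], "joints": []}

def build_queries(features):
    # Step 2 stand-in: query points and query conditions.
    return {"points": [(0.5, 0.5, 0.5)], "conditions": features["regions"]}

def implicit_reconstruct(queries):
    # Step 3 stand-in: start with a model that penetrates 7 mm.
    return {"max_penetration_mm": 7.0}

def physical_optimize(model):
    # Step 4 stand-in: each pass reduces the maximum penetration depth.
    model["max_penetration_mm"] *= 0.5
    return model

def reconstruct_interacting_hands(image, threshold_mm=2.0, max_iters=100):
    model = implicit_reconstruct(build_queries(extract_features(image)))
    # Step 5: iterate physical optimization until max penetration < 2 mm.
    it = 0
    while model["max_penetration_mm"] >= threshold_mm and it < max_iters:
        model = physical_optimize(model)
        it += 1
    return model

result = reconstruct_interacting_hands(image=None)
print(result["max_penetration_mm"])  # 7.0 -> 3.5 -> 1.75
```

The only substantive element carried over from the text is the stopping condition; everything else is placeholder structure.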
Further, step 1 adopts two encoder-decoder neural networks with ResNet18 as a backbone, and the acquired features are supervised in the network training process.
Further, the training may constrain the features by the following loss function:
L_feature = L_Global + L_instance + L_joint
The first term L_Global constrains the global mask and global Z-map; the second term L_instance constrains the instance masks; the third term L_joint supervises the localization maps and Z-maps of the joints. In L_Global, the network predictions Ĝ_M and Ĝ_Z for the global mask and the global Z-map are compared against the corresponding ground truths G_M and G_Z. In L_instance, the network prediction Ê^(k) of the k-th instance mask in region i is compared against its ground truth E^(k). In L_joint, the network predictions of the joint localization map and Z-map of region i are supervised against their corresponding ground truths.
Further, in step 1, for an input color image I containing interacting hands, a first encoder-decoder network is used to obtain a global mask G_M and a global Z-map G_Z. Then, the bounding-box coordinates of each region are used to crop I, G_M and G_Z, yielding the region information: a region image I^(i), a region mask M^(i) and a region Z-map Z^(i). Then, from I^(i), M^(i) and Z^(i), a second encoder-decoder network extracts in parallel, for each region i, the localization map and Z-map of the visible joints and the visible instance masks E^(k), where k denotes the number of instance masks visible in region i.
Further, in step 2, for each region, a sampling space d^(i) is constructed according to the query strategy and points are sampled in this space. All sampled points are combined as the query points and normalized into a unit cube. In addition, for each region, a region embedding r^(i) is obtained from the feature extraction of step 1, and each instance mask E^(k) in the region is fed into a multi-layer perceptron and encoded into an instance embedding e^(k). r^(i) and e^(k) serve as the query conditions for implicit reconstruction.
Further, in step 3, implicit reconstruction is performed with the parameterized implicit neural network based on the query points and the query conditions. The parameterized implicit neural network can be viewed as an implicit function of the form:
τ = f(p | r, e)
where p is a query point, r and e are the region embedding and the instance embedding, taken as the query conditions, and τ is the output occupancy value.
The neural network is supervised with a cross-entropy loss between the predicted and ground-truth occupancies. Furthermore, a penetration loss is defined to penalize collisions of the hand surfaces. Considering that a query point belongs to at most one single-hand instance, the penetration loss is defined over Ω, the set of all query points, with k equal to the number of hands in each region.
Further, in step 4, after the hand surfaces are implicitly reconstructed, a physical plausibility check is performed on the reconstructed interacting hands to judge whether the reconstructed interactive hand model exhibits penetration; if penetration exists, the reconstructed model is optimized with a physical optimization method.
Further, in step 5, considering that the hand is not rigid, penetration within 2 mm is allowed. The reconstructed model is iteratively optimized according to step 4 until the maximum penetration depth d is less than 2 mm.
Further, the system comprises a feature extraction module, an implicit reconstruction module and a physical optimization module. The feature extraction module comprises two units: a global and regional feature extraction unit, which obtains from a single input image the global mask G_M, the global Z-map G_Z, the region images I^(i), the region masks M^(i) and the region Z-maps Z^(i); and an instance and joint feature estimation unit, which, from the region features, extracts in parallel for each region i the localization map and Z-map of the visible joints and the visible instance masks E^(k). The implicit reconstruction module comprises two units: a query point and query condition acquisition unit, which selects query points for each region according to a query strategy, obtains the region embedding r^(i) from the feature extraction module, encodes the instance masks E^(k) of the region into instance embeddings {e^(k)} with a multi-layer perceptron, and takes r^(i) and e^(k) as the query conditions for implicit reconstruction; and an implicit surface reconstruction unit, which obtains the occupancy value of each query point through a parameterized implicit neural network and extracts the reconstructed surface. The physical optimization module judges whether the reconstructed model exhibits penetration; if so, it optimizes the reconstructed model with a physical optimization method, iterating until the maximum penetration depth is less than 2 mm, and takes the optimized result as the final reconstruction result of the interacting hands.
Compared with the prior art, the invention has the following advantages:
the invention provides a three-dimensional interactive hand reconstruction method based on implicit expression, which can reconstruct hands with any chirality and quantity in an image only by a single color image, realizes an end-to-end modeling mode, and improves the quality of three-dimensional gesture and shape reconstruction.
In addition, based on image cues, a new query strategy and new query conditions are designed, which improves query efficiency and reduces the query space.
In addition, aiming at physically implausible reconstruction results, an effective physical optimization scheme is provided to solve the problem of interpenetration between instances under the implicit expression, effectively ensuring the physical plausibility of the reconstruction result.
Drawings
FIG. 1 is a flow chart of a three-dimensional interactive hand reconstruction method in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a three-dimensional interactive hand reconstruction system in an embodiment of the present invention;
FIG. 3 is a diagram of the physical optimization scheme in an embodiment of the present invention, wherein (a) shows the force computation for each pixel, and (b) shows the force computation for each instance.
Fig. 4 shows reconstruction results achievable by the present invention, wherein column (a) shows the input images, column (b) the ground-truth masks, column (c) the predicted masks, column (d) the implicit reconstruction results, and columns (e) the optimized results from two views.
Detailed Description
The following detailed description of the embodiments of the present invention is given with reference to the accompanying drawings and examples, so that the reader can clearly understand the methods and effects of the present invention. It should be noted that, where no conflict arises, the embodiments of the present invention and the features therein may be combined with each other, and the resulting technical solutions fall within the scope of the present invention.
Furthermore, the steps illustrated in the flowcharts of the drawings may be performed as a series of computer-executable instructions, and although a logical order is shown, in some cases the order of the steps may be appropriately changed.
Example one
Fig. 1 is a flowchart of a three-dimensional interactive hand reconstruction method based on implicit representation according to a first embodiment of the present invention, and each step is described in detail below with reference to fig. 1.
Step S110, building a feature extraction neural network and acquiring corresponding features from the input color image.
In order to obtain the corresponding features from a single color image, a feature extraction network is built. As shown in FIG. 2 (S1), the feature extraction network consists of two encoder-decoder neural networks with ResNet18 as the backbone. For an input color image I containing interacting hands, a global mask G_M and a global Z-map G_Z are first obtained with the first encoder-decoder network. Then, the bounding-box coordinates of each region are used to crop I, G_M and G_Z, yielding the region information: a region image I^(i), a region mask M^(i) and a region Z-map Z^(i). Then, from I^(i), M^(i) and Z^(i), the second encoder-decoder network extracts in parallel, for each region i, the localization map and Z-map of the visible joints and the visible instance masks E^(k), where k denotes the number of instance masks visible in region i.
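The region cropping described above can be illustrated with a short sketch. The array shapes and the bounding-box format (x0, y0, x1, y1) are assumptions made for illustration, not the patent's specification; the point is only that the image, global mask and global Z-map are all cropped with the same per-region box.

```python
# Hedged sketch of step-1 cropping: one connected region's bounding box is
# applied identically to the input image I, global mask G_M and global Z-map
# G_Z, yielding the region image, region mask and region Z-map.
import numpy as np

def crop_region(I, G_M, G_Z, box):
    x0, y0, x1, y1 = box           # assumed box format (pixels)
    I_i = I[y0:y1, x0:x1]          # region image
    M_i = G_M[y0:y1, x0:x1]        # region mask
    Z_i = G_Z[y0:y1, x0:x1]        # region Z-map
    return I_i, M_i, Z_i

# Dummy inputs with assumed shapes.
I = np.zeros((256, 256, 3))
G_M = np.zeros((256, 256), dtype=np.int32)
G_Z = np.zeros((256, 256), dtype=np.float32)
I_i, M_i, Z_i = crop_region(I, G_M, G_Z, (32, 48, 160, 208))
print(I_i.shape, M_i.shape, Z_i.shape)  # (160, 128, 3) (160, 128) (160, 128)
```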
To achieve the above functionality, the feature extraction network is pre-trained using the following loss function:
L_feature = L_Global + L_instance + L_joint
The first term L_Global supervises the global mask and global Z-map; the second term L_instance supervises the instance masks; the third term L_joint supervises the localization maps and Z-maps of the joints. In L_Global, the network predictions Ĝ_M and Ĝ_Z for the global mask and the global Z-map are compared against the corresponding ground truths G_M and G_Z. In L_instance, the network prediction Ê^(k) of the k-th instance mask in region i is compared against its ground truth E^(k). In L_joint, the network predictions of the joint localization map and Z-map of region i are supervised against their corresponding ground truths.
Step S120, selecting query points according to the query strategy using the acquired features, and constructing query conditions.
The specific process of query point selection is as follows. For each region, the corresponding query silhouette and maximum query depth are first found; then, using the silhouette as the base area and the maximum query depth as the height, a sampling space d^(i) is constructed, and points are sampled uniformly in this space. In addition, each visible joint is treated as an anchor, and a further set of points is sampled near each anchor; specifically, a Gaussian mixture distribution N(x^(j), σ) is used, where each visible joint coordinate x^(j) serves as a center and σ is the variance in each dimension. In the experiments of the present invention, σ is set to 2.5 cm. All sampled points are combined as the query points and normalized into a unit cube. The specific process of constructing the query conditions is as follows: for each region, the region embedding r^(i) is obtained from the feature extraction of FIG. 2 (S1); each instance mask E^(k) in the region is fed into a multi-layer perceptron and encoded into an instance embedding e^(k); r^(i) and e^(k) serve as the query conditions for implicit reconstruction.
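The sampling scheme above can be sketched as follows. This is an illustrative sketch, not the patent's code: the sample counts, the axis-aligned-box model of the sampling space d^(i), and the joint coordinates are all assumptions; only the two-part strategy (uniform samples plus Gaussian samples around each visible joint with σ = 2.5 cm) and the final unit-cube normalization come from the text.

```python
# Hedged sketch of query-point selection: uniform samples in the region's
# sampling space plus Gaussian samples anchored at each visible joint,
# followed by normalization of all points into the unit cube.
import numpy as np

rng = np.random.default_rng(0)

def sample_query_points(box_min, box_max, joints, sigma=0.025,
                        n_uniform=512, n_per_joint=128):
    box_min = np.asarray(box_min, dtype=float)
    box_max = np.asarray(box_max, dtype=float)
    # Uniform samples inside the sampling space (modeled as a box here).
    uniform = rng.uniform(box_min, box_max, size=(n_uniform, 3))
    # Gaussian samples centered at each visible joint (sigma in meters).
    anchored = [rng.normal(loc=j, scale=sigma, size=(n_per_joint, 3))
                for j in joints]
    pts = np.concatenate([uniform] + anchored, axis=0)
    # Normalize all query points into the unit cube [0, 1]^3.
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    return (pts - lo) / (hi - lo)

joints = [np.array([0.10, 0.20, 0.30]), np.array([0.15, 0.25, 0.35])]
pts = sample_query_points([0.0, 0.0, 0.0], [0.3, 0.3, 0.3], joints)
print(pts.shape)  # (768, 3): 512 uniform + 2 joints x 128
```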
Step S130, building a parameterized implicit neural network and performing implicit reconstruction based on the query points and the query conditions.
As shown in FIG. 2 (S2), a parameterized implicit neural network is built on the query points and the query conditions and treated as an implicit function of the form:
τ = f(p | r, e)
where p is a query point, r and e are the region embedding and the instance embedding, taken as the query conditions, and τ is the output occupancy value.
In the experimental phase of the present invention, the hand surface is implicitly represented by the iso-surface τ = 0.5. If the occupancy of a query point is less than 0.5, the point lies outside the surface; if it is greater than 0.5, the point lies inside the surface.
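The iso-surface convention above reduces to a simple threshold on the predicted occupancy, as the following minimal sketch shows; the occupancy values themselves are dummy numbers, not network outputs.

```python
# Hedged sketch of the tau = 0.5 iso-surface convention: occupancy below the
# threshold means the query point is outside the hand surface, above means
# inside.
import numpy as np

def classify(occupancy, tau=0.5):
    # True = inside the surface, False = outside.
    return np.asarray(occupancy) > tau

occ = [0.1, 0.49, 0.51, 0.93]
print(classify(occ).tolist())  # [False, False, True, True]
```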
We supervise the neural network with a cross-entropy loss between the predicted and ground-truth occupancies. Furthermore, we define a penetration loss to penalize collisions of the hand surfaces. Considering that a query point belongs to at most one single-hand instance, the penetration loss is defined over Ω, the set of all query points, with k equal to the number of hands in each region.
The parameterized implicit neural network consists of three convolutional layers and nine fully connected layers. Training is performed with the SGD optimizer and a batch size of 64. The query points are not fixed; they are resampled at each iteration.
Step S140, performing a physical plausibility check on the reconstructed interacting hands, and optimizing and updating the reconstructed model.
After the hand surfaces are obtained through implicit reconstruction, a physical plausibility check is performed on the reconstructed interacting hands, and unphysical penetration is penalized so as to better align with the real pose. As shown in FIG. 3, the penetration depth is computed first. Rays are cast in the positive Z direction from the region mask; since a ray should enter and exit a hand in pairs, its number of intersections with one hand's surface should be even. Suppose a ray passes through the implicitly reconstructed regions of two interacting hands A and B, and record all its surface intersections in order as q_1, ..., q_N, where N is the number of intersections. For an intersection q_n on the surface of hand A, if q_n and q_{n+1} do not belong to the same hand surface, penetration occurs at that location; in this case, the next intersection q_m on the surface of hand A is found, and the penetration depth is defined as the distance between q_{n+1} and q_m. Next, the force at each pixel is computed. In the optimization stage, the hand pose should be adjustable toward any direction, not only parallel to the rays. Considering that force is directional, that the magnitude of the force is directly related to the penetration depth, and that a location where penetration occurs can act like a spring and produce a repulsive force, the hand is treated as a rigid body here. The relationship between the penetration depth and the corresponding repulsive force is defined as:
f_(u,v) = λ Σ_t d_t
where f_(u,v) denotes the force generated, due to penetration, by the ray cast through pixel (u, v); t indexes the penetrations along the ray; λ is the associated weight; and d_t denotes the t-th penetration depth.
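The per-ray penetration test can be sketched in code. The original equation images are lost, so this is one reading of the rule, labeled as an assumption: penetration is flagged wherever consecutive intersections along a ray belong to different hands, and its depth is taken as the length of the interval in which the ray is inside both hands at once. The intersection data below are made up for illustration.

```python
# Hedged sketch of penetration-depth computation along one ray. Each
# intersection is a (depth_along_ray, hand_label) pair; entering a hand toggles
# it into the "inside" set, exiting toggles it out. Whenever the ray is inside
# both hands simultaneously, the length of that doubly-occupied interval is
# recorded as one penetration depth.

def penetration_depths(intersections):
    inside = set()
    depths = []
    start = None
    for z, hand in sorted(intersections):
        if hand in inside:
            inside.remove(hand)   # exiting this hand's surface
        else:
            inside.add(hand)      # entering this hand's surface
        if len(inside) == 2 and start is None:
            start = z             # penetration interval begins
        elif len(inside) < 2 and start is not None:
            depths.append(z - start)
            start = None          # penetration interval ends
    return depths

# Hand A spans [0, 8] along the ray, hand B spans [5, 12]: overlap depth 3.
ray = [(0.0, "A"), (5.0, "B"), (8.0, "A"), (12.0, "B")]
print(penetration_depths(ray))  # [3.0]
```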
The resultant force F is obtained by summing the repulsive forces of all rays; because the repulsive force of each ray differs, the direction of F may deviate from the z-axis.
from the resultant force F, the average penetration depth is calculatedThen, for each hand, move in a direction parallel to FTo penalize extreme adjustments, projection losses are used, in which orthogonal projections are used.
where H_pose represents the optimized interacting hands, π(H_pose) is their 2D projection, and R_M denotes the region mask estimated by the encoder-decoder.
Step S150, iteratively optimizing the reconstructed model until the maximum penetration depth is less than 2 mm, and taking the optimized result as the final reconstruction result.
Considering that the hand is not rigid, penetration within 2 mm is allowed. The reconstructed model is iteratively optimized according to step S140 until the maximum penetration depth d is less than 2 mm.
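The S140/S150 loop can be sketched under the stated assumptions: a spring-like repulsive force proportional to the penetration depth (weight λ), a resultant force obtained by summing per-ray forces, and a translation of one hand along that force, repeated until the maximum penetration depth drops below 2 mm. The one-dimensional "hands as intervals" model below is purely illustrative and not the patent's geometry.

```python
# Hedged sketch of the iterative physical optimization: hands are modeled as
# 1-D intervals (units: mm); the overlap length plays the role of the maximum
# penetration depth, and each iteration translates hand A away from hand B by
# a step proportional to the current depth (spring-like repulsion).

def max_penetration(a, b):
    # Overlap of intervals [a0, a1] and [b0, b1]; zero when disjoint.
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def optimize(a, b, lam=0.1, threshold=2.0, max_iters=1000):
    a, b = list(a), list(b)
    it = 0
    while max_penetration(a, b) >= threshold and it < max_iters:
        d = max_penetration(a, b)
        step = lam * d                     # force magnitude ~ penetration depth
        direction = 1.0 if a[0] >= b[0] else -1.0
        a[0] += direction * step           # move hand A away from hand B
        a[1] += direction * step
        it += 1
    return a, b, max_penetration(a, b)

# Hands initially overlap by 6 mm; iterate until the overlap is below 2 mm.
a, b, depth = optimize([10.0, 30.0], [24.0, 44.0])
print(depth < 2.0)  # True
```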
In the first embodiment, the input color image may include any chirality and number of hands, and the experimental result is shown in fig. 4. The first column of fig. 4 represents the input color image, the second and third columns represent the real mask and the predicted mask, respectively, the fourth column represents the implicit reconstruction result, and the fifth column represents the optimized reconstruction result.
Those skilled in the art will appreciate that the modules or steps of the invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of computing devices, and may optionally be implemented in program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device, fabricated separately as individual integrated-circuit modules, or fabricated as a single integrated-circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Claims (9)
1. A three-dimensional interactive hand reconstruction method based on implicit expression, characterized by comprising the following steps:
step 1, building a feature extraction neural network, acquiring global features from a single input color image, dividing the global features into several regional features based on connected components, and further acquiring instance features and joint features from the input image and the regional features;
step 2, selecting query points according to a query strategy using the regional features, instance features and joint features acquired in step 1, and constructing query conditions;
step 3, building a parameterized implicit neural network, and performing implicit reconstruction based on the query points and the query conditions;
step 4, physically optimizing the reconstructed model, penalizing unphysical penetration, and adjusting and updating the reconstructed model;
and step 5, iteratively optimizing the reconstructed model until the maximum penetration depth is less than 2 mm, and taking the optimized result as the final reconstruction result of the interacting hands.
2. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 1, wherein: step 1 adopts two encoder-decoder neural networks using ResNet18 as a backbone, and supervises the acquired features in the network training process.
3. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 2, wherein: the training constrains the features by the following loss function:
L_feature = L_Global + L_instance + L_joint
the first term L_Global constrains the global mask and global Z-map; the second term L_instance constrains the instance masks; the third term L_joint supervises the localization maps and Z-maps of the joints; in L_Global, the network predictions Ĝ_M and Ĝ_Z for the global mask and the global Z-map are compared against the corresponding ground truths G_M and G_Z; in L_instance, the network prediction Ê^(k) of the k-th instance mask in region i is compared against its ground truth E^(k); in L_joint, the network predictions of the joint localization map and Z-map of region i are supervised against their corresponding ground truths.
4. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 1, wherein: in step 1, for an input color image I containing interacting hands, a first encoder-decoder network is used to obtain a global mask G_M and a global Z-map G_Z; then, the bounding-box coordinates of each region are used to crop I, G_M and G_Z, yielding the region information: a region image I^(i), a region mask M^(i) and a region Z-map Z^(i); then, from I^(i), M^(i) and Z^(i), a second encoder-decoder network extracts in parallel, for each region i, the localization map and Z-map of the visible joints and the visible instance masks E^(k), where k denotes the number of instance masks visible in region i.
5. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 1, wherein: in step 2, for each region, a sampling space d^(i) is constructed according to the query strategy and points are sampled in this space; all sampled points are combined as the query points and normalized into a unit cube; in addition, for each region, a region embedding r^(i) is obtained from the feature extraction of step 1, and each instance mask E^(k) in the region is fed into a multi-layer perceptron and encoded into an instance embedding e^(k); r^(i) and e^(k) serve as the query conditions for implicit reconstruction.
6. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 5, wherein: in step 3, implicit reconstruction is performed with the parameterized implicit neural network based on the query points and the query conditions; the parameterized implicit neural network is treated as an implicit function of the form τ = f(p | r, e), where p is a query point, r and e are the region embedding and the instance embedding, taken as the query conditions, and τ is the output occupancy value;
the neural network is supervised with a cross-entropy loss between the predicted and ground-truth occupancies; in addition, a penetration loss is defined to penalize collisions of the hand surfaces; considering that a query point belongs to at most one single-hand instance, the penetration loss is defined over Ω, the set of all query points, with k equal to the number of hands in each region.
7. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 1, wherein: in step 4, after the hand surfaces are obtained through implicit reconstruction, a physical plausibility check is performed on the reconstructed interacting hands to judge whether the reconstructed interactive hand model exhibits penetration; if penetration exists, the reconstructed model is optimized with a physical optimization method.
8. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 1, wherein: in step 5, considering that the hand is not rigid, penetration within 2 mm is allowed; the reconstructed model is iteratively optimized according to step 4 until the maximum penetration depth d is less than 2 mm.
9. A three-dimensional interactive hand reconstruction system based on implicit expression, characterized in that the system comprises a feature extraction module, an implicit reconstruction module, and a physical optimization module;
the feature extraction module comprises two units: a global and regional feature extraction unit for obtaining, from a single input image, a global mask G_M, a global Z-map G_Z, region images, region masks, and region Z-maps; and an instance and joint feature estimation unit for extracting, for each region i in parallel according to the region features, localization maps of the visible joints and visible instance masks E^(k);
the implicit reconstruction module comprises two units: a query point and query condition acquisition unit, which selects query points for each region according to the query strategy, obtains the region embedding r^(i) from the feature extraction module, encodes the instance masks {E^(k)} of the region into instance embeddings {e^(k)} via a multi-layer perceptron, and takes r^(i) and {e^(k)} as the query conditions for implicit reconstruction; and an implicit surface reconstruction unit, which obtains the occupancy value of each query point through the parameterized implicit neural network and then extracts the reconstructed surface;
and the physical optimization module judges whether penetration exists in the reconstructed model; if penetration exists, the reconstructed model is optimized using a physical optimization method and iteratively refined until the maximum penetration depth is less than 2 mm, and the optimization result is taken as the final reconstruction result of the interacting hands.
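Putting the three modules of claim 9 together, the system's data flow can be sketched as a simple composition; every callable here is a hypothetical placeholder for the corresponding network or optimizer:

```python
class InteractingHandReconstructor:
    """Hedged sketch of the claim-9 pipeline: feature extraction ->
    implicit reconstruction -> physical optimization. The three
    callables are placeholders, not the patent's actual components."""

    def __init__(self, extract_features, implicit_reconstruct, physical_optimize):
        self.extract_features = extract_features
        self.implicit_reconstruct = implicit_reconstruct
        self.physical_optimize = physical_optimize

    def __call__(self, image):
        feats = self.extract_features(image)        # masks, Z-maps, embeddings
        surface = self.implicit_reconstruct(feats)  # occupancy query + surface extraction
        return self.physical_optimize(surface)      # iterate until penetration < 2 mm
```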
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210619894.7A CN114998520A (en) | 2022-06-02 | 2022-06-02 | Three-dimensional interactive hand reconstruction method and system based on implicit expression |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114998520A true CN114998520A (en) | 2022-09-02 |
Family
ID=83030222
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210619894.7A Pending CN114998520A (en) | 2022-06-02 | 2022-06-02 | Three-dimensional interactive hand reconstruction method and system based on implicit expression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114998520A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116740290A (en) * | 2023-08-15 | 2023-09-12 | 江西农业大学 | Three-dimensional interaction double-hand reconstruction method and system based on deformable attention |
CN116740290B (en) * | 2023-08-15 | 2023-11-07 | 江西农业大学 | Three-dimensional interaction double-hand reconstruction method and system based on deformable attention |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106327571B (en) | A kind of three-dimensional face modeling method and device | |
CN109410307A (en) | A kind of scene point cloud semantic segmentation method | |
CN110419049A (en) | Room layout estimation method and technology | |
CN108921926A (en) | A kind of end-to-end three-dimensional facial reconstruction method based on single image | |
CN111783582A (en) | Unsupervised monocular depth estimation algorithm based on deep learning | |
CN110135249A (en) | Human bodys' response method based on time attention mechanism and LSTM | |
CN112308918B (en) | Non-supervision monocular vision odometer method based on pose decoupling estimation | |
CN113393550B (en) | Fashion garment design synthesis method guided by postures and textures | |
CN110335344A (en) | Three-dimensional rebuilding method based on 2D-3D attention mechanism neural network model | |
CN108846343B (en) | Multi-task collaborative analysis method based on three-dimensional video | |
CN109726619A (en) | A kind of convolutional neural networks face identification method and system based on parameter sharing | |
CN111462274A | Human body image synthesis method and system based on SMPL model | |
CN109657634A (en) | A kind of 3D gesture identification method and system based on depth convolutional neural networks | |
CN112819951A (en) | Three-dimensional human body reconstruction method with shielding function based on depth map restoration | |
CN113158861A (en) | Motion analysis method based on prototype comparison learning | |
CN117218246A (en) | Training method and device for image generation model, electronic equipment and storage medium | |
CN114998520A (en) | Three-dimensional interactive hand reconstruction method and system based on implicit expression | |
Kang et al. | 3D human pose lifting with grid convolution | |
CN117115911A (en) | Hypergraph learning action recognition system based on attention mechanism | |
CN108961384A (en) | three-dimensional image reconstruction method | |
CN114202606A (en) | Image processing method, electronic device, storage medium, and computer program product | |
Bao et al. | Pose ResNet: a 3D human pose estimation network model | |
CN111311648A (en) | Method for tracking human hand-object interaction process based on collaborative differential evolution filtering | |
Wang et al. | Scene recognition based on DNN and game theory with its applications in human-robot interaction | |
CN115170817B (en) | Character interaction detection method based on three-dimensional human-object grid topology enhancement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||