CN114998520A - Three-dimensional interactive hand reconstruction method and system based on implicit expression


Info

Publication number
CN114998520A
CN114998520A (application CN202210619894.7A)
Authority
CN
China
Prior art keywords
implicit
query
reconstruction
region
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210619894.7A
Other languages
Chinese (zh)
Inventor
王雁刚 (Yangang Wang)
谢薇 (Wei Xie)
赵子萌 (Zimeng Zhao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202210619894.7A
Publication of CN114998520A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/42 - Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Graphics (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a three-dimensional interactive hand reconstruction method and system based on implicit expression. The method comprises the following steps: 1. constructing a feature extraction neural network, acquiring global and regional features from a single input color image, and further acquiring instance and joint features from the input image and the regional features; 2. selecting query points according to a query strategy using the acquired regional, instance and joint features, and constructing query conditions; 3. constructing a parameterized implicit neural network and performing implicit reconstruction based on the query points and query conditions; 4. physically optimizing the reconstructed model, penalizing physically implausible penetration, and adjusting and updating the reconstructed model; 5. iteratively optimizing the reconstructed model until the maximum penetration depth is less than 2 mm, and taking the optimized result as the final reconstruction result of the interacting hands. The method requires only a single color image, can reconstruct hands of any chirality and number in the image, realizes end-to-end modeling, and improves the quality of three-dimensional pose and shape reconstruction.

Description

Three-dimensional interactive hand reconstruction method and system based on implicit expression
Technical Field
The invention relates to the fields of computer vision and computer graphics, and in particular to a three-dimensional interactive hand reconstruction method and system based on implicit expression.
Background
In recent years, three-dimensional pose and shape reconstruction has become a popular research direction, with broad application prospects in virtual reality, robot control, motion-sensing games, and other fields. Interacting hands are everywhere in daily life and play an important role in the exchange of information between people, for example in individual emotional expression and multi-person collaboration. As with human activities in real life, interactions between hands are also important content in virtual reality. It is therefore necessary to study the reconstruction of interacting hands. Compared with a single hand, interacting hands are often heavily occluded, in close contact with each other, and share similar texture features. In addition, the solution space of interacting hands has more degrees of freedom.
Early methods tended to take additional depth information or multiple viewpoints as input. However, they always fit the input data to a specific hand template, which admits no personalized variation and does not adapt to existing learning-based methods. Because monocular color images are ubiquitous, methods using them are preferable to methods requiring multiple cameras or depth cameras. The construction of the InterHand2.6M dataset provided data support for this problem and subsequently prompted a series of methods for reconstructing interacting hands from monocular color images. However, these interactive hand reconstruction efforts still suffer from several problems:
1) They typically rely on the assumption that there is exactly one left hand and one right hand in the image, recasting the problem as a regression of the MANO parameters of the two hands. This limits the chirality and number of hands in the image. 2) The reconstructed interacting hands exhibit physically implausible artifacts such as spatial entanglement and mutual penetration, so the reconstruction quality is low.
Disclosure of Invention
The invention aims to solve the technical problem of reconstructing physically plausible hand shapes and poses from a single color image without limiting the chirality or number of hands.
In order to solve this technical problem, the invention provides a three-dimensional interactive hand reconstruction method based on implicit surface representation. The method constructs a parameterized implicit neural network that classifies 3D query points in the canonical space of the interacting hands, using image features to form the query strategy and query conditions. Guided by image cues, a new query strategy and new query conditions are designed, which improves query efficiency and reduces the query space. In addition, an effective physical optimization scheme is proposed to resolve mutual penetration between instances under the implicit representation, effectively ensuring the physical plausibility of the reconstruction results.
The three-dimensional interactive hand reconstruction method based on implicit expression provided by the invention comprises the following steps (a schematic sketch of the pipeline follows the list):
step 1, building a feature extraction neural network, acquiring global features from a single input color image, dividing the global features into several regional features based on connected components, and further acquiring instance features and joint features from the input image and the regional features;
step 2, selecting query points according to a query strategy using the global features, regional features and joint features acquired in step 1, and constructing query conditions;
step 3, building a parameterized implicit neural network and performing implicit reconstruction based on the query points and the query conditions;
step 4, physically optimizing the reconstructed model, penalizing physically implausible penetration, and adjusting and updating the reconstructed model;
step 5, iteratively optimizing the reconstructed model until the maximum penetration depth is less than 2 mm, and taking the optimized result as the final reconstruction result of the interacting hands.
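Taken together, the five steps form a feed-forward pipeline closed by an optimization loop. The following is a minimal schematic sketch of that control flow, assuming each stage is supplied as a callable; every name here is a hypothetical placeholder, not an identifier from the filing:

```python
# Schematic sketch of the five-step pipeline; all callables are
# hypothetical placeholders. The tolerance is in meters (2 mm).
def reconstruct_interacting_hands(image, extract_features, build_queries,
                                  implicit_reconstruct, physical_optimize,
                                  max_penetration_depth, tol=0.002):
    feats = extract_features(image)                   # step 1
    points, conditions = build_queries(feats)         # step 2
    model = implicit_reconstruct(points, conditions)  # step 3
    while max_penetration_depth(model) >= tol:        # steps 4-5
        model = physical_optimize(model)
    return model
```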
Further, step 1 adopts two encoder-decoder neural networks with ResNet18 as a backbone, and the acquired features are supervised in the network training process.
Further, the training constrains the features by the following loss function:

L_feature = L_Global + L_instance + L_joint

The first term L_Global constrains the global mask and the global Z-map, the second term L_instance constrains the instance masks, and the third term L_joint supervises the localization map and Z-map of the joints. (The explicit formulas of L_Global, L_instance and L_joint are given as equation images in the original filing.) In L_Global, the network predictions of the global mask and the global Z-map are compared against the corresponding ground truths; in L_instance, the network prediction of the k-th instance mask in region i is compared against its ground truth; in L_joint, the network predictions of the localization map and Z-map of the joints of region i are compared against the corresponding ground truths.
Further, in step 1, for an input color image I containing interacting hands, a first encoder-decoder network is used to obtain the global mask G_M and the global Z-map G_Z. Then the bounding-box coordinates of each region are used to crop I, G_M and G_Z, yielding the region information, namely the region image, the region mask and the region Z-map. Then, from these regional features, a second encoder-decoder network extracts in parallel, for each region i, the localization map and Z-map of the visible joints and the visible instance masks E^(k), where k ranges over the instance masks visible in region i.
Further, in step 2, for each region a sampling space d^(i) is constructed according to the query strategy and points are sampled within it. All sampled points are combined as the query points and normalized into a unit cube. In addition, for each region, a region embedding r^(i) is obtained from the feature extraction of step 1, and each instance mask E^(k) in the region is fed into a multi-layer perceptron to obtain an instance embedding e^(k). r^(i) and e^(k) serve as the query conditions for implicit reconstruction.
Further, in step 3, implicit reconstruction is performed with the parameterized implicit neural network based on the query points and the query conditions. The parameterized implicit neural network can be viewed as an implicit function:

f(p, r, e) = τ

where p is a query point, r and e are the region embedding and the instance embedding serving as the query conditions, and the output τ is the occupancy value.

The network is supervised with a cross-entropy loss between the predicted occupancy and the ground-truth occupancy. In addition, a penetration loss is defined to penalize collisions of the hand surfaces. Since a query point belongs to at most one single-hand instance, the penetration loss penalizes any query point whose occupancies, summed over the hand instances, exceed 1:

L_pen = Σ_{p ∈ Ω} max(Σ_k τ_k(p) - 1, 0)

where Ω is the set of all query points and k ranges over the hands in each region.
Further, in step 4, after the hand surfaces are implicitly reconstructed, a physical plausibility check is performed on the reconstructed interacting hands to judge whether the reconstructed model exhibits penetration; if so, the reconstructed model is optimized using the physical optimization method.
Further, in step 5, considering that the hand is not rigid, penetration within 2 mm is allowed. The reconstructed model is iteratively optimized according to step 4 until the maximum penetration depth d is less than 2 mm.
Further, the system comprises a feature extraction module, an implicit reconstruction module and a physical optimization module. The feature extraction module comprises two units: a global and regional feature extraction unit, which obtains the global mask G_M, the global Z-map G_Z, the region image, the region mask and the region Z-map from a single input image; and an instance and joint feature estimation unit, which extracts in parallel, for each region i and according to the regional features, the localization map and Z-map of the visible joints and the visible instance masks E^(k). The implicit reconstruction module comprises two units: a query point and query condition acquisition unit, which selects query points for each region according to the query strategy, obtains the region embedding r^(i) from the feature extraction module, and encodes the region's instance masks E^(k) into instance embeddings {e^(k)} through a multi-layer perceptron, taking r^(i) and {e^(k)} as the query conditions for implicit reconstruction; and an implicit surface reconstruction unit, which obtains the occupancy value of each query point through the parameterized implicit neural network and further extracts the reconstructed surface. The physical optimization module judges whether the reconstructed model exhibits penetration; if so, it optimizes the reconstructed model using the physical optimization method, iteratively optimizing until the maximum penetration depth is less than 2 mm, and takes the optimized result as the final reconstruction result of the interacting hands.
Compared with the prior art, the invention has the following advantages:
the invention provides a three-dimensional interactive hand reconstruction method based on implicit expression, which can reconstruct hands with any chirality and quantity in an image only by a single color image, realizes an end-to-end modeling mode, and improves the quality of three-dimensional gesture and shape reconstruction.
In addition, according to the image clue, a new query strategy and query conditions are designed, so that the query efficiency is improved, and the query space is reduced.
In addition, aiming at the unreasonable reconstruction result in physics, an effective physical optimization scheme is provided to solve the problem of mutual permeation between examples under implicit expression, and the physical authenticity of the reconstruction result is effectively ensured.
Drawings
FIG. 1 is a flow chart of a three-dimensional interactive hand reconstruction method in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a three-dimensional interactive hand reconstruction system in an embodiment of the present invention;
FIG. 3 is a diagram of the physical optimization scheme in an embodiment of the present invention, wherein (a) shows the force computation for each pixel and (b) shows the force computation for each instance.
FIG. 4 shows reconstruction results achieved by the present invention, wherein column (a) is the input image, column (b) the ground-truth mask, column (c) the predicted mask, column (d) the implicit reconstruction result, and column (e) the optimized result shown from two views.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples, so that the methods and effects of the invention can be clearly understood. It should be noted that, where no conflict arises, the features of the embodiments may be combined with one another, and the resulting technical solutions fall within the scope of the present invention.
Furthermore, the flowcharts shown in the drawings may be executed as a series of sequential instructions on a computer, and in some cases the order of the steps may be appropriately modified.
Example one
Fig. 1 is a flowchart of a three-dimensional interactive hand reconstruction method based on implicit representation according to a first embodiment of the present invention, and each step is described in detail below with reference to fig. 1.
Step S110: build the feature extraction neural network and acquire the corresponding features from the input color image.
To obtain the corresponding features from a single color image, a feature extraction network is built. As shown in FIG. 2 (S1), the feature extraction network consists of two encoder-decoder neural networks with ResNet18 as the backbone. For an input color image I containing interacting hands, a first encoder-decoder network is used to obtain the global mask G_M and the global Z-map G_Z. Then the bounding-box coordinates of each region are used to crop I, G_M and G_Z, yielding the region information, namely the region image, the region mask and the region Z-map. Then, from these regional features, a second encoder-decoder network extracts in parallel, for each region i, the localization map and Z-map of the visible joints and the visible instance masks E^(k), where k ranges over the instance masks visible in region i.
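A minimal sketch of one such encoder-decoder branch is given below. The filing fixes only the ResNet18 encoder; the bilinear-upsampling decoder, channel counts, and the 256x256 input size are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class EncoderDecoder(nn.Module):
    """Encoder-decoder branch with a ResNet18 encoder, as used for both
    the global branch (mask + Z-map) and the per-region branch."""
    def __init__(self, out_channels):
        super().__init__()
        backbone = resnet18(weights=None)
        # keep everything up to the last residual stage (1/32 resolution)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.decoder = nn.Sequential(  # assumed decoder: upsample + conv
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(128, out_channels, 1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# first branch: one channel for the global mask G_M, one for the
# global Z-map G_Z
global_net = EncoderDecoder(out_channels=2)
image = torch.randn(1, 3, 256, 256)          # a single color image I
g_mask, g_zmap = global_net(image).split(1, dim=1)
```

The second branch would analogously output the joint localization map, joint Z-map, and instance-mask channels for each cropped region.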
To achieve the above functionality, the feature extraction network is pre-trained using the following loss function:

L_feature = L_Global + L_instance + L_joint

The first term L_Global supervises the global mask and the global Z-map, the second term L_instance supervises the instance masks, and the third term L_joint supervises the localization map and Z-map of the joints. (The explicit formulas of the three terms are given as equation images in the original filing.) In L_Global, the network predictions of the global mask and the global Z-map are compared against the corresponding ground truths; in L_instance, the network prediction of the k-th instance mask in region i is compared against its ground truth; in L_joint, the network predictions of the localization map and Z-map of the joints of region i are compared against the corresponding ground truths.
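The combined supervision can be sketched as follows. Since the per-term formulas appear only as equation images in the filing, a per-pixel mean-squared error over every map is assumed here purely for illustration; it is one common choice, not necessarily the filing's.

```python
import torch.nn.functional as F

def feature_loss(pred, gt):
    """pred / gt: dicts of predicted and ground-truth maps.
    'G_M', 'G_Z': global mask and Z-map tensors; 'E': list with one
    (K, H, W) instance-mask tensor per region; 'J_L', 'J_Z': lists with
    one joint localization map / joint Z-map tensor per region."""
    l_global = (F.mse_loss(pred["G_M"], gt["G_M"]) +
                F.mse_loss(pred["G_Z"], gt["G_Z"]))
    l_instance = sum(F.mse_loss(p, g) for p, g in zip(pred["E"], gt["E"]))
    l_joint = sum(F.mse_loss(pl, gl) + F.mse_loss(pz, gz)
                  for pl, gl, pz, gz in zip(pred["J_L"], gt["J_L"],
                                            pred["J_Z"], gt["J_Z"]))
    return l_global + l_instance + l_joint
```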
Step S120: select query points according to the query strategy using the acquired features, and construct the query conditions.

The query points are selected as follows: for each region, the corresponding query silhouette and maximum query depth are first found; using the query silhouette as the base and the maximum query depth as the height, the sampling space d^(i) is constructed, and points are sampled uniformly within this space. In addition, each visible joint is regarded as an anchor, and a further number of sampling points is selected near each anchor. Specifically, a Gaussian mixture distribution N(x^(j), σ) is used, where each visible joint coordinate x^(j) serves as a center and σ is the variance in each dimension; in the experiments of the invention, σ is set to 2.5 cm. All sampled points are combined as the query points and normalized into a unit cube.

The query conditions are constructed as follows: for each region, the region embedding r^(i) is obtained from the feature extraction of FIG. 2 (S1), and each instance mask E^(k) in the region is fed into a multi-layer perceptron to obtain an instance embedding e^(k). r^(i) and e^(k) serve as the query conditions for implicit reconstruction.
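The sampling strategy can be sketched as below. The uniform and per-joint sample counts, the metric units (meters, so σ = 2.5 cm becomes 0.025), and the min-max normalization into the unit cube are illustrative assumptions:

```python
import numpy as np

def sample_query_points(box_min, box_max, joints,
                        n_uniform=2048, n_per_joint=128, sigma=0.025):
    """box_min / box_max: corners of the region's sampling space d^(i);
    joints: (J, 3) visible joint coordinates used as anchors."""
    # uniform samples inside the sampling space
    uniform = np.random.uniform(box_min, box_max, size=(n_uniform, 3))
    # Gaussian samples centered on each visible joint (sigma = 2.5 cm)
    anchored = np.concatenate([
        np.random.normal(loc=j, scale=sigma, size=(n_per_joint, 3))
        for j in joints])
    points = np.concatenate([uniform, anchored])
    # combine all sampled points and normalize into the unit cube
    lo, hi = points.min(axis=0), points.max(axis=0)
    return (points - lo) / (hi - lo)

joints = np.random.rand(21, 3) * 0.2   # e.g. 21 visible joints, in meters
query_pts = sample_query_points(np.zeros(3), np.full(3, 0.3), joints)
```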
Step S130: build the parameterized implicit neural network and perform implicit reconstruction based on the query points and the query conditions.

As shown in FIG. 2 (S2), a parameterized implicit neural network is built over the query points and the query conditions and treated as an implicit function:

f(p, r, e) = τ

where p is a query point, r and e are the region embedding and the instance embedding serving as the query conditions, and the output τ is the occupancy value.

In the experiments of the invention, the hand surface is implicitly represented by the iso-surface τ = 0.5: if the occupancy of a query point is less than 0.5, the point lies outside the surface; if it is greater than 0.5, the point lies inside the surface.

The network is supervised with a cross-entropy loss between the predicted occupancy and the ground-truth occupancy. In addition, a penetration loss is defined to penalize collisions of the hand surfaces. Since a query point belongs to at most one single-hand instance, the penetration loss is defined as:

L_pen = Σ_{p ∈ Ω} max(Σ_k τ_k(p) - 1, 0)

where Ω is the set of all query points and k ranges over the hands in each region.

The parameterized implicit neural network consists of three convolutional layers and nine fully-connected layers. It is trained with the SGD optimizer with a batch size of 64. The query points are not fixed; they are resampled in each iteration.
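A sketch of the implicit function and the penetration penalty is given below. The filing specifies three convolutional and nine fully-connected layers; only the fully-connected trunk conditioned on (p, r, e) is sketched, the embedding widths are assumptions, and the hinge form of the penetration loss is reconstructed from the stated constraint that a point may belong to at most one hand.

```python
import torch
import torch.nn as nn

class ImplicitHand(nn.Module):
    """f(p, r, e) -> occupancy tau in [0, 1]; nine linear layers as per
    the filing (the three convolutional layers that produce the image
    features are omitted here)."""
    def __init__(self, p_dim=3, r_dim=64, e_dim=64, width=256):
        super().__init__()
        dims = [p_dim + r_dim + e_dim] + [width] * 8 + [1]
        layers = []
        for i in range(9):
            layers.append(nn.Linear(dims[i], dims[i + 1]))
            if i < 8:
                layers.append(nn.ReLU(inplace=True))
        self.mlp = nn.Sequential(*layers)

    def forward(self, p, r, e):
        # p: (N, 3) query points; r, e: region / instance embeddings
        return torch.sigmoid(self.mlp(torch.cat([p, r, e], dim=-1)))

def penetration_loss(occupancy):
    """occupancy: (N, K) predicted tau for K hand instances at N query
    points; a summed occupancy above 1 means two hands claim the same
    point and is penalized."""
    return torch.clamp(occupancy.sum(dim=-1) - 1.0, min=0.0).mean()

net = ImplicitHand()
p, r, e = torch.rand(64, 3), torch.rand(64, 64), torch.rand(64, 64)
tau = net(p, r, e)  # occupancy; the surface is the tau = 0.5 iso-surface
```

The mesh itself can then be extracted from the τ = 0.5 iso-surface of a dense occupancy grid, for example with a marching-cubes routine.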
Step S140: perform a physical plausibility check on the reconstructed interacting hands, and optimize and update the reconstructed model.

After the hand surfaces are obtained through implicit reconstruction, a physical plausibility check is performed on the reconstructed interacting hands, and implausible penetration is penalized so as to better match the true pose. As shown in FIG. 3, the penetration depth is computed first. A ray cast along the positive Z direction from a pixel of the region mask should pass through a hand an even number of times. Suppose a ray passes through the implicitly reconstructed regions of two interacting hands A and B, and record all intersections along this ray as x_1, ..., x_N, where N is the number of intersections. For an intersection x_n on the surface of hand A, if its neighboring intersections x_{n-1} and x_{n+1} do not belong to the same hand surface, penetration occurs at that location. In this case, the next intersection on the surface of hand A is found, and the penetration depth is defined as the distance between x_n and that next intersection. Then the force on each pixel is computed. In the optimization stage, the hand pose should be adjustable toward any orientation, not only parallel to the rays. Considering that the force is directional, that its magnitude is directly related to the penetration depth, and that the location where penetration occurs acts like a spring producing a repulsive force, the hand is treated as a rigid body in this case. The relationship between the penetration depths and the corresponding repulsive force is defined as:

f_(u,v) = λ Σ_t d_t

where f_(u,v) denotes the force generated, due to penetration, by the ray cast through pixel (u, v), t indexes the penetrations along the ray, d_t is the t-th penetration depth, and λ is the associated weight.

The repulsive forces of all rays are then summed; because the repulsive force differs from ray to ray, the direction of the resultant force F may deviate from the Z axis. The resultant force is:

F = Σ_(u,v) f_(u,v)

From the resultant force F, the average penetration depth d̄ is computed; each hand is then moved by d̄/2 in a direction parallel to F. To penalize extreme adjustments, a projection loss based on orthogonal projection is used:

L_proj = || π(H_pose) - R_M ||

where H_pose denotes the optimized interacting hands, π(H_pose) is their 2D projection, and R_M is the region mask estimated by the encoder-decoder.
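The per-ray bookkeeping of this check can be sketched as follows. The filing's pairing rule is described only in prose, so the interpretation below (an intersection of one hand immediately followed by the other hand's surface marks a penetration, measured to the next same-hand intersection) and the example numbers are assumptions:

```python
import numpy as np

def penetration_depths(depths, labels, hand="A"):
    """depths: sorted intersection depths along one ray; labels: which
    hand's surface ('A' or 'B') each intersection lies on."""
    out = []
    for n in range(len(depths) - 1):
        if labels[n] == hand and labels[n + 1] != hand:
            # penetration: measure to the next same-hand intersection
            nxt = next((m for m in range(n + 1, len(depths))
                        if labels[m] == hand), None)
            if nxt is not None:
                out.append(depths[nxt] - depths[n])
    return out

def pixel_repulsion(depths, labels, lam=1.0):
    # spring-like force magnitude f_(u,v) = lambda * sum_t d_t
    return lam * sum(penetration_depths(depths, labels))

# one ray entering hand A, then hand B, before A exits: a penetration
ray_dir = np.array([0.0, 0.0, 1.0])
depths, labels = [0.10, 0.12, 0.13, 0.15], list("ABAB")

F_total = ray_dir * pixel_repulsion(depths, labels)  # resultant force
d_bar = np.mean(penetration_depths(depths, labels))  # average depth
# each hand is then translated by d_bar / 2 along +/- F_total direction
```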
Step S150: iteratively optimize the reconstructed model until the maximum penetration depth is less than 2 mm, and take the optimized result as the final reconstruction result.

Since the hand is not rigid, penetration within 2 mm is allowed. The reconstructed model is iteratively optimized according to step S140 until the maximum penetration depth d is less than 2 mm.

In this embodiment, the input color image may contain hands of any chirality and number; experimental results are shown in FIG. 4. The first column of FIG. 4 shows the input color image, the second and third columns show the ground-truth mask and the predicted mask respectively, the fourth column shows the implicit reconstruction result, and the fifth column shows the optimized reconstruction result.
Those skilled in the art will appreciate that the modules or steps of the invention described above may be implemented on a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of computing devices, and may optionally be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated-circuit modules, or fabricated with several of the modules or steps as a single integrated-circuit module. Thus, the invention is not limited to any specific combination of hardware and software.

Claims (9)

1. A three-dimensional interactive hand reconstruction method based on implicit expression, characterized in that the method comprises the following steps:
step 1, building a feature extraction neural network, acquiring global features from a single input color image, dividing the global features into several regional features based on connected components, and further acquiring instance features and joint features from the input image and the regional features;
step 2, selecting query points according to a query strategy using the regional features, instance features and joint features acquired in step 1, and constructing query conditions;
step 3, building a parameterized implicit neural network and performing implicit reconstruction based on the query points and the query conditions;
step 4, physically optimizing the reconstructed model, penalizing physically implausible penetration, and adjusting and updating the reconstructed model;
step 5, iteratively optimizing the reconstructed model until the maximum penetration depth is less than 2 mm, and taking the optimized result as the final reconstruction result of the interacting hands.
2. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 1, wherein: step 1 adopts two encoder-decoder neural networks using ResNet18 as a backbone, and supervises the acquired features in the network training process.
3. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 2, wherein the training constrains the features by the following loss function:

L_feature = L_Global + L_instance + L_joint

wherein the first term L_Global constrains the global mask and the global Z-map, the second term L_instance constrains the instance masks, and the third term L_joint supervises the localization map and Z-map of the joints (the explicit formulas of the three terms are given as equation images in the original filing); in L_Global, the network predictions of the global mask and the global Z-map are compared against the corresponding ground truths; in L_instance, the network prediction of the k-th instance mask in region i is compared against its ground truth; in L_joint, the network predictions of the localization map and Z-map of the joints of region i are compared against the corresponding ground truths.
4. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 1, wherein in step 1, for an input color image I containing interacting hands, a first encoder-decoder network is used to obtain the global mask G_M and the global Z-map G_Z; then the bounding-box coordinates of each region are used to crop I, G_M and G_Z, yielding the region information, namely the region image, the region mask and the region Z-map; then, from these regional features, a second encoder-decoder network extracts in parallel, for each region i, the localization map and Z-map of the visible joints and the visible instance masks E^(k), where k ranges over the instance masks visible in region i.
5. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 1, characterized in that in step 2, for each region a sampling space d^(i) is constructed according to the query strategy and points are sampled within it; all sampled points are combined as the query points and normalized into a unit cube; in addition, for each region, a region embedding r^(i) is obtained from the feature extraction of step 1, and each instance mask E^(k) in the region is fed into a multi-layer perceptron to obtain an instance embedding e^(k); r^(i) and e^(k) serve as the query conditions for implicit reconstruction.
6. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 5, wherein in step 3, implicit reconstruction is performed with the parameterized implicit neural network based on the query points and the query conditions; the parameterized implicit neural network is treated as an implicit function:

f(p, r, e) = τ

wherein p is a query point, r and e are the region embedding and the instance embedding serving as the query conditions, and the output τ is the occupancy value;

the network is supervised with a cross-entropy loss between the predicted occupancy and the ground-truth occupancy; in addition, a penetration loss is defined to penalize collisions of the hand surfaces; since a query point belongs to at most one single-hand instance, the penetration loss is defined as:

L_pen = Σ_{p ∈ Ω} max(Σ_k τ_k(p) - 1, 0)

wherein Ω is the set of all query points and k ranges over the hands in each region.
7. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 1, characterized in that in step 4, after the hand surfaces are obtained through implicit reconstruction, a physical plausibility check is performed on the reconstructed interacting hands to judge whether the reconstructed model exhibits penetration, and if so, the reconstructed model is optimized using the physical optimization method.
8. The implicit representation based three-dimensional interactive hand reconstruction method according to claim 1, wherein in step 5, considering that the hand is not rigid, penetration within 2 mm is allowed; the reconstructed model is iteratively optimized according to step 4 until the maximum penetration depth d is less than 2 mm.
9. A three-dimensional interactive hand reconstruction system based on implicit expression, characterized in that the system comprises a feature extraction module, an implicit reconstruction module and a physical optimization module;

the feature extraction module comprises two units: a global and regional feature extraction unit, which obtains the global mask G_M, the global Z-map G_Z, the region image, the region mask and the region Z-map from a single input image; and an instance and joint feature estimation unit, which extracts in parallel, for each region i and according to the regional features, the localization map and Z-map of the visible joints and the visible instance masks E^(k);

the implicit reconstruction module comprises two units: a query point and query condition acquisition unit, which selects query points for each region according to the query strategy, obtains the region embedding r^(i) from the feature extraction module, and encodes the region's instance masks E^(k) into instance embeddings {e^(k)} through a multi-layer perceptron, taking r^(i) and {e^(k)} as the query conditions for implicit reconstruction; and an implicit surface reconstruction unit, which obtains the occupancy value of each query point through the parameterized implicit neural network and further extracts the reconstructed surface;

the physical optimization module judges whether the reconstructed model exhibits penetration; if so, it optimizes the reconstructed model using the physical optimization method, iteratively optimizing until the maximum penetration depth is less than 2 mm, and takes the optimized result as the final reconstruction result of the interacting hands.
CN202210619894.7A 2022-06-02 2022-06-02 Three-dimensional interactive hand reconstruction method and system based on implicit expression Pending CN114998520A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210619894.7A CN114998520A (en) 2022-06-02 2022-06-02 Three-dimensional interactive hand reconstruction method and system based on implicit expression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210619894.7A CN114998520A (en) 2022-06-02 2022-06-02 Three-dimensional interactive hand reconstruction method and system based on implicit expression

Publications (1)

Publication Number Publication Date
CN114998520A 2022-09-02

Family

ID=83030222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210619894.7A Pending CN114998520A (en) 2022-06-02 2022-06-02 Three-dimensional interactive hand reconstruction method and system based on implicit expression

Country Status (1)

Country Link
CN (1) CN114998520A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740290A (en) * 2023-08-15 2023-09-12 江西农业大学 Three-dimensional interaction double-hand reconstruction method and system based on deformable attention
CN116740290B (en) * 2023-08-15 2023-11-07 江西农业大学 Three-dimensional interaction double-hand reconstruction method and system based on deformable attention


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination