CN115546888A - Symmetric semantic graph convolution pose estimation method based on body part grouping - Google Patents
Symmetric semantic graph convolution pose estimation method based on body part grouping
- Publication number
- CN115546888A CN115546888A CN202211084071.5A CN202211084071A CN115546888A CN 115546888 A CN115546888 A CN 115546888A CN 202211084071 A CN202211084071 A CN 202211084071A CN 115546888 A CN115546888 A CN 115546888A
- Authority
- CN
- China
- Prior art keywords
- symmetrical
- semantic graph
- local
- graph convolution
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Medical Informatics (AREA)
- Pure & Applied Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Algebra (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a symmetric semantic graph convolution pose estimation method based on body part grouping, which comprises the following steps: S1, inputting two-dimensional human body joint points and their connection relations, and constructing a symmetric semantic graph convolution layer and a non-local layer over the joint point graph structure; S2, grouping body parts by limb, obtaining the local and non-local features of each limb group and of the whole body, and fusing the obtained features; S3, constructing a body-part-grouped symmetric semantic graph convolution pose estimation network model from the symmetric semantic graph convolution layer, the non-local layer and the body part grouping; and S4, training the symmetric semantic graph convolution pose estimation network model on the Human3.6M data set, feeding the two-dimensional human body joint points to be estimated into the trained model, and outputting the estimated three-dimensional human body joint points. The method can be applied to fields such as movie animation, virtual reality and motion analysis, and achieves better accuracy and improved generalization ability.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a symmetric semantic graph convolution pose estimation method based on body part grouping.
Background
Human pose estimation is widely used in many computer vision tasks, such as virtual reality, human-computer interaction and behavior recognition. Owing to the rapid development of deep learning, the accuracy of estimating three-dimensional human poses from images has improved markedly, making the problem a current research hotspot.
The existing 3D pose estimation methods fall into two types: one predicts the 3D pose directly from the image, and the other first predicts a 2D pose and then regresses the 3D pose from it. The first type can exploit a large amount of image information, but the model is strongly affected by factors such as image background and clothing, and the features it must learn are complex. The second type reduces the overall complexity: the network model only has to learn the mapping from 2D to 3D space, and thanks to the maturity of 2D pose estimation research this type is now mainstream.
A three-dimensional human body pose estimation method based on a graph convolution network (CN112712019A) improves the regression performance of the three-dimensional human pose while reducing the number of network parameters, but the generalization ability of the model still needs improvement. In existing research, human pose estimation algorithms built on deep learning are easily affected by self-occlusion and environmental occlusion, human poses are highly diverse, and current models generalize poorly. A more reasonable and more general network model is therefore urgently needed to improve the pose estimation results.
Disclosure of Invention
The invention aims to solve the above problems in the prior art by providing a symmetric semantic graph convolution pose estimation method based on body part grouping.
The purpose of the invention can be achieved by adopting the following technical scheme:
a symmetric semantic graph convolution pose estimation method based on body part grouping, as shown in fig. 1, comprising the following steps:
s1, inputting two-dimensional human body joint points and their connection relations in movie animation, virtual reality or motion analysis, and constructing a symmetric semantic graph convolution layer and a non-local layer over the joint point graph structure;
s2, grouping body parts by limb to obtain the local and non-local features of each limb group and of the whole body, and fusing the obtained features;
s3, constructing a body-part-grouped symmetric semantic graph convolution pose estimation network model from the symmetric semantic graph convolution layer, the non-local layer and the body part grouping;
s4, training the symmetric semantic graph convolution pose estimation network model on the Human3.6M data set, feeding the two-dimensional human body joint points to be estimated into the trained model, and outputting the estimated three-dimensional human body joint points.
Further, in step S1 the symmetric semantic graph convolution layer and the non-local layer of the joint point graph structure are constructed from the two-dimensional human body joint points and their connection relations as follows:
let X^(l) and X^(l+1) denote the node features in the graph structure before and after the l-th graph convolution layer, respectively. The symmetric graph convolution has the form:

X^(l+1) = σ(W X^(l) A_sym)    (1)
where σ(·) denotes an activation function, W a learnable weight parameter, and A_sym the matrix obtained by symmetrically normalizing the adjacency matrix A of the graph:

A_sym = D^(-1/2) A D^(-1/2)    (2)

where A is the adjacency matrix of the graph and D is its degree matrix. Symmetric normalization aggregates the information of neighboring nodes more evenly, yielding balanced node features;
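The symmetric normalization of the adjacency matrix described above can be sketched in a few lines of numpy. Adding self-loops before normalizing is a common convention and an assumption here; the patent only states that A is normalized by the degree matrix D.

```python
import numpy as np

def symmetric_normalize(A):
    """Symmetric normalization A_sym = D^(-1/2) A D^(-1/2) of the adjacency matrix."""
    A_hat = A + np.eye(A.shape[0])                 # self-loops (assumed)
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # D^(-1/2) from node degrees
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

# A 3-joint chain 0-1-2 as a toy skeleton graph
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
A_sym = symmetric_normalize(A)
```

The result is symmetric by construction, which is what lets each pair of neighboring joints exchange information with balanced weights in formula (1).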
a learnable weighting matrix M is added to the symmetric graph convolution to construct the symmetric semantic graph convolution layer, computed as:

X^(l+1) = σ(W X^(l) ρ_i(M ⊙ A_sym))    (3)

where ρ_i(·) is a Softmax nonlinearity that normalizes the weights of node i, and ⊙ denotes element-wise multiplication of corresponding matrix entries;
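A minimal forward pass of the layer in formula (3), as a numpy sketch. Two assumptions are made that the patent leaves open: ρ_i is taken as a row-wise Softmax restricted to each node's neighborhood (entries where A_sym is nonzero), and σ is ReLU; the code also uses the row-major convention S X W, the transpose of the patent's W X ρ(·).

```python
import numpy as np

def sem_graph_conv(X, W, M, A_sym):
    """Symmetric semantic graph convolution: X (K, C_in), W (C_in, C_out), M (K, K) mask."""
    logits = np.where(A_sym > 0, M * A_sym, -np.inf)   # M ⊙ A_sym, non-edges masked out
    logits = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    S = e / e.sum(axis=1, keepdims=True)               # ρ_i: each row sums to 1
    return np.maximum(S @ X @ W, 0.0)                  # σ = ReLU (assumed)

rng = np.random.default_rng(0)
A_sym = np.array([[0.5, 0.5, 0.0],
                  [0.4, 0.3, 0.4],
                  [0.0, 0.5, 0.5]])
out = sem_graph_conv(rng.standard_normal((3, 2)),      # 3 joints, 2-dim features
                     rng.standard_normal((2, 4)),      # project to 4 dims
                     np.ones((3, 3)), A_sym)
```

Because M is learnable, the network can reweight each edge of the fixed skeleton graph, which is what distinguishes the semantic variant from plain symmetric graph convolution.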
in order to capture global features between the nodes of the graph, a non-local layer is introduced, whose operation is defined as:

x̃_i = (W_x / K) Σ_{j=1}^{K} f(x_i, x_j) g(x_j)    (4)

where W_x denotes a learnable weight parameter, 1/K the normalization factor, K the number of nodes, i the index of the target node being computed, and j the indices of the nodes other than i; x_i and x_j denote the input features of nodes i and j; x̃_i denotes the output feature of node i; f(·,·) is a learnable bivariate function computing the similarity of two input features; g(·) is a learnable univariate function transforming the input features.
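The non-local operation just described can be sketched as follows. The patent only requires f to be a learnable similarity and g a unary transform, so the concrete choices here — f as a plain dot product, g as a linear map W_g, and a residual connection on the output — are assumptions in the style of the usual non-local block.

```python
import numpy as np

def non_local_layer(X, W_g, W_x):
    """Non-local operation over K joint features X of shape (K, C)."""
    K = X.shape[0]
    F = X @ X.T              # f(x_i, x_j): pairwise dot-product similarity
    G = X @ W_g              # g(x_j): transformed input features
    Y = (F @ G) / K          # sum over all j, normalized by the node count K
    return X + Y @ W_x       # residual output (assumed)

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 8))           # 5 joints, 8-dim features
out = non_local_layer(X, rng.standard_normal((8, 8)),
                      rng.standard_normal((8, 8)))
```

Unlike the graph convolution, every joint attends to every other joint here, which is how the layer captures the global relations the adjacency matrix cannot express.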
Furthermore, in step S2 the body parts are grouped: the human body joint points are divided into a left limb group, a right limb group and a whole-body group. The joint points within a group are strongly correlated, and each group performs feature extraction through an independent sub-network to strengthen local relationships.
As shown in fig. 4, features are first learned within each group and then fused across groups; the feature fusion is defined as:

f_fuse = Concat(f_left, f_right, f_all)    (5)

where Concat(·,·,·) concatenates features, f_left denotes the feature of the left limb group, f_right that of the right limb group, f_all that of the whole-body group, and f_fuse the fused feature.
Body part grouping learns the consistency of local joints while preserving the consistency of the global pose, and thus generalizes better to symmetric poses and to poses that are rare or occluded in the training data.
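The grouping and fusion of formula (5) can be sketched as below. The joint index sets are hypothetical, chosen for a 17-joint Human3.6M-style skeleton; the real grouping follows whatever joint layout the model is trained on, and the single toy "sub-network" stands in for the three independent sub-networks the patent prescribes.

```python
import numpy as np

# Hypothetical index sets for a 17-joint skeleton (illustrative only)
LEFT_IDX  = [4, 5, 6, 11, 12, 13]     # left hip/knee/ankle, shoulder/elbow/wrist
RIGHT_IDX = [1, 2, 3, 14, 15, 16]     # right-side counterparts
ALL_IDX   = list(range(17))           # whole-body group

def group_and_fuse(features, branch):
    """Run each group through its own sub-network `branch` (one shared toy
    function here for brevity) and fuse the group features by concatenation."""
    f_left  = branch(features[LEFT_IDX])
    f_right = branch(features[RIGHT_IDX])
    f_all   = branch(features[ALL_IDX])
    return np.concatenate([f_left, f_right, f_all], axis=-1)   # f_fuse

# Toy "sub-network": mean-pool the joints of a group into one feature vector
X = np.arange(17 * 2, dtype=float).reshape(17, 2)   # 17 joints, 2D coordinates
f_fuse = group_and_fuse(X, lambda g: g.mean(axis=0))
```

Late fusion by concatenation keeps the three group encodings intact side by side, so the final projection layer can still tell left-limb evidence from right-limb evidence.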
Further, in step S3 several symmetric semantic graph convolution modules are constructed from the symmetric semantic graph convolution layers and the non-local layers. All modules share the same structure: each consists of two symmetric semantic graph convolution layers followed by one non-local layer, connected in sequence, so that the symmetric semantic graph convolution layers and the non-local layers alternately capture the local and global semantic relationships between nodes;
in the symmetric semantic graph convolution network, as shown in fig. 3, the input is first mapped into a latent space by a symmetric semantic graph convolution layer and a non-local layer; the encoded features are then obtained through four sequentially connected symmetric semantic graph convolution modules, with batch normalization and ReLU nonlinear activation applied after every symmetric semantic graph convolution layer in the network;
the body-part-grouped symmetric semantic graph convolution pose estimation network model comprises a first, a second and a third branch, as shown in fig. 2, each of which extracts features with a symmetric semantic graph convolution network: the left limb group is fed into the first branch, which extracts the left-limb feature f_left; the right limb group is fed into the second branch, which extracts the right-limb feature f_right; the whole-body group is fed into the third branch, which extracts the whole-body feature f_all. The fused feature f_fuse is computed according to formula (5), and the encoded features are finally projected into the output space by a symmetric semantic graph convolution layer.
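The wiring of the three branches can be summarized end to end. Every component below is a toy placeholder — the branches stand in for the three symmetric semantic graph convolution networks and `head` for the final projection layer; only the data flow follows the patent, and the joint index sets are the same hypothetical 17-joint layout as before.

```python
import numpy as np

def pose_estimation_forward(joints_2d, branches, head):
    """Three-branch forward pass: per-group encoding, fusion, projection."""
    left_idx  = [4, 5, 6, 11, 12, 13]         # hypothetical left-limb joints
    right_idx = [1, 2, 3, 14, 15, 16]         # hypothetical right-limb joints
    f_left  = branches[0](joints_2d[left_idx])
    f_right = branches[1](joints_2d[right_idx])
    f_all   = branches[2](joints_2d)          # whole-body branch sees all joints
    f_fuse  = np.concatenate([f_left, f_right, f_all], axis=-1)  # formula (5)
    return head(f_fuse)                       # project to 3D joint coordinates

x2d  = np.random.default_rng(2).standard_normal((17, 2))
pool = lambda g: g.mean(axis=0)               # toy branch: (n, 2) -> (2,)
head = lambda f: np.tile(f[:3], (17, 1))      # toy head: (6,) -> (17, 3)
pred = pose_estimation_forward(x2d, [pool, pool, pool], head)
```

The important structural point the sketch preserves is that the branches never share parameters: each group's encoder is free to specialize before fusion.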
Further, step S4 trains on the Human3.6M data set with the loss function L_smoothl1(·) defined by formula (6):

L_smoothl1(x) = 0.5 x², if |x| < 1;  |x| − 0.5, otherwise    (6)

where x = J'_i − J_i is the difference between the predicted value and the true value, |·| denotes its absolute value, J'_i denotes the predicted 3D joint coordinates of node i, and J_i the corresponding ground truth in the data set. The L_smoothl1 loss is insensitive to outlier nodes and abnormal values and bounds the gradient magnitude, so that training converges stably.
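The loss of formula (6) is a few lines in practice. The threshold of 1 at which the loss switches from quadratic to linear is the usual SmoothL1 default and an assumption here, as is averaging over all joint coordinates.

```python
import numpy as np

def smooth_l1_loss(pred, target):
    """SmoothL1: 0.5*x^2 when |x| < 1, |x| - 0.5 otherwise, averaged elementwise."""
    x = np.abs(pred - target)
    per_coord = np.where(x < 1.0, 0.5 * x * x, x - 0.5)
    return per_coord.mean()

pred   = np.array([[0.5, 0.0, 0.0],    # one small error (quadratic regime)
                   [2.0, 0.0, 0.0]])   # one large error (linear regime)
target = np.zeros((2, 3))
loss = smooth_l1_loss(pred, target)    # (0.125 + 1.5) / 6
```

The large error contributes only |x| − 0.5 rather than 0.5 x², which is exactly the outlier insensitivity and gradient bounding the text describes.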
Further, the evaluation index commonly adopted for pose estimation is MPJPE (Mean Per Joint Position Error), defined by formula (7):

E_MPJPE = (1/N) Σ_{i=1}^{N} ||J'_i − J_i||_2    (7)

i.e. the mean of the L2 distances between each predicted joint and its true value, where ||·||_2 denotes the L2 distance from the predicted value to the true value. The smaller the MPJPE, the better the 3D human body pose estimation result.
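The MPJPE metric of formula (7) reduces to a one-liner; on Human3.6M the inputs are 3D joint positions in millimetres.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean over joints of the L2 distance between predicted and true 3D positions."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

pred = np.array([[3.0, 4.0, 0.0],   # joint off by a 3-4-5 triangle: distance 5
                 [0.0, 0.0, 0.0]])  # joint exactly right: distance 0
gt   = np.zeros((2, 3))
err = mpjpe(pred, gt)               # (5.0 + 0.0) / 2
```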
Further, during training the initial learning rate is 0.001 and the batch size is 64. The initial learning rate directly affects the convergence of the model, while the batch size affects its generalization ability; an initial learning rate of 0.001 favors convergence, and a batch size of 64 favors generalization.
Compared with the prior art, the invention has the following advantages and effects:
according to the body part grouping-based symmetrical semantic graph convolution posture estimation network, symmetrical semantic graph convolution is introduced, so that information of neighbor nodes can be aggregated better, and balanced node characteristics can be obtained; body part groupings are designed, the body is divided into left/right trunks by part, and these body part groups are learned through independent sub-networks to enhance local features. At H u m a Compared with other methods, the n3.6M data set has better effect and improved generalization capability.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of the symmetric semantic graph convolution pose estimation method based on body part grouping disclosed by the invention;
FIG. 2 is a diagram of a symmetric semantic graph convolutional network model based on body part grouping in an embodiment of the present invention;
FIG. 3 is a diagram illustrating a symmetric semantic graph convolution module according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a body part grouping feature fusion module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it.
Example 1
A symmetric semantic graph convolution pose estimation method based on body part grouping, as shown in fig. 1, comprising the following steps:
s1, inputting two-dimensional human body joint points and their connection relations in movie animation, virtual reality or motion analysis, and constructing a symmetric semantic graph convolution layer and a non-local layer over the joint point graph structure;
in step S1, the symmetric semantic graph convolution layer and the non-local layer of the joint point graph structure are constructed from the two-dimensional human body joint points and their connection relations as follows:
let X^(l) and X^(l+1) denote the node features in the graph structure before and after the l-th graph convolution layer, respectively. The symmetric graph convolution has the form:

X^(l+1) = σ(W X^(l) A_sym)    (1)

where σ(·) denotes an activation function, W a learnable weight parameter, and A_sym the matrix obtained by symmetrically normalizing the adjacency matrix A of the graph:

A_sym = D^(-1/2) A D^(-1/2)    (2)

where A is the adjacency matrix of the graph and D is its degree matrix;
a learnable weighting matrix M is added to the symmetric graph convolution to construct the symmetric semantic graph convolution layer, computed as:

X^(l+1) = σ(W X^(l) ρ_i(M ⊙ A_sym))    (3)

where ρ_i(·) is a Softmax nonlinearity that normalizes the weights of node i, and ⊙ denotes element-wise multiplication of corresponding matrix entries;
in order to capture global features between the nodes of the graph, a non-local layer is introduced, whose operation is defined as:

x̃_i = (W_x / K) Σ_{j=1}^{K} f(x_i, x_j) g(x_j)    (4)

where W_x denotes a learnable weight parameter, 1/K the normalization factor, K the number of nodes, i the index of the target node being computed, and j the indices of the nodes other than i; x_i and x_j denote the input features of nodes i and j; x̃_i denotes the output feature of node i; f(·,·) is a learnable bivariate function computing the similarity of two input features; g(·) is a learnable univariate function transforming the input features.
S2, grouping body parts by limb in the movie animation, virtual reality or motion data, obtaining the local and non-local features of each limb group and of the whole body, and fusing the obtained features;
in step S2, the body parts are grouped: the human body joint points in movie animation, virtual reality or motion data are divided into a left limb group, a right limb group and a whole-body group. The joint points within a group are strongly correlated, and each group performs feature extraction through an independent sub-network to strengthen local relationships.
As shown in fig. 4, features are first learned within each group and then fused across groups; the feature fusion is defined as:

f_fuse = Concat(f_left, f_right, f_all)    (5)

where Concat(·,·,·) concatenates features, f_left denotes the feature of the left limb group, f_right that of the right limb group, f_all that of the whole-body group, and f_fuse the fused feature.
S3, constructing a body-part-grouped symmetric semantic graph convolution pose estimation network model from the symmetric semantic graph convolution layer, the non-local layer and the body part grouping;
in step S3, several symmetric semantic graph convolution modules are constructed from the symmetric semantic graph convolution layers and the non-local layers; all modules share the same structure, each consisting of two symmetric semantic graph convolution layers followed by one non-local layer, connected in sequence;
in the symmetric semantic graph convolution network, as shown in fig. 3, the input is first mapped into a latent space by a symmetric semantic graph convolution layer and a non-local layer; the encoded features are then obtained through four sequentially connected symmetric semantic graph convolution modules, with batch normalization and ReLU nonlinear activation applied after every symmetric semantic graph convolution layer in the network;
the body-part-grouped symmetric semantic graph convolution pose estimation network model comprises a first, a second and a third branch, as shown in fig. 2, each of which extracts features with a symmetric semantic graph convolution network: the left limb group is fed into the first branch, which extracts the left-limb feature f_left; the right limb group is fed into the second branch, which extracts the right-limb feature f_right; the whole-body group is fed into the third branch, which extracts the whole-body feature f_all. The fused feature f_fuse is computed according to formula (5), and the encoded features are finally projected into the output space by a symmetric semantic graph convolution layer.
And S4, training the symmetric semantic graph convolution pose estimation network model on the Human3.6M data set, feeding the two-dimensional human body joint points to be estimated from the movie animation, virtual reality or motion data into the trained model, and outputting the estimated three-dimensional human body joint points.
The loss function L_smoothl1(·) defined by formula (6) is used in step S4 for training on the Human3.6M data set:

L_smoothl1(x) = 0.5 x², if |x| < 1;  |x| − 0.5, otherwise    (6)

where x represents the difference between the true value and the predicted value of the movie animation, virtual reality or motion data, |·| denotes its absolute value, J'_i denotes the predicted 3D joint coordinates of node i, and J_i the corresponding ground truth in the data set.
The evaluation index commonly adopted for pose estimation is MPJPE (Mean Per Joint Position Error), defined by formula (7):

E_MPJPE = (1/N) Σ_{i=1}^{N} ||J'_i − J_i||_2    (7)

i.e. the mean of the L2 distances between each predicted joint and its true value in the movie animation, virtual reality or motion data, where ||·||_2 denotes the L2 distance from the predicted value to the true value. The smaller the MPJPE, the better the 3D human body pose estimation result.
During training, the initial learning rate was 0.001 and the batch size was 64.
Example 2
In this embodiment, based on the symmetric semantic graph convolution pose estimation method based on body part grouping disclosed in embodiment 1, experiments are conducted on the Human3.6M data set to verify the effectiveness of the invention, and the technical effect of the invention is explained with reference to the experimental results.
Human3.6M is one of the most widely used data sets for 3D pose estimation. It covers 3.6 million images collected in a controlled indoor environment from 11 subjects, whose body poses in daily-activity scenarios were captured with a marker-based motion capture system, spanning 15 actions.
Experimental configuration: hardware environment: GPU: RTX 2080Ti with 11 GB video memory; CPU: 4-core Intel(R) Xeon(R) Silver 4110 @ 2.10 GHz; memory: 16 GB. Software environment: Python v2.7, PyTorch v1.1.0, CUDA 10.2. Operating system: Ubuntu 18.04.
An ablation study of the proposed method was performed. Under the above configuration, the proposed pose estimation network contains two main components: the symmetric semantic graph convolution module and the body part grouping. To verify their effectiveness, four ablation experiments were set up: the first used only semantic graph convolution, the second used the symmetric semantic graph convolution module, the third used body part grouping, and the fourth used both the symmetric semantic graph convolution module and body part grouping. The results are shown in table 1:
TABLE 1. Ablation results of the symmetric semantic graph convolution pose estimation method based on body part grouping
Symmetric semantic graph convolution module | Body part grouping | MPJPE
 | | 41.47mm
√ | | 40.68mm
 | √ | 40.53mm
√ | √ | 39.93mm
Table 2 compares the method of the invention with the baseline method and the semantic graph convolution method under the MPJPE evaluation index, broken down by human action; the best method for each action is highlighted in bold.
Table 2. Comparison results of the symmetric semantic graph convolution pose estimation method based on body part grouping
The body-part-grouping-based symmetric semantic graph convolution pose estimation network proposed by the invention thus achieves better performance, showing that the model can effectively exploit the relations between different joint groups in the graph.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (8)
1. A symmetric semantic graph convolution pose estimation method based on body part grouping, characterized by comprising the following steps:
s1, inputting two-dimensional human body joint points and their connection relations in movie animation, virtual reality or motion analysis, and constructing a symmetric semantic graph convolution layer and a non-local layer over the joint point graph structure;
s2, grouping body parts by limb to obtain the local and non-local features of each limb group and of the whole body, and fusing the obtained features;
s3, constructing a body-part-grouped symmetric semantic graph convolution pose estimation network model from the symmetric semantic graph convolution layer, the non-local layer and the body part grouping;
and S4, training the symmetric semantic graph convolution pose estimation network model on the Human3.6M data set, feeding the two-dimensional human body joint points to be estimated into the trained model, and outputting the estimated three-dimensional human body joint points.
2. The symmetric semantic graph convolution pose estimation method based on body part grouping according to claim 1, characterized in that in step S1 the symmetric semantic graph convolution layer of the joint point graph structure is constructed from the two-dimensional human body joint points and their connection relations as follows:
let X^(l) and X^(l+1) denote the node features in the graph structure before and after the l-th graph convolution layer, respectively. The symmetric graph convolution has the form:

X^(l+1) = σ(W X^(l) A_sym)    (1)

where σ(·) denotes an activation function, W a learnable weight parameter, and A_sym the matrix obtained by symmetrically normalizing the adjacency matrix A of the graph:

A_sym = D^(-1/2) A D^(-1/2)    (2)

where A is the adjacency matrix of the graph and D is its degree matrix;
a learnable weighting matrix M is added to the symmetric graph convolution to construct the symmetric semantic graph convolution layer, computed as:

X^(l+1) = σ(W X^(l) ρ_i(M ⊙ A_sym))    (3)

where ρ_i(·) is a Softmax nonlinearity that normalizes the weights of node i, and ⊙ denotes element-wise multiplication of corresponding matrix entries.
3. The symmetric semantic graph convolution pose estimation method based on body part grouping according to claim 2, characterized in that in step S1 the non-local layer of the joint point graph structure is constructed from the two-dimensional human body joint points and their connection relations as follows: the operation of the non-local layer is defined as

x̃_i = (W_x / K) Σ_{j=1}^{K} f(x_i, x_j) g(x_j)    (4)

where W_x denotes a learnable weight parameter, 1/K the normalization factor, K the number of nodes, i the index of the target node being computed, and j the indices of the nodes other than i; x_i and x_j denote the input features of nodes i and j; x̃_i denotes the output feature of node i; f(·,·) is a learnable bivariate function computing the similarity of two input features; g(·) is a learnable univariate function transforming the input features.
4. The method according to claim 3, wherein in step S2, the human joint points are divided into left, right and whole body groups, the local relationship of each group is enhanced by an independent sub-network, and then features in each group are learned and fused by a late fusion feature fusion method, wherein the feature fusion is defined as:
f_fuse = Concat(f_left, f_right, f_all) (5)
where Concat(·) denotes the concatenation operation, f_left is the feature of the left-limb group, f_right the feature of the right-limb group, f_all the feature of the whole-body group, and f_fuse the feature obtained after fusion.
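As a small illustration of formula (5), the per-group features can be concatenated along the channel axis; the shapes below (64 channels, 17 joints) are illustrative assumptions, not values fixed by the claim.

```python
import numpy as np

# Hypothetical per-group features, laid out as channels x joints.
f_left  = np.ones((64, 17))          # left-limb group feature
f_right = np.full((64, 17), 2.0)     # right-limb group feature
f_all   = np.full((64, 17), 3.0)     # whole-body group feature

# Late fusion of formula (5): stack the three groups along the channel axis.
f_fuse = np.concatenate([f_left, f_right, f_all], axis=0)
```

The fused tensor keeps each group's channels intact, so the subsequent layers can still weigh left-limb, right-limb, and whole-body evidence separately.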
5. The symmetric semantic graph convolution pose estimation method based on body part grouping according to claim 4, wherein in step S3 a plurality of symmetric semantic graph convolution modules are constructed from the symmetric semantic graph convolution layers and the non-local layers; all symmetric semantic graph convolution modules have the same structure, each formed by two symmetric semantic graph convolution layers and one non-local layer connected in sequence;
in the symmetric semantic graph convolution network, a symmetric semantic graph convolution layer and a non-local layer are first used to map the input into a latent space; the encoded features are then obtained through four sequentially connected symmetric semantic graph convolution modules, and every stacked symmetric semantic graph convolution layer in the network is followed by batch normalization and ReLU nonlinear activation;
the symmetrical semantic graph convolution posture estimation network model of the body part group comprises a first branch, a second branch and a third branch, wherein the first branch, the second branch and the third branch all use a symmetrical semantic graph convolution network to extract features: inputting a first branch into the left limb group, and extracting the characteristic f of the left limb through a symmetrical semantic graph convolution network left (ii) a Inputting the right limb group into a second branch, and extracting the feature f of the right limb through a symmetrical semantic graph convolution network right (ii) a Inputting the third branch into the whole body group, and extracting the characteristics f of the whole body through a symmetrical semantic graph convolution network all (ii) a Calculating to obtain a fused feature f according to a formula (5) fuse The encoded features are then projected into the output space using a symmetric semantic map convolutional layer.
6. The symmetric semantic graph convolution pose estimation method based on body part grouping according to claim 5, wherein in step S4 training is performed on the Human3.6M dataset with the loss function L_smoothL1(·) defined by formula (6):

L_smoothL1(x) = 0.5 x², if |x| < 1; |x| − 0.5, otherwise (6)

where x denotes the difference between the ground-truth value and the predicted value, |·| denotes the absolute value of that difference, J'_i denotes the predicted 3D joint coordinates of node i, and J_i is the corresponding ground-truth value of node i in the dataset.
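A direct NumPy rendering of the Smooth L1 loss of formula (6), averaged over all predicted joint coordinates (the averaging is an assumption; the claim does not state the reduction):

```python
import numpy as np

def smooth_l1(x):
    """Element-wise Smooth L1, formula (6): 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def pose_loss(J_pred, J_true):
    """Mean Smooth L1 over all predicted 3D joint coordinates (K x 3 arrays)."""
    return smooth_l1(J_pred - J_true).mean()
```

The quadratic region keeps gradients small near zero while the linear region limits the influence of large joint errors, which is why Smooth L1 is less sensitive to outliers than plain L2.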
7. The method according to claim 6, wherein the evaluation index used for pose estimation is MPJPE, defined as:

E_MPJPE = (1/K) Σ_{i=1..K} ||J'_i − J_i||_2 (7)

where E_MPJPE denotes the mean of the L2 distances between each predicted joint and its ground truth, and ||·||_2 denotes the L2 distance from the predicted value to the ground-truth value.
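The MPJPE metric is a one-liner in NumPy, assuming K × 3 arrays of predicted and ground-truth joint coordinates:

```python
import numpy as np

def mpjpe(J_pred, J_true):
    """Mean Per-Joint Position Error: average L2 distance per joint.
    J_pred, J_true: K x 3 arrays of 3D joint coordinates."""
    return np.linalg.norm(J_pred - J_true, axis=1).mean()
```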
8. The method according to claim 6, wherein during training the initial learning rate is 0.001 and a batch size of 64 is used.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211084071.5A CN115546888A (en) | 2022-09-06 | 2022-09-06 | Symmetric semantic graph convolution attitude estimation method based on body part grouping |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115546888A true CN115546888A (en) | 2022-12-30 |
Family
ID=84726312
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211084071.5A Pending CN115546888A (en) | 2022-09-06 | 2022-09-06 | Symmetric semantic graph convolution attitude estimation method based on body part grouping |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115546888A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116486489A (en) * | 2023-06-26 | 2023-07-25 | 江西农业大学 | Three-dimensional hand object posture estimation method and system based on semantic perception graph convolution |
CN116486489B (en) * | 2023-06-26 | 2023-08-29 | 江西农业大学 | Three-dimensional hand object posture estimation method and system based on semantic perception graph convolution |
CN117611675A (en) * | 2024-01-22 | 2024-02-27 | 南京信息工程大学 | Three-dimensional human body posture estimation method, device, storage medium and equipment |
CN117611675B (en) * | 2024-01-22 | 2024-04-16 | 南京信息工程大学 | Three-dimensional human body posture estimation method, device, storage medium and equipment |
CN118247851A (en) * | 2024-05-28 | 2024-06-25 | 江西农业大学 | End-to-end hand object interaction attitude estimation method and system |
CN118397710A (en) * | 2024-06-25 | 2024-07-26 | 广东海洋大学 | Skeleton action recognition method based on semantic decomposition multi-relation graph convolutional network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||