CN109086683B - Human hand posture regression method and system based on point cloud semantic enhancement - Google Patents

Human hand posture regression method and system based on point cloud semantic enhancement

Info

Publication number
CN109086683B
Authority
CN
China
Prior art keywords
point cloud
hand
point
cloud data
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810758545.7A
Other languages
Chinese (zh)
Other versions
CN109086683A (en)
Inventor
王贵锦
陈醒濠
杨华中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201810758545.7A priority Critical patent/CN109086683B/en
Publication of CN109086683A publication Critical patent/CN109086683A/en
Application granted Critical
Publication of CN109086683B publication Critical patent/CN109086683B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a hand posture regression method and system based on point cloud semantic enhancement. Point cloud features of the hand point cloud data are extracted and classified point by point to obtain semantic segmentation information of the hand point cloud data; the hand point cloud data are semantically enhanced based on the semantic segmentation information; a hand posture prediction result is obtained from the semantically enhanced hand point cloud data; and the hand posture prediction result is geometrically transformed to obtain the hand posture regression result. The network learns geometric transformations of both the input data and the output.

Description

Human hand posture regression method and system based on point cloud semantic enhancement
Technical Field
The invention relates to the technical field of computers, in particular to a human hand posture regression method and system based on point cloud semantic enhancement.
Background
In vision-based human-computer interaction, human hand posture estimation refers to accurately predicting the three-dimensional coordinates of the skeleton joints of the human hand, and it has wide application prospects in virtual reality, augmented reality, human-computer interaction and other fields. Human hand posture estimation has been an active research topic in computer vision for decades.
Vision-based human hand posture estimation methods fall into two categories. The first is appearance-based: machine learning is used to build a mapping from a two-dimensional image feature space to the three-dimensional hand posture space. Such methods make real-time tracking easy to achieve, but they need dense training samples to guarantee accuracy and an efficient learning and search algorithm over a huge image database. The second is model-based: a three-dimensional hand model is projected into the two-dimensional image space, and the pose parameters estimated for the three-dimensional model are corrected through feature comparison and data estimation.
Model-based methods produce more accurate estimates, but their performance depends on the chosen model. Typically the depth image is fed into a two-dimensional convolutional neural network (CNN) as a single-channel image and the hand pose is then predicted. However, mapping from two-dimensional images to three-dimensional joint coordinates is a highly non-linear problem, and the mismatch between the input and output spaces makes network learning very difficult. More recently, methods based on three-dimensional convolutional neural networks (3D CNNs) first convert the depth image into a voxel representation and then regress the pose with a 3D CNN. However, 3D voxels require quantizing continuous coordinate information, which introduces quantization errors that are detrimental to accurate hand pose estimation. The 3D CNN approach also occupies a large amount of memory, especially when the 3D voxel resolution is high. In addition, the trained networks are not robust to geometric transformations of the input and their accuracy is limited; moreover, most existing methods are based on heat-map prediction or direct regression, so their posture regression performance is low.
Disclosure of Invention
The present invention provides a human hand pose regression method and system based on point cloud semantic enhancement that overcome or at least partially solve the above-mentioned problems.
According to the first aspect of the invention, a human hand posture regression method based on point cloud semantic enhancement is provided, and comprises the following steps:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result.
According to a second aspect of the invention, a human hand posture regression device based on point cloud semantic enhancement is provided, which comprises:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions capable of performing the human hand pose regression method based on point cloud semantic enhancement as described above.
According to a third aspect of the present invention, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above-described human hand pose regression method based on point cloud semantic enhancement.
The invention provides a hand posture regression method and system based on point cloud semantic enhancement. Point cloud features of the hand point cloud data are extracted and classified point by point to obtain semantic segmentation information of the hand point cloud data; the hand point cloud data are semantically enhanced based on the semantic segmentation information; a hand posture prediction result is obtained from the semantically enhanced hand point cloud data; and the hand posture prediction result is geometrically transformed to obtain the hand posture regression result. Because the network learns geometric transformations of both the input data and the output, the hand posture estimation method is more robust to geometric transformations of the input data, and the semantic information of the point cloud point-by-point classification subnetwork is effectively fused with the posture regression subnetwork, which further improves the performance of hand posture estimation.
Drawings
FIG. 1 is a schematic diagram of a human hand posture regression method based on point cloud semantic enhancement according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network system of a human hand posture regression method based on point cloud semantic enhancement according to an embodiment of the invention;
FIG. 3 is a block diagram of a point cloud point-by-point classification subnetwork in accordance with an embodiment of the present invention;
FIG. 4 is a diagram of a pose regression subnetwork according to an embodiment of the present invention;
FIG. 5 is a diagram of a transformation learning subnetwork in accordance with an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of human hand posture regression equipment based on point cloud semantic enhancement according to an embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Obtaining human hand information through computer vision during virtual-operation human-computer interaction has great advantages in naturalness and cost, and is a main trend of future development. Because a computer cannot directly acquire hand information, estimating the hand posture parameters is a fundamental task. Only with accurate hand posture parameters can the virtual hand in the scene be driven and the consistency between the real and virtual sides of the human-computer interaction be guaranteed.
Existing human hand pose estimation methods convert the depth image to a voxel representation and then regress the pose with a 3D CNN. However, 3D voxels require quantizing continuous coordinate information, which introduces quantization errors that are detrimental to accurate hand pose estimation. The 3D CNN approach also occupies a large amount of memory, especially when the resolution of the 3D voxels is high.
The prior art gives little consideration to spatial transformations of the input data, so the trained networks are not robust to geometric transformations of the input and their accuracy is limited; and because they are based on heat-map prediction or direct regression, they do not consider how the semantic segmentation information of the input data could be combined to improve posture regression. To solve these problems, the embodiment of the invention estimates the human hand posture from point clouds and provides a human hand posture regression method based on point cloud semantic enhancement. As shown in Fig. 1, the method comprises the following steps:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result.
Specifically, in this embodiment, the point cloud point-by-point classification subnetwork takes the hand point cloud data as input, extracts point cloud features with a PointNet++ network, and finally performs point-by-point classification to obtain the semantic segmentation information of the point cloud. The posture regression subnetwork also takes the hand point cloud data as input and outputs the final hand posture estimation result. The network system built by the method of the embodiment of the invention is shown in Fig. 2.
In this embodiment, learning the geometric transformations of the input data and of the output with the network makes the hand posture estimation method more robust to geometric transformations of the input data, and the semantic segmentation information from the point-by-point classification of the input point cloud is effectively fused with the posture regression subnetwork, further improving the performance of hand posture estimation.
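The following is a minimal sketch of how the two stages could be wired together. It assumes a PyTorch-style interface; the function name and the seg_net / tnet_in / tnet_out / pose_net callables are placeholders for illustration, not the patent's implementation.

```python
# Hypothetical end-to-end wiring of the two subnetworks; seg_net, tnet_in,
# tnet_out and pose_net are placeholder callables, not the patent's modules.
import torch

def estimate_hand_pose(points, seg_net, tnet_in, tnet_out, pose_net):
    """points: (B, N, 3) hand point cloud."""
    semantics = seg_net(points)                        # per-point semantic labels, (B, N, C)
    t_in = tnet_in(points)                             # learned 3x3 input transform, (B, 3, 3)
    aligned = torch.bmm(points, t_in)                  # align the raw point cloud
    enhanced = torch.cat([aligned, semantics], dim=2)  # semantic enhancement by concatenation
    joints = pose_net(enhanced, semantics)             # pose prediction, (B, J, 3)
    t_out = tnet_out(points)                           # learned 3x3 output transform
    return torch.bmm(joints, t_out)                    # geometric transform of the prediction
```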
Specifically, on the basis of the above embodiment, extracting point cloud features of the hand point cloud data, and performing point-by-point classification specifically includes:
constructing a point cloud point-by-point classification subnetwork from a plurality of point cloud abstraction layers, a plurality of point feature propagation layers and a multilayer perceptron, wherein the point cloud abstraction layers sample and group the hand point cloud data, extract point cloud features from the grouped hand point cloud data through a PointNet layer, and pass the point cloud features to the corresponding point feature propagation layers; each point feature propagation layer interpolates the input point cloud features and concatenates and fuses them with the corresponding lower-level per-point features; and the multilayer perceptron generates the point-by-point classification labels from the concatenated and fused per-point features.
In this embodiment, Fig. 3 shows the structure of the point cloud point-by-point classification subnetwork. The structure is based on a PointNet++ network and comprises three point cloud abstraction layers and three point feature propagation layers in total. Each point cloud abstraction layer performs point cloud sampling and grouping, and extracts point cloud features from the grouped points with a PointNet layer. Each point feature propagation layer first interpolates the input point cloud features and then concatenates and fuses them with the corresponding lower-level per-point features. Finally, the network generates the point-by-point classification labels with a multilayer perceptron.
The data structure of a point cloud is a set of point coordinates in three-dimensional space; a point cloud is essentially an n × 3 matrix, where n is the number of points. Geometrically, the order of the points does not affect the overall shape they represent in space; for example, the same point cloud can be represented by two completely different matrices.
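A small illustration of this order-independence, under the assumption that features are aggregated with a symmetric operation such as the max pooling used in PointNet:

```python
# Permuting the rows of the n x 3 matrix leaves the represented shape unchanged,
# so a per-point network must be permutation-invariant. A symmetric aggregation
# (here a coordinate-wise max, as in PointNet) gives the same result either way.
import numpy as np

rng = np.random.default_rng(0)
cloud = rng.standard_normal((1024, 3))        # n = 1024 points, one (x, y, z) row each
shuffled = cloud[rng.permutation(1024)]       # same points, different row order

print(np.allclose(cloud.max(axis=0), shuffled.max(axis=0)))  # True
```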
In this embodiment, the three-dimensional hand point cloud data are fed directly into the network for training, and the data volume is small. Fig. 2 is a schematic diagram of the network structure of the human hand posture regression method based on point cloud semantic enhancement in the embodiment of the invention. Hand point cloud data (n × 3) containing n points are input, and a 3D spatial transformation learning subnetwork (T-Net) estimates a 3 × 3 input transformation matrix T_in from the raw data; T_in acts on the original hand point cloud data to align the data.
Fig. 3 is a structural diagram of the point cloud point-by-point classification subnetwork. The structure is based on a PointNet++ network; the backbone comprises three point cloud abstraction layers and three point feature propagation layers. Each point cloud abstraction layer performs point cloud sampling and grouping, and extracts features from the grouped points with a PointNet layer. Each point feature propagation layer first interpolates the input features and then concatenates and fuses them with the corresponding lower-level per-point features. Finally, the network generates the point-by-point classification labels with a multilayer perceptron.
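A hedged sketch of this subnetwork is given below. It keeps the encoder-decoder shape (three abstraction stages, three propagation stages, a per-point classification head) but replaces the farthest-point sampling, ball-query grouping and distance-weighted interpolation of real PointNet++ layers with crude stride-2 subsampling and nearest-neighbour upsampling; the class count of 6 and all layer widths are assumptions, not values from the patent.

```python
# Simplified stand-in for the point-wise classification subnetwork: three
# abstraction stages that subsample and encode the cloud, three propagation
# stages that interpolate features back and fuse them with the corresponding
# lower-level per-point features, and a shared head producing per-point labels.
import torch
import torch.nn as nn

def nearest_interpolate(dst_xyz, src_xyz, src_feat):
    """Propagate src features to dst points via the nearest source point."""
    idx = torch.cdist(dst_xyz, src_xyz).argmin(dim=2)            # (B, Nd)
    return torch.gather(src_feat, 1,
                        idx.unsqueeze(-1).expand(-1, -1, src_feat.size(2)))

class SharedMLP(nn.Module):
    def __init__(self, dims):
        super().__init__()
        layers = []
        for i in range(len(dims) - 1):
            layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
        self.net = nn.Sequential(*layers)
    def forward(self, x):                                        # (B, N, C) -> (B, N, C')
        return self.net(x)

class SegSubnet(nn.Module):
    def __init__(self, num_classes=6):                           # class count is assumed
        super().__init__()
        self.enc = nn.ModuleList([SharedMLP([3, 64]),
                                  SharedMLP([64, 128]),
                                  SharedMLP([128, 256])])
        self.dec = nn.ModuleList([SharedMLP([256 + 128, 128]),
                                  SharedMLP([128 + 64, 64]),
                                  SharedMLP([64 + 3, 64])])
        self.head = nn.Linear(64, num_classes)                   # point-wise labels

    def forward(self, xyz):                                      # (B, N, 3)
        pts, feats = [xyz], [xyz]
        for mlp in self.enc:                                     # three abstraction stages
            f = mlp(feats[-1])
            pts.append(pts[-1][:, ::2])                          # crude stand-in for FPS
            feats.append(f[:, ::2])
        f = feats[-1]
        for i, mlp in enumerate(self.dec):                       # three propagation stages
            up = nearest_interpolate(pts[-2 - i], pts[-1 - i], f)
            f = mlp(torch.cat([up, feats[-2 - i]], dim=2))       # skip fusion
        return self.head(f)                                      # (B, N, num_classes)

print(SegSubnet()(torch.randn(2, 1024, 3)).shape)                # torch.Size([2, 1024, 6])
```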
Fig. 4 is a structural diagram of the posture regression subnetwork. The structure is based on a PointNet++ network; the backbone comprises three point cloud abstraction layers, and the classification label information obtained by the point-by-point classification network is fused with the posture regression network at both the input layer and the output layer.
On the basis of the above embodiments, performing semantic enhancement on the hand point cloud data based on the semantic segmentation information further includes:
on the basis of a transformation learning subnetwork, the hand point cloud data are used as input, point cloud features are extracted through three PointNet layers, and the input transformation matrix and the output transformation matrix of the point cloud features are learned through three fully connected layers.
As shown in Fig. 5, in this embodiment a transformation learning subnetwork (T-Net) first learns a 3 × 3 input transformation matrix T_in from the input point cloud, and another transformation learning subnetwork (T-Net) learns a 3 × 3 output transformation matrix T_out from the input point cloud.
T-Net is a subnetwork that predicts a feature-space transformation matrix: it learns from the input data a transformation matrix whose size matches the feature-space dimension, and the original data are multiplied by this matrix to transform the input feature space, so that every subsequent point is related to every point in the input data. Through this data fusion, the original point cloud data and their features are progressively abstracted.
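A minimal sketch of such a T-Net under the description above (three shared point-wise layers followed by three fully connected layers that regress a 3 × 3 matrix); the layer widths and the identity initialization of the last layer follow common PointNet practice and are assumptions rather than values given in the patent.

```python
# A minimal T-Net sketch: three shared point-wise layers, max pooling, and three
# fully connected layers regressing a 3x3 transform. The layer widths and the
# identity initialization follow common PointNet practice and are assumptions.
import torch
import torch.nn as nn

class TNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.point_mlp = nn.Sequential(            # three shared PointNet-style layers
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 1024), nn.ReLU())
        self.fc = nn.Sequential(                   # three fully connected layers
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 9))
        nn.init.zeros_(self.fc[-1].weight)         # start from the identity transform
        self.fc[-1].bias.data = torch.eye(3).flatten()

    def forward(self, points):                     # points: (B, N, 3)
        feat = self.point_mlp(points).max(dim=1).values   # symmetric global pooling
        return self.fc(feat).view(-1, 3, 3)        # (B, 3, 3) transformation matrix

print(TNet()(torch.randn(2, 1024, 3)).shape)       # torch.Size([2, 3, 3])
```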
On the basis of the above embodiments, performing semantic enhancement on the hand point cloud data based on the semantic segmentation information specifically includes:
multiplying the hand point cloud data by the input transformation matrix to obtain transformed hand point cloud data, and performing first concatenation and fusion on the transformed hand point cloud data and the semantic segmentation information; and extracting the point cloud characteristics of the hand point cloud data after the first concatenation and fusion, and performing second concatenation and fusion on the extracted point cloud characteristics and the semantic segmentation information.
Specifically, using the input transformation matrix T_in, the hand point cloud data are matrix-multiplied by T_in to obtain the transformed hand point cloud data, which are concatenated and fused for the first time with the semantic segmentation information learned by the point cloud point-by-point classification subnetwork; point cloud features are then extracted from the hand point cloud data after this first fusion, and the extracted point cloud features are concatenated and fused a second time with the semantic segmentation information to obtain the hand posture prediction.
Another transformation learning subnetwork learns a 3 × 3 output transformation matrix T_out from the input point cloud, and T_out geometrically transforms the hand posture prediction to obtain the final hand posture prediction result.
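The following is a hedged sketch of this semantic enhancement step: the input transform aligns the cloud, the semantic probabilities are concatenated twice (first with the transformed points, then with the extracted feature), and the predicted joints are mapped through the output transform. The single PointNet-style encoder stands in for the three abstraction layers of the posture regression subnetwork, and the class count of 6 and joint count of 21 are assumptions, not values from the patent.

```python
# Sketch of the semantically enhanced pose regression: apply T_in, concatenate
# the semantic probabilities with the transformed points (first fusion), extract
# a global feature, concatenate the semantics again (second fusion), predict the
# joints, and apply T_out. A single PointNet-style encoder replaces the three
# abstraction layers; 6 classes and 21 joints are assumed values.
import torch
import torch.nn as nn

NUM_CLASSES, NUM_JOINTS = 6, 21

class PoseRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(3 + NUM_CLASSES, 128), nn.ReLU(),
                                     nn.Linear(128, 512), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(512 + NUM_CLASSES, 256), nn.ReLU(),
                                  nn.Linear(256, NUM_JOINTS * 3))

    def forward(self, points, semantics, t_in, t_out):
        aligned = torch.bmm(points, t_in)                      # apply the input transform
        fused1 = torch.cat([aligned, semantics], dim=2)        # first concatenation fusion
        global_feat = self.encoder(fused1).max(dim=1).values   # point cloud feature
        sem_summary = semantics.mean(dim=1)                    # pooled semantic cue
        fused2 = torch.cat([global_feat, sem_summary], dim=1)  # second concatenation fusion
        joints = self.head(fused2).view(-1, NUM_JOINTS, 3)     # hand posture prediction
        return torch.bmm(joints, t_out)                        # apply the output transform

out = PoseRegressor()(torch.randn(2, 1024, 3), torch.rand(2, 1024, NUM_CLASSES),
                      torch.eye(3).repeat(2, 1, 1), torch.eye(3).repeat(2, 1, 1))
print(out.shape)                                               # torch.Size([2, 21, 3])
```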
On the basis of the foregoing embodiments, in this embodiment the point cloud point-by-point classification subnetwork and the posture regression subnetwork are further trained, and during training three loss functions are optimized simultaneously: the point-by-point classification loss, the posture regression loss, and the matrix reciprocity loss. The point-by-point classification loss is a cross-entropy loss, the posture regression loss uses a smooth L1 loss, and the matrix reciprocity loss is defined as follows:
L_im = ||T_in T_out - I||^2
the loss function is used to limit the output transformation matrix ToutTransforming a matrix T for inputinThe inverse matrix I is an identity matrix, so that the network can keep the consistency of the geometric transformation of the input data and the output posture, and is insensitive to the geometric transformation of the input data, and the learning difficulty of the network is reduced.
In this embodiment, a human hand posture regression system based on point cloud semantic enhancement is provided based on the human hand posture regression method based on point cloud semantic enhancement of the above embodiments, as shown in fig. 2, including a point cloud point-by-point classification subnetwork and a posture regression subnetwork;
the point cloud point-by-point classification sub-network is used for extracting point cloud characteristics of the hand point cloud data and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
the gesture regression subnetwork is used for performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand gesture prediction result based on the hand point cloud data after the semantic enhancement, and performing geometric transformation on the hand gesture prediction result to obtain a hand gesture regression result.
The point cloud point-by-point classification subnetwork takes the hand point cloud data as input, extracts point cloud features with a PointNet++ network, and finally performs point-by-point classification to obtain the semantic segmentation information of the point cloud. In the posture regression subnetwork, a transformation learning subnetwork (T-Net) learns a 3 × 3 input transformation matrix T_in from the input point cloud; T_in is matrix-multiplied with the point cloud data to obtain the transformed point cloud, which is concatenated and fused with the semantic information learned by the point cloud point-by-point classification subnetwork; point cloud features are then extracted and concatenated and fused again with the semantic segmentation information of the point cloud to predict the hand posture; another transformation learning subnetwork (T-Net) learns a 3 × 3 output transformation matrix T_out from the input point cloud, and T_out geometrically transforms the hand posture estimate to obtain the final hand posture estimation result.
In this embodiment, Fig. 3 shows the structure of the point cloud point-by-point classification subnetwork. The structure is based on a PointNet++ network and comprises three point cloud abstraction layers and three point feature propagation layers in total. Each point cloud abstraction layer performs point cloud sampling and grouping, and extracts point cloud features from the grouped points with a PointNet layer. Each point feature propagation layer first interpolates the input point cloud features and then concatenates and fuses them with the corresponding lower-level per-point features. Finally, the network generates the point-by-point classification labels with a multilayer perceptron.
In this embodiment, Fig. 4 is a structural diagram of the posture regression subnetwork. The structure is based on a PointNet++ network; the backbone comprises three point cloud abstraction layers, and the classification label information obtained by the point cloud point-by-point classification subnetwork is fused with the posture regression network at both the input layer and the output layer. T-Net is a subnetwork that predicts a feature-space transformation matrix: it learns from the input data a transformation matrix whose size matches the feature-space dimension, and the original data are multiplied by this matrix to transform the input feature space, so that every subsequent point is related to every point in the input data. Through this data fusion, the original point cloud data and their features are progressively abstracted.
Fig. 6 is a block diagram illustrating a structure of a human hand posture regression device based on point cloud semantic enhancement according to an embodiment of the present application.
Referring to fig. 6, the human hand posture regression device based on point cloud semantic enhancement includes: a processor (processor)810, a memory (memory)830, a communication interface (communications interface)820, and a bus 840;
wherein:
the processor 810, the memory 830 and the communication interface 820 complete communication with each other through the bus 840;
the communication interface 820 is used for information transmission between the test equipment and the communication equipment of the display device;
the processor 810 is configured to call program instructions in the memory 830 to perform the human hand pose regression method based on point cloud semantic enhancement provided by the above embodiments of the method, for example, including:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result.
The present embodiments disclose a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the human hand pose regression method based on point cloud semantic enhancement described above, for example comprising:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result.
Also provided in this embodiment is a non-transitory computer readable storage medium storing computer instructions that cause the computer to perform the above-described point cloud semantic enhancement-based human hand pose regression method, for example, including:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result.
In summary, the embodiments of the present invention provide a hand posture regression method and system based on point cloud semantic enhancement. Point cloud features of the hand point cloud data are extracted and classified point by point to obtain semantic segmentation information of the hand point cloud data; the hand point cloud data are semantically enhanced based on the semantic segmentation information; a hand posture prediction result is obtained from the semantically enhanced hand point cloud data and geometrically transformed to obtain the hand posture regression result; and the network learns geometric transformations of both the input data and the output, making the method more robust and improving the regression performance.
The above-described device embodiments are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the embodiments of the present invention, and are not limited thereto; although embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A human hand posture regression method based on point cloud semantic enhancement is characterized by comprising the following steps:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result;
the semantic enhancement of the hand point cloud data based on the semantic segmentation information further comprises:
on the basis of a transformation learning subnetwork, the hand point cloud data are used as input, point cloud features are extracted through three PointNet layers, and the input transformation matrix and the output transformation matrix of the point cloud features are learned through three fully connected layers.
2. The human hand posture regression method based on point cloud semantic enhancement as claimed in claim 1, wherein the point cloud features of the hand point cloud data are extracted and classified point by point, and the method specifically comprises the following steps:
constructing a point cloud point-by-point classification subnetwork from a plurality of point cloud abstraction layers, a plurality of point feature propagation layers and a multilayer perceptron, wherein the point cloud abstraction layers sample and group the hand point cloud data, extract point cloud features from the grouped hand point cloud data through a PointNet layer, and pass the point cloud features to the corresponding point feature propagation layers; each point feature propagation layer interpolates the input point cloud features and concatenates and fuses them with the corresponding lower-level per-point features; and the multilayer perceptron generates the point-by-point classification labels from the concatenated and fused per-point features.
3. The human hand posture regression method based on point cloud semantic enhancement as claimed in claim 1, wherein the semantic enhancement is performed on the hand point cloud data based on the semantic segmentation information, specifically comprising:
multiplying the hand point cloud data by the input transformation matrix to obtain transformed hand point cloud data, and performing first concatenation and fusion on the transformed hand point cloud data and the semantic segmentation information; and extracting the point cloud characteristics of the hand point cloud data after the first concatenation and fusion, and performing second concatenation and fusion on the extracted point cloud characteristics and the semantic segmentation information.
4. The point cloud semantic enhancement-based human hand posture regression method according to claim 3, wherein the geometric transformation is performed on the hand posture prediction result, and the obtaining of the hand posture regression result specifically comprises:
and performing geometric transformation on the hand posture prediction result based on the output transformation matrix to obtain a hand posture regression result.
5. The human hand posture regression method based on point cloud semantic enhancement as claimed in claim 4, wherein after learning based on three full connection layers to obtain an output transformation matrix of point cloud features, the method further comprises:
and optimizing the input transformation matrix and the output transformation matrix based on a matrix reciprocity loss function so that the output transformation matrix is the inverse matrix of the input transformation matrix.
6. The human hand posture regression method based on point cloud semantic enhancement of claim 5, wherein the matrix reciprocity loss function is:
L_im = ||T_in T_out - I||^2
where T_in is the input transformation matrix, T_out is the output transformation matrix, and I is the identity matrix.
7. A human hand posture regression system based on point cloud semantic enhancement is characterized by comprising a point cloud point-by-point classification sub-network and a posture regression sub-network;
the point cloud point-by-point classification sub-network is used for extracting point cloud characteristics of the hand point cloud data and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
the gesture regression subnetwork is used for performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand gesture prediction result based on the hand point cloud data after the semantic enhancement, and performing geometric transformation on the hand gesture prediction result to obtain a hand gesture regression result; the semantic enhancement of the hand point cloud data based on the semantic segmentation information further comprises: on the basis of a transformation learning subnetwork, taking the hand point cloud data as input, extracting point cloud features through three PointNet layers, and learning the input transformation matrix and the output transformation matrix of the point cloud features through three fully connected layers.
8. A human hand posture regression device based on point cloud semantic enhancement is characterized by comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 6.
CN201810758545.7A 2018-07-11 2018-07-11 Human hand posture regression method and system based on point cloud semantic enhancement Active CN109086683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810758545.7A CN109086683B (en) 2018-07-11 2018-07-11 Human hand posture regression method and system based on point cloud semantic enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810758545.7A CN109086683B (en) 2018-07-11 2018-07-11 Human hand posture regression method and system based on point cloud semantic enhancement

Publications (2)

Publication Number Publication Date
CN109086683A CN109086683A (en) 2018-12-25
CN109086683B true CN109086683B (en) 2020-09-15

Family

ID=64837459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810758545.7A Active CN109086683B (en) 2018-07-11 2018-07-11 Human hand posture regression method and system based on point cloud semantic enhancement

Country Status (1)

Country Link
CN (1) CN109086683B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111771141B (en) 2019-01-30 2024-04-09 百度时代网络技术(北京)有限公司 LIDAR positioning for solution inference using 3D CNN network in autonomous vehicles
US11308639B2 (en) * 2019-03-12 2022-04-19 Volvo Car Corporation Tool and method for annotating a human pose in 3D point cloud data
CN110120047B (en) * 2019-04-04 2023-08-08 平安科技(深圳)有限公司 Image segmentation model training method, image segmentation method, device, equipment and medium
CN110059608B (en) * 2019-04-11 2021-07-06 腾讯科技(深圳)有限公司 Object detection method and device, electronic equipment and storage medium
CN111832358A (en) * 2019-04-19 2020-10-27 北京京东叁佰陆拾度电子商务有限公司 Point cloud semantic analysis method and device
CN110262939B (en) * 2019-05-14 2023-07-21 苏宁金融服务(上海)有限公司 Algorithm model operation monitoring method, device, computer equipment and storage medium
CN110135340A (en) * 2019-05-15 2019-08-16 中国科学技术大学 3D hand gestures estimation method based on cloud
CN110210431B (en) * 2019-06-06 2021-05-11 上海黑塞智能科技有限公司 Point cloud semantic labeling and optimization-based point cloud classification method
CN110555412B (en) * 2019-09-05 2023-05-16 深圳龙岗智能视听研究院 End-to-end human body gesture recognition method based on combination of RGB and point cloud
EP4035255A4 (en) 2019-09-23 2023-10-11 Canoo Technologies Inc. Fractional slot electric motors with coil elements having rectangular cross-sections
CN111161364B (en) * 2019-12-24 2022-11-18 东南大学 Real-time shape completion and attitude estimation method for single-view depth map
CN111325757B (en) * 2020-02-18 2022-12-23 西北工业大学 Point cloud identification and segmentation method based on Bayesian neural network
CN111368733B (en) * 2020-03-04 2022-12-06 电子科技大学 Three-dimensional hand posture estimation method based on label distribution learning, storage medium and terminal
CN111428619B (en) * 2020-03-20 2022-08-05 电子科技大学 Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels
CN112396655B (en) * 2020-11-18 2023-01-03 哈尔滨工程大学 Point cloud data-based ship target 6D pose estimation method
CN113095251B (en) * 2021-04-20 2022-05-27 清华大学深圳国际研究生院 Human body posture estimation method and system
CN113205531B (en) * 2021-04-30 2024-03-08 北京云圣智能科技有限责任公司 Three-dimensional point cloud segmentation method, device and server

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060078172A1 (en) * 2004-06-03 2006-04-13 Arizona Board Of Regents, A Body Corporate Of The State Of Arizona 3D face authentication and recognition based on bilateral symmetry analysis
US9934590B1 (en) * 2015-06-25 2018-04-03 The United States Of America As Represented By The Secretary Of The Air Force Tchebichef moment shape descriptor for partial point cloud characterization
CN108171217A (en) * 2018-01-29 2018-06-15 深圳市唯特视科技有限公司 A kind of three-dimension object detection method based on converged network
CN108268878A (en) * 2016-12-30 2018-07-10 乐视汽车(北京)有限公司 Three-dimensional full convolutional network realizes equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060078172A1 (en) * 2004-06-03 2006-04-13 Arizona Board Of Regents, A Body Corporate Of The State Of Arizona 3D face authentication and recognition based on bilateral symmetry analysis
US9934590B1 (en) * 2015-06-25 2018-04-03 The United States Of America As Represented By The Secretary Of The Air Force Tchebichef moment shape descriptor for partial point cloud characterization
CN108268878A (en) * 2016-12-30 2018-07-10 乐视汽车(北京)有限公司 Three-dimensional full convolutional network realizes equipment
CN108171217A (en) * 2018-01-29 2018-06-15 深圳市唯特视科技有限公司 A kind of three-dimension object detection method based on converged network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Hand PointNet: 3D Hand Pose Estimation using Point Sets; Liuhao Ge et al.; 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018-06-23; pp. 8417-8426 *
Hand pose estimation through semi-supervised and weakly-supervised learning; Natalia Neverova et al.; Computer Vision and Image Understanding; 2017-11-30; vol. 164; pp. 56-67 *
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation; Charles R. Qi et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017-07-26; pp. 77-85 *

Also Published As

Publication number Publication date
CN109086683A (en) 2018-12-25

Similar Documents

Publication Publication Date Title
CN109086683B (en) Human hand posture regression method and system based on point cloud semantic enhancement
JP6745328B2 (en) Method and apparatus for recovering point cloud data
JP7373554B2 (en) Cross-domain image transformation
WO2021143264A1 (en) Image processing method and apparatus, server and storage medium
CN111739005B (en) Image detection method, device, electronic equipment and storage medium
CN111462324A (en) Online spatiotemporal semantic fusion method and system
WO2022052782A1 (en) Image processing method and related device
CN115841534A (en) Method and device for controlling motion of virtual object
CN111539897A (en) Method and apparatus for generating image conversion model
Su et al. Monocular depth estimation using information exchange network
CN114897039A (en) Data processing method and related equipment
JP2023131117A (en) Joint perception model training, joint perception method, device, and medium
CN116342782A (en) Method and apparatus for generating avatar rendering model
CN115953468A (en) Method, device and equipment for estimating depth and self-movement track and storage medium
Xu Fast modelling algorithm for realistic three-dimensional human face for film and television animation
CN114792355A (en) Virtual image generation method and device, electronic equipment and storage medium
CN117745944A (en) Pre-training model determining method, device, equipment and storage medium
CN117094362A (en) Task processing method and related device
CN110197226B (en) Unsupervised image translation method and system
CN116797850A (en) Class increment image classification method based on knowledge distillation and consistency regularization
CN116402914A (en) Method, device and product for determining stylized image generation model
CN116977547A (en) Three-dimensional face reconstruction method and device, electronic equipment and storage medium
CN113240780B (en) Method and device for generating animation
Bhattacharyya et al. Efficient unsupervised monocular depth estimation using attention guided generative adversarial network
CN114913330A (en) Point cloud component segmentation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant