CN109086683B - Human hand posture regression method and system based on point cloud semantic enhancement - Google Patents
Human hand posture regression method and system based on point cloud semantic enhancement

Info
- Publication number
- CN109086683B CN109086683B CN201810758545.7A CN201810758545A CN109086683B CN 109086683 B CN109086683 B CN 109086683B CN 201810758545 A CN201810758545 A CN 201810758545A CN 109086683 B CN109086683 B CN 109086683B
- Authority
- CN
- China
- Prior art keywords
- point cloud
- hand
- point
- cloud data
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
Abstract
The embodiment of the invention provides a human hand posture regression method and system based on point cloud semantic enhancement. Point cloud features are extracted from the hand point cloud data and classified point by point to obtain semantic segmentation information of the hand point cloud data; the hand point cloud data are semantically enhanced based on the semantic segmentation information; a hand posture prediction result is obtained from the semantically enhanced hand point cloud data; and the prediction result is geometrically transformed to obtain the hand posture regression result. The geometric transformations of the input data and of the output are learned by the network.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a human hand posture regression method and system based on point cloud semantic enhancement.
Background
In vision-based human-computer interaction, human hand posture estimation refers to accurately predicting the three-dimensional coordinate positions of the skeleton nodes of the human hand, and it has broad application prospects in fields such as virtual reality, augmented reality and human-computer interaction. The human hand posture estimation problem has been a popular research topic in computer vision for decades.
Vision-based human hand posture estimation methods can be classified into two types. The first is appearance-based: it uses machine learning to build a mapping from a two-dimensional image feature space to the three-dimensional hand posture space. Its advantage is that real-time tracking is easy to achieve; its drawbacks are that dense training samples are needed to guarantee accuracy and that an efficient learning and search algorithm must be built over a huge image database. The second is model-based: a three-dimensional hand model is projected into the two-dimensional image space, and the pose parameters estimated in the three-dimensional model are corrected through feature comparison and data estimation.
The advantage of model-based methods is that the estimation results are more accurate, but their performance depends on the chosen model. Usually the depth image is fed as a single-channel image into a two-dimensional convolutional neural network (CNN), which then predicts the hand pose. However, mapping from two-dimensional images to three-dimensional node coordinates is a highly non-linear problem, and the disparity between the input and output spaces makes network learning very difficult. More recently, there have also been methods based on three-dimensional convolutional neural networks (3D CNN), which first convert the depth image into a voxel representation and then regress the pose with the 3D CNN. However, voxelization quantizes continuous coordinate information, introducing quantization errors that are detrimental to accurate hand pose estimation. Meanwhile, the 3D CNN method occupies a large amount of memory, especially when the 3D voxel resolution is high. Moreover, the trained networks are not robust to geometric transformations of the input and their accuracy is limited; in addition, most existing methods rely on heat-map prediction or direct regression, so posture regression performance is low.
Disclosure of Invention
The present invention provides a human hand pose regression method and system based on point cloud semantic enhancement that overcomes or at least partially solves the above-mentioned problems.
According to the first aspect of the invention, a human hand posture regression method based on point cloud semantic enhancement is provided, and comprises the following steps:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result.
According to a second aspect of the invention, a human hand posture regression device based on point cloud semantic enhancement is provided, which comprises:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions capable of performing the human hand pose regression method based on point cloud semantic enhancement as described above.
According to a third aspect of the present invention, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above-described human hand pose regression method based on point cloud semantic enhancement.
The invention provides a human hand posture regression method and system based on point cloud semantic enhancement. Point cloud features are extracted from hand point cloud data and classified point by point to obtain semantic segmentation information; the hand point cloud data are semantically enhanced based on this information; a hand posture prediction result is obtained from the semantically enhanced data; and the prediction result is geometrically transformed to obtain the hand posture regression result. By learning the geometric transformations of the input data and of the output with the network, the hand posture estimation method becomes more robust to geometric transformations of the input data, and the semantic information from the point-by-point classification subnetwork is effectively fused with the posture regression subnetwork, further improving the performance of hand posture estimation.
Drawings
FIG. 1 is a schematic diagram of a human hand posture regression method based on point cloud semantic enhancement according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network system of a human hand posture regression method based on point cloud semantic enhancement according to an embodiment of the invention;
FIG. 3 is a block diagram of a point cloud point-by-point classification subnetwork in accordance with an embodiment of the present invention;
FIG. 4 is a diagram of a pose regression subnetwork according to an embodiment of the present invention;
FIG. 5 is a diagram of a transformation learning subnetwork in accordance with an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of human hand posture regression equipment based on point cloud semantic enhancement according to an embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Obtaining human hand information through computer vision during virtual-operation human-computer interaction has great advantages in naturalness and cost, and is a main trend of future development. Because the computer cannot directly acquire hand information, estimating the hand posture parameters becomes fundamental work. Only with accurate hand posture parameters can the virtual hand in the scene be driven and the consistency between the real and the virtual in human-computer interaction be ensured.
Existing human hand pose estimation methods convert the depth image into a voxel representation and then regress the pose with a 3D CNN. However, voxelization quantizes continuous coordinate information, introducing quantization errors that are detrimental to accurate hand pose estimation. Meanwhile, the 3D CNN method occupies a large amount of memory, especially when the 3D voxel resolution is high.
In the prior art, spatial transformations of the input data are rarely considered, so the trained network is not robust to geometric transformations of the input and its accuracy is limited; moreover, because these methods rely on heat-map prediction or direct regression, they do not consider how to improve posture regression by combining semantic segmentation information of the input data. To solve these problems, the embodiment of the invention estimates the human hand posture from point clouds and provides a human hand posture regression method based on point cloud semantic enhancement. As shown in fig. 1, the method comprises the following steps:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result.
Specifically, in this embodiment, a point cloud point-by-point classification subnetwork takes the hand point cloud data as input, extracts point cloud features using a PointNet++ network, and finally performs point-by-point classification to obtain semantic segmentation information of the point cloud. The posture regression subnetwork also takes the hand point cloud data as input and outputs the final hand posture estimation result. The network system built by the method of the embodiment of the invention is shown in fig. 2.
In this embodiment, learning the geometric transformations of the input data and of the output with the network makes the hand posture estimation method more robust to geometric transformations of the input data, and the semantic segmentation information from the point-by-point classification of the input point cloud is effectively fused into the posture regression subnetwork, further improving the performance of hand posture estimation.
Specifically, on the basis of the above embodiment, extracting point cloud features of the hand point cloud data, and performing point-by-point classification specifically includes:
constructing a point cloud point-by-point classification subnetwork from a plurality of point cloud abstraction layers, a plurality of point feature propagation layers and a multilayer perceptron, wherein the point cloud abstraction layers sample and group the hand point cloud data, extract point cloud features of the grouped hand point cloud data through a PointNet layer, and pass the point cloud features to the corresponding point feature propagation layers; each point feature propagation layer interpolates the input point cloud features and concatenates and fuses them with the corresponding bottom-layer per-point features; and the multilayer perceptron generates point-by-point classification labels from the concatenated and fused bottom-layer per-point features.
In this embodiment, fig. 3 shows the structure of the point cloud point-by-point classification subnetwork. The structure is based on a PointNet++ network and contains three point cloud abstraction layers and three point feature propagation layers in total. Each point cloud abstraction layer performs point cloud sampling and grouping operations and extracts point cloud features from the grouped points using a PointNet layer. Each point feature propagation layer first interpolates the input point cloud features and then concatenates and fuses them with the corresponding bottom-layer per-point features. Finally, the network generates point-by-point classification labels using a multilayer perceptron.
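As an illustration of the interpolation and skip-concatenation step performed by a point feature propagation layer, the following minimal PyTorch sketch follows the standard PointNet++ feature propagation recipe (inverse-distance weighting over the three nearest neighbours followed by a shared MLP); the tensor shapes, the neighbour count of three and the shared MLP are assumptions for illustration rather than the patented implementation.

```python
import torch

def propagate_features(dense_xyz, sparse_xyz, dense_feat, sparse_feat, shared_mlp):
    """Sketch of one point feature propagation layer (assumed shapes):
    dense_xyz: (B, N, 3)   coordinates of the denser point set
    sparse_xyz: (B, M, 3)  coordinates of the sparser point set (M >= 3)
    dense_feat: (B, N, C1) bottom-layer per-point skip features
    sparse_feat: (B, M, C2) features produced on the sparser set
    shared_mlp: any module mapping (..., C1 + C2) -> (..., C_out), e.g. stacked nn.Linear."""
    # distances between every dense point and every sparse point: (B, N, M)
    dist = torch.cdist(dense_xyz, sparse_xyz)
    dist, idx = dist.topk(3, dim=-1, largest=False)             # three nearest sparse neighbours
    weight = 1.0 / (dist + 1e-8)
    weight = weight / weight.sum(dim=-1, keepdim=True)          # inverse-distance weights, (B, N, 3)
    # gather the neighbours' features: (B, N, 3, C2)
    expanded = sparse_feat.unsqueeze(1).expand(-1, dense_xyz.size(1), -1, -1)
    gathered = torch.gather(expanded, 2,
                            idx.unsqueeze(-1).expand(-1, -1, -1, sparse_feat.size(-1)))
    interpolated = (gathered * weight.unsqueeze(-1)).sum(dim=2)  # (B, N, C2)
    fused = torch.cat([interpolated, dense_feat], dim=-1)        # skip concatenation with bottom-layer features
    return shared_mlp(fused)                                     # shared per-point MLP
```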
The data structure of a point cloud is a set of point coordinates in three-dimensional space; essentially it is a list of points, an n×3 matrix where n is the number of points. Geometrically, the order of the points does not affect the overall shape it represents in space: for example, the same point cloud may be represented by two completely different matrices.
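A small illustration of this point (the coordinates below are made up for the example): permuting the rows of the n×3 matrix yields a different matrix that describes exactly the same point set.

```python
import numpy as np

# Two different matrices describing the same 4-point cloud (rows in a different order).
cloud_a = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
cloud_b = cloud_a[[2, 0, 3, 1]]   # row permutation

print(np.array_equal(cloud_a, cloud_b))                        # False: different matrices
print(set(map(tuple, cloud_a)) == set(map(tuple, cloud_b)))    # True: identical point sets
```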
In this embodiment, the three-dimensional hand point cloud data are fed directly into the network for training, and the data volume is small. Fig. 2 is a network structure schematic diagram of the human hand posture regression method based on point cloud semantic enhancement in the embodiment of the present invention. Hand point cloud data (n×3) containing n points are input, and a 3D spatial transformation learning subnetwork (T-Net) estimates a 3×3 input transformation matrix T_in from the original data; T_in acts on the original hand point cloud data to align them.
FIG. 3 is a diagram of the point cloud point-by-point classification subnetwork. The structure is based on a PointNet++ network; the backbone contains three point cloud abstraction layers and three point feature propagation layers. Each point cloud abstraction layer performs point cloud sampling and grouping operations and extracts features from the grouped points using a PointNet layer. Each point feature propagation layer first interpolates the input features and then concatenates and fuses them with the corresponding bottom-layer per-point features. Finally, the network generates point-by-point classification labels using a multilayer perceptron.
FIG. 4 is a diagram of the pose regression subnetwork. The structure is based on a PointNet++ network; the backbone contains three point cloud abstraction layers, and the classification label information obtained by the point-by-point classification network is fused with the pose regression network features at both the input layer and the output layer.
On the basis of the above embodiments, performing semantic enhancement on the hand point cloud data based on the semantic segmentation information further includes:
on the basis of a transformation learning subnetwork, taking the hand point cloud data as input, extracting point cloud features through three PointNet layers, and obtaining an input transformation matrix and an output transformation matrix for the point cloud features through learning with three fully connected layers.
As shown in FIG. 5, in this embodiment a transformation learning subnetwork (T-Net) first learns a 3×3 input transformation matrix T_in from the input point cloud, and another transformation learning subnetwork (T-Net) learns a 3×3 output transformation matrix T_out from the input point cloud.
T-Net is a subnetwork that predicts a feature-space transformation matrix: it learns from the input data a transformation matrix whose size matches the feature-space dimension, and the original data are then multiplied by this matrix to realize the transformation of the input feature space, so that every subsequent point is related to every point in the input data. Through this data fusion, the original point cloud data containing features are abstracted step by step.
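A rough sketch of such a transformation learning subnetwork is given below. It follows the structure described above (three shared PointNet layers, a symmetric pooling, and three fully connected layers regressing a 3×3 matrix), but the channel widths, the identity offset and the PyTorch framing are assumptions made for illustration, not the patented implementation.

```python
import torch
import torch.nn as nn

class TNet(nn.Module):
    """Sketch of a transformation learning subnetwork: three shared per-point
    (PointNet-style) layers, a symmetric max-pool, and three fully connected
    layers regressing the entries of a 3x3 transformation matrix.
    Channel widths are assumptions for illustration."""
    def __init__(self):
        super().__init__()
        # shared per-point feature extraction (1x1 convolutions over the point axis)
        self.point_layers = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        # three fully connected layers regressing the 9 matrix entries
        self.fc = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 9),
        )

    def forward(self, points):                              # points: (B, N, 3)
        feat = self.point_layers(points.transpose(1, 2))    # (B, 1024, N)
        feat = feat.max(dim=2).values                       # symmetric pooling -> (B, 1024)
        mat = self.fc(feat).view(-1, 3, 3)                  # (B, 3, 3)
        # offset by the identity so an untrained network applies roughly no transform
        return mat + torch.eye(3, device=points.device)
```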
On the basis of the above embodiments, performing semantic enhancement on the hand point cloud data based on the semantic segmentation information specifically includes:
multiplying the hand point cloud data by the input transformation matrix to obtain transformed hand point cloud data, and performing first concatenation and fusion on the transformed hand point cloud data and the semantic segmentation information; and extracting the point cloud characteristics of the hand point cloud data after the first concatenation and fusion, and performing second concatenation and fusion on the extracted point cloud characteristics and the semantic segmentation information.
In particular, using the input transformation matrix T_in, the hand point cloud data are matrix-multiplied by T_in to obtain the transformed hand point cloud data, which are concatenated and fused for the first time with the semantic segmentation information learned by the point cloud point-by-point classification subnetwork; point cloud features are then extracted from the hand point cloud data after this first fusion, and the extracted point cloud features are concatenated and fused a second time with the semantic segmentation information to obtain the hand posture prediction result.
Another transformation learning subnetwork learns a 3×3 output transformation matrix T_out from the input point cloud; T_out geometrically transforms the hand posture prediction result to obtain the final hand posture prediction result.
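The two fusion steps and the two learned transformations can be sketched roughly as follows. The layer widths, the number of joints, and the pooling of the per-point semantic scores before the second fusion are assumptions made for this illustration only; in particular, the simple stacked linear layers stand in for the point cloud abstraction layers of the actual posture regression subnetwork.

```python
import torch
import torch.nn as nn

class SemanticFusionSketch(nn.Module):
    """Illustrative forward pass for the semantic enhancement described above
    (layer widths, joint count and the pooling of the semantic scores are
    assumptions): align the points with T_in, concatenate per-point segmentation
    scores (first fusion), extract features, concatenate the segmentation
    information again (second fusion), regress the joints, then apply T_out."""
    def __init__(self, num_classes=6, num_joints=21):
        super().__init__()
        self.num_joints = num_joints
        self.feature_net = nn.Sequential(            # stand-in for the point cloud abstraction layers
            nn.Linear(3 + num_classes, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
        )
        self.regressor = nn.Sequential(
            nn.Linear(256 + num_classes, 256), nn.ReLU(),
            nn.Linear(256, num_joints * 3),
        )

    def forward(self, points, seg_scores, t_in, t_out):
        # points: (B, N, 3), seg_scores: (B, N, num_classes), t_in / t_out: (B, 3, 3)
        aligned = torch.bmm(points, t_in)                           # transform the input with T_in
        fused_in = torch.cat([aligned, seg_scores], dim=-1)         # first concatenation with semantics
        global_feat = self.feature_net(fused_in).max(dim=1).values  # global point cloud feature (B, 256)
        seg_summary = seg_scores.mean(dim=1)                        # pooled semantic summary (assumption)
        fused_out = torch.cat([global_feat, seg_summary], dim=-1)   # second concatenation with semantics
        joints = self.regressor(fused_out).view(-1, self.num_joints, 3)
        return torch.bmm(joints, t_out)                             # geometric transform of the prediction
```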
On the basis of the foregoing embodiments, in this embodiment the point cloud point-by-point classification subnetwork and the posture regression subnetwork are further trained, and during training three loss functions must be optimized simultaneously: the point-by-point classification loss, the posture regression loss, and the matrix reciprocity loss. The point-by-point classification loss is a cross-entropy loss, the posture regression loss uses the smooth L1 loss, and the matrix reciprocity loss is defined as follows:
L_im = || T_in T_out − I ||²
where I is the identity matrix. This loss function constrains the output transformation matrix T_out to be the inverse of the input transformation matrix T_in, so that the network keeps the geometric transformations of the input data and the output posture consistent and remains insensitive to geometric transformations of the input data, which reduces the difficulty of network learning.
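A compact sketch of how the three training losses named above could be combined; the loss weights and tensor shapes are assumptions, while the reciprocity term matches the formula above.

```python
import torch
import torch.nn.functional as F

def total_loss(seg_logits, seg_labels, pred_joints, gt_joints, t_in, t_out,
               w_seg=1.0, w_pose=1.0, w_mat=1.0):
    """Combined training loss sketch (weights are assumptions):
    seg_logits: (B, N, C) per-point class scores, seg_labels: (B, N) long tensor,
    pred_joints / gt_joints: (B, J, 3), t_in / t_out: (B, 3, 3)."""
    # per-point cross entropy for the point-by-point classification subnetwork
    l_seg = F.cross_entropy(seg_logits.reshape(-1, seg_logits.size(-1)),
                            seg_labels.reshape(-1))
    # smooth L1 loss on the regressed joint coordinates
    l_pose = F.smooth_l1_loss(pred_joints, gt_joints)
    # matrix reciprocity loss: || T_in T_out - I ||^2, averaged over the batch
    eye = torch.eye(3, device=t_in.device).expand_as(t_in)
    l_mat = ((torch.bmm(t_in, t_out) - eye) ** 2).sum(dim=(1, 2)).mean()
    return w_seg * l_seg + w_pose * l_pose + w_mat * l_mat
```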
In this embodiment, a human hand posture regression system based on point cloud semantic enhancement is provided based on the human hand posture regression method based on point cloud semantic enhancement of the above embodiments, as shown in fig. 2, including a point cloud point-by-point classification subnetwork and a posture regression subnetwork;
the point cloud point-by-point classification sub-network is used for extracting point cloud characteristics of the hand point cloud data and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
the gesture regression subnetwork is used for performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand gesture prediction result based on the hand point cloud data after the semantic enhancement, and performing geometric transformation on the hand gesture prediction result to obtain a hand gesture regression result.
The point cloud point-by-point classification subnetwork takes the hand point cloud data as input, extracts point cloud features using a PointNet++ network, and finally performs point-by-point classification to obtain semantic segmentation information of the point cloud. In the posture regression subnetwork, a transformation learning subnetwork (T-Net) learns a 3×3 input transformation matrix T_in from the input point cloud; the point cloud data are matrix-multiplied by T_in to obtain a transformed point cloud, which is concatenated and fused with the semantic information learned by the point cloud point-by-point classification subnetwork; point cloud features are then extracted and concatenated and fused again with the semantic segmentation information of the point cloud to predict the hand posture result. Another transformation learning subnetwork (T-Net) learns a 3×3 output transformation matrix T_out from the input point cloud; T_out geometrically transforms the hand posture estimate to obtain the final hand posture estimation result.
In this embodiment, fig. 3 shows the structure of the point cloud point-by-point classification subnetwork. It is based on a PointNet++ network and contains three point cloud abstraction layers and three point feature propagation layers in total. Each point cloud abstraction layer performs point cloud sampling and grouping operations and extracts point cloud features from the grouped points using a PointNet layer. Each point feature propagation layer first interpolates the input point cloud features and then concatenates and fuses them with the corresponding bottom-layer per-point features. Finally, the network generates point-by-point classification labels using a multilayer perceptron.
In this embodiment, fig. 4 is a diagram of the posture regression subnetwork. The structure is based on a PointNet++ network; the backbone contains three point cloud abstraction layers, and the classification label information obtained by the point cloud point-by-point classification subnetwork is fused with the posture regression network features at both the input layer and the output layer. T-Net is a subnetwork that predicts a feature-space transformation matrix: it learns from the input data a transformation matrix whose size matches the feature-space dimension, and the original data are then multiplied by this matrix to realize the transformation of the input feature space, so that every subsequent point is related to every point in the input data. Through this data fusion, the original point cloud data containing features are abstracted step by step.
Fig. 6 is a block diagram illustrating a structure of a human hand posture regression device based on point cloud semantic enhancement according to an embodiment of the present application.
Referring to fig. 6, the human hand posture regression device based on point cloud semantic enhancement includes: a processor (processor)810, a memory (memory)830, a communication interface (communications interface)820, and a bus 840;
wherein:
the processor 810, the memory 830 and the communication interface 820 complete communication with each other through the bus 840;
the communication interface 820 is used for information transmission between the device and external communication equipment;
the processor 810 is configured to call program instructions in the memory 830 to perform the human hand pose regression method based on point cloud semantic enhancement provided by the above embodiments of the method, for example, including:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the human hand pose regression method based on point cloud semantic enhancement described above, for example comprising:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result.
Also provided in this embodiment is a non-transitory computer readable storage medium storing computer instructions that cause the computer to perform the above-described point cloud semantic enhancement-based human hand pose regression method, for example, including:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result.
In summary, the embodiment of the present invention provides a human hand posture regression method and system based on point cloud semantic enhancement. Point cloud features are extracted from hand point cloud data and classified point by point to obtain semantic segmentation information; the hand point cloud data are semantically enhanced based on this information; a hand posture prediction result is obtained from the semantically enhanced data; and the prediction result is geometrically transformed to obtain the hand posture regression result, with the geometric transformations of the input data and of the output learned by the network.
The above-described device embodiments are merely illustrative; the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units: they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the embodiments of the present invention, and are not limited thereto; although embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (9)
1. A human hand posture regression method based on point cloud semantic enhancement is characterized by comprising the following steps:
extracting point cloud characteristics of the hand point cloud data, and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand posture prediction result based on the hand point cloud data after semantic enhancement, and performing geometric transformation on the hand posture prediction result to obtain a hand posture regression result;
the semantic enhancement of the hand point cloud data based on the semantic segmentation information further comprises:
on the basis of a transformation learning subnetwork, taking the hand point cloud data as input, extracting point cloud features through three PointNet layers, and obtaining an input transformation matrix and an output transformation matrix for the point cloud features through learning with three fully connected layers.
2. The human hand posture regression method based on point cloud semantic enhancement as claimed in claim 1, wherein the point cloud features of the hand point cloud data are extracted and classified point by point, and the method specifically comprises the following steps:
constructing a point cloud point-by-point classification subnetwork from a plurality of point cloud abstraction layers, a plurality of point feature propagation layers and a multilayer perceptron, wherein the point cloud abstraction layers sample and group the hand point cloud data, extract point cloud features of the grouped hand point cloud data through a PointNet layer, and pass the point cloud features to the corresponding point feature propagation layers; each point feature propagation layer interpolates the input point cloud features and concatenates and fuses them with the corresponding bottom-layer per-point features; and the multilayer perceptron generates point-by-point classification labels from the concatenated and fused bottom-layer per-point features.
3. The human hand posture regression method based on point cloud semantic enhancement as claimed in claim 1, wherein the semantic enhancement is performed on the hand point cloud data based on the semantic segmentation information, specifically comprising:
multiplying the hand point cloud data by the input transformation matrix to obtain transformed hand point cloud data, and performing first concatenation and fusion on the transformed hand point cloud data and the semantic segmentation information; and extracting the point cloud characteristics of the hand point cloud data after the first concatenation and fusion, and performing second concatenation and fusion on the extracted point cloud characteristics and the semantic segmentation information.
4. The point cloud semantic enhancement-based human hand posture regression method according to claim 3, wherein the geometric transformation is performed on the hand posture prediction result, and the obtaining of the hand posture regression result specifically comprises:
and performing geometric transformation on the hand posture prediction result based on the output transformation matrix to obtain a hand posture regression result.
5. The human hand posture regression method based on point cloud semantic enhancement as claimed in claim 4, wherein after learning based on three full connection layers to obtain an output transformation matrix of point cloud features, the method further comprises:
and optimizing the input transformation matrix and the output transformation matrix based on a matrix reciprocity loss function so that the output transformation matrix is the inverse matrix of the input transformation matrix.
6. The point cloud semantics-based human hand pose regression method of claim 5, wherein the matrix reciprocity loss function is:
L_im = || T_in T_out − I ||²
where T_in is the input transformation matrix, T_out is the output transformation matrix, and I is the identity matrix.
7. A human hand posture regression system based on point cloud semantic enhancement is characterized by comprising a point cloud point-by-point classification sub-network and a posture regression sub-network;
the point cloud point-by-point classification sub-network is used for extracting point cloud characteristics of the hand point cloud data and performing point-by-point classification to obtain semantic segmentation information of the hand point cloud data;
the gesture regression subnetwork is used for performing semantic enhancement on the hand point cloud data based on the semantic segmentation information, obtaining a hand gesture prediction result based on the semantically enhanced hand point cloud data, and performing geometric transformation on the hand gesture prediction result to obtain a hand gesture regression result; the semantic enhancement of the hand point cloud data based on the semantic segmentation information further comprises: on the basis of a transformation learning subnetwork, taking the hand point cloud data as input, extracting point cloud features through three PointNet layers, and obtaining an input transformation matrix and an output transformation matrix for the point cloud features through learning with three fully connected layers.
8. A human hand posture regression device based on point cloud semantic enhancement is characterized by comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810758545.7A CN109086683B (en) | 2018-07-11 | 2018-07-11 | Human hand posture regression method and system based on point cloud semantic enhancement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810758545.7A CN109086683B (en) | 2018-07-11 | 2018-07-11 | Human hand posture regression method and system based on point cloud semantic enhancement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109086683A CN109086683A (en) | 2018-12-25 |
CN109086683B true CN109086683B (en) | 2020-09-15 |
Family
ID=64837459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810758545.7A Active CN109086683B (en) | 2018-07-11 | 2018-07-11 | Human hand posture regression method and system based on point cloud semantic enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109086683B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111771141A (en) * | 2019-01-30 | 2020-10-13 | 百度时代网络技术(北京)有限公司 | LIDAR positioning in autonomous vehicles using 3D CNN networks for solution inference |
CN110059608B (en) * | 2019-04-11 | 2021-07-06 | 腾讯科技(深圳)有限公司 | Object detection method and device, electronic equipment and storage medium |
CN110262939A (en) * | 2019-05-14 | 2019-09-20 | 苏宁金融服务(上海)有限公司 | Algorithm model operation and monitoring method, device, computer equipment and storage medium |
CN110135340A (en) * | 2019-05-15 | 2019-08-16 | 中国科学技术大学 | 3D hand gestures estimation method based on cloud |
CN110210431B (en) * | 2019-06-06 | 2021-05-11 | 上海黑塞智能科技有限公司 | Point cloud semantic labeling and optimization-based point cloud classification method |
CN111161364B (en) * | 2019-12-24 | 2022-11-18 | 东南大学 | Real-time shape completion and attitude estimation method for single-view depth map |
CN111325757B (en) * | 2020-02-18 | 2022-12-23 | 西北工业大学 | Point cloud identification and segmentation method based on Bayesian neural network |
CN111368733B (en) * | 2020-03-04 | 2022-12-06 | 电子科技大学 | Three-dimensional hand posture estimation method based on label distribution learning, storage medium and terminal |
CN111428619B (en) * | 2020-03-20 | 2022-08-05 | 电子科技大学 | Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels |
CN112396655B (en) * | 2020-11-18 | 2023-01-03 | 哈尔滨工程大学 | Point cloud data-based ship target 6D pose estimation method |
CN113095251B (en) * | 2021-04-20 | 2022-05-27 | 清华大学深圳国际研究生院 | Human body posture estimation method and system |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060078172A1 (en) * | 2004-06-03 | 2006-04-13 | Arizona Board Of Regents, A Body Corporate Of The State Of Arizona | 3D face authentication and recognition based on bilateral symmetry analysis |
US9934590B1 (en) * | 2015-06-25 | 2018-04-03 | The United States Of America As Represented By The Secretary Of The Air Force | Tchebichef moment shape descriptor for partial point cloud characterization |
CN108268878A (en) * | 2016-12-30 | 2018-07-10 | 乐视汽车(北京)有限公司 | Three-dimensional full convolutional network realizes equipment |
CN108171217A (en) * | 2018-01-29 | 2018-06-15 | 深圳市唯特视科技有限公司 | A kind of three-dimension object detection method based on converged network |
Non-Patent Citations (3)
Title |
---|
Liuhao Ge et al., "Hand PointNet: 3D Hand Pose Estimation using Point Sets," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018-06-23, pp. 8417-8426 *
Natalia Neverova et al., "Hand pose estimation through semi-supervised and weakly-supervised learning," Computer Vision and Image Understanding, vol. 164, 2017-11-30, pp. 56-67 *
Charles R. Qi et al., "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation," 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017-07-26, pp. 77-85 *
Also Published As
Publication number | Publication date |
---|---|
CN109086683A (en) | 2018-12-25 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |