CN110555412B - End-to-end human body gesture recognition method based on combination of RGB and point cloud - Google Patents


Info

Publication number
CN110555412B
CN110555412B (application CN201910836867.3A)
Authority
CN
China
Prior art keywords
human body
information
point cloud
rgb
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910836867.3A
Other languages
Chinese (zh)
Other versions
CN110555412A (en)
Inventor
张世雄
李楠楠
赵翼飞
李若尘
李革
安欣赏
张伟民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Original Assignee
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Instritute Of Intelligent Video Audio Technology Longgang Shenzhen filed Critical Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority to CN201910836867.3A priority Critical patent/CN110555412B/en
Publication of CN110555412A publication Critical patent/CN110555412A/en
Application granted granted Critical
Publication of CN110555412B publication Critical patent/CN110555412B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

An end-to-end human body gesture recognition method based on the combination of RGB and point cloud comprises the following steps: 1) preprocessing RGB information and point cloud information; 2) extracting two-dimensional (2D) human skeleton information with a front-end network; and 3) extracting three-dimensional (3D) human skeleton information with a 3D network. The method can effectively extract an accurate 3D model of the human body from data acquired by an RGB-D device. It addresses a series of problems in gesture recognition, namely the strong ambiguity of 2D gestures, insufficient 3D gesture precision, and the scarcity of 3D data sets, which are caused by changes in gesture appearance, the many degrees of freedom of gestures, similar-looking gestures, and self-occlusion.

Description

End-to-end human body gesture recognition method based on combination of RGB and point cloud
Technical Field
The invention relates to a method for recognizing human body key-point gestures with an RGB-D camera, and in particular to an end-to-end human body gesture recognition method based on the combination of RGB and point cloud.
Background
Key-point detection of human body gestures is an important field of computer vision research, and its results feed a series of intelligent applications such as new-generation human-computer interaction, interaction in Virtual Reality (VR) and Augmented Reality (AR), and behavior recognition and analysis. Traditional gesture recognition algorithms generally use wearable acceleration sensors to recognize and detect human gestures; this is costly, cumbersome to wear, and requires the user's active cooperation. Early video-based human gesture detection relied mainly on template matching with hand-crafted features, whose design is complex, whose reliability is low, which is easily disturbed by the environment, and which recognizes complex actions poorly. Meanwhile, interference factors such as camera viewing angle, illumination, and occlusion are common in real scenes, and conventional methods often recognize inaccurately, or fail to recognize at all, in such scenes. As the application of deep learning in computer vision matures, human gesture recognition increasingly adopts deep learning methods. On the hardware side, acquisition devices keep evolving and more three-dimensional (3D) capture devices are available; they compensate well for the shortcomings of two-dimensional (2D) projection, including rotation, occlusion, and similarity of human gestures. Three main schemes are currently on the market: structured light (Structured Light), time of flight (ToF, Time of Flight), and binocular stereo imaging (Stereo System).
All three acquisition schemes can capture point cloud images with depth information. According to the data acquired, human gesture recognition can be divided into recognition with depth data, namely gesture recognition based on three-dimensional (3D) point clouds, and recognition based on ordinary image data (RGB data), namely gesture recognition based on 2D images.
Three-dimensional (3D) point clouds have low accuracy and contain more noise; the data volume grows sharply and, with one more dimension than a two-dimensional image, the computation is complex and heavy. The sparsity of the point cloud must also be considered: voxel-based reconstruction should therefore improve computational efficiency, avoid wasting memory on unoccupied space, raise the reconstruction resolution, and improve the network structure so that more detail can be recovered.
A 2D image contains rich color information, is sharp, and carries more detail; its noise is low and the acquisition devices are mature, but it lacks spatial depth information and easily causes ambiguity. Estimating human pose from a 2D image is therefore somewhat ill-posed: in the traditional 2D-to-3D mapping, one image pose may correspond to several different three-dimensional body poses. From a statistical point of view, the reasonable predictions for an input image form a distribution; reflected in the training set, two body poses that look similar in the image may in fact be quite different.
Disclosure of Invention
The invention provides an end-to-end human body gesture recognition method based on the combination of RGB and point cloud, which can effectively extract an accurate 3D model of the human body from data acquired by an RGB-D device. The method addresses a series of problems in gesture recognition, namely the strong ambiguity of 2D gestures, insufficient 3D gesture precision, and the scarcity of 3D data sets, which are caused by changes in gesture appearance, the many degrees of freedom of gestures, similar-looking gestures, and self-occlusion.
The technical scheme provided by the invention is as follows:
The invention discloses an end-to-end human body gesture recognition method based on the combination of RGB and point cloud, comprising the following steps: step 1): preprocessing the RGB information and the point cloud information; step 2): extracting two-dimensional (2D) skeleton information of the human body using a front-end network; and step 3): extracting three-dimensional (3D) skeleton information of the human body using a three-dimensional (3D) network.
In the end-to-end human body gesture recognition method based on the combination of RGB and point cloud, before preprocessing, an RGB-D camera is used as the signal acquisition input, and the acquired signal is separated into RGB information and point cloud information.
In the method, in step 1), the RGB information and the point cloud information are each subjected to filtering and denoising preprocessing and are then aligned.
In the method, in step 1), a contour feature mapping method is used: with the point cloud image as the coordinate reference, the salient features of each edge are extracted, the feature points are mapped one by one, the offsets {p1, p2, p3, ...} of the feature points are calculated, the average offset p of all the feature points is then calculated, and the RGB image is projected into an affine space for transformation and alignment.
In the method, in step 2), the preprocessed RGB information is input into a pre-trained front network to extract two-dimensional (2D) human skeleton information; the extracted 2D skeleton information (the 2D gesture) and the point cloud information are input together into a point cloud cropping module, where the extracted 2D skeleton information is used to crop the point cloud image and remove useless background information.
In the method, in step 2), the front network adopts a bottom-up human body detection model: a network pre-trained on a large amount of data detects the two-dimensional (2D) key nodes of the human body, that is, it first detects the joint coordinates of all human bodies in an image and then clusters the coordinates to form the key-point coordinates of each body.
In the method, in step 3), the point cloud information and the two-dimensional (2D) human skeleton information are fused and then input simultaneously into a three-dimensional (3D) network; the trained 3D network extracts accurate three-dimensional (3D) human skeleton information from the point cloud information.
In the method, in step 3), the three-dimensional (3D) network adopts a convolutional neural network with three layers in total, where each of the first two layers is followed by a pooling layer, and the output is finally produced through a fully connected layer.
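The coordinate clustering in step 2) can be illustrated with a minimal sketch. The patent does not name a specific clustering algorithm, so the greedy distance-based grouping below (the function name, radius parameter, and centroid rule are all assumptions) only shows the idea of grouping joints detected across the whole image into per-person key-point sets:

```python
import numpy as np

def cluster_joints_to_people(joints, max_person_radius=120.0):
    """Greedily group detected joint coordinates into per-person clusters.

    joints: iterable of (x, y) joint coordinates detected in the image.
    A joint joins the first existing cluster whose centroid lies within
    max_person_radius pixels; otherwise it starts a new person cluster.
    """
    people = []  # each entry is a list of (x, y) joints for one person
    for j in np.asarray(joints, dtype=float):
        for person in people:
            centroid = np.mean(person, axis=0)
            if np.linalg.norm(j - centroid) <= max_person_radius:
                person.append(j)
                break
        else:
            people.append([j])  # no nearby cluster: start a new person
    return [np.array(p) for p in people]
```

In a real bottom-up pipeline the grouping would also use limb-affinity cues rather than distance alone; this sketch captures only the "detect all joints, then cluster per body" structure described in the text.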
Compared with the prior art, the invention has the beneficial effects that:
1. The method designs a unique way of merging two networks, fusing the RGB and RGB-D data streams at the same time. Adopting a mid-pipeline data fusion strategy effectively reduces the problems of early-stage data noise and the excessive data volume of early fusion, providing a novel human body gesture detection method for RGB-D cameras.
2. The model in the recognition method can provide effective, real 3D gesture information of the human body. Previous models could only output 2D skeleton information, whereas this model outputs a 3D skeleton model with real-world coordinate values, provides detailed data on the height and each trunk dimension of the body, and can reach centimeter-level precision when the RGB-D camera is sufficiently precise.
3. The 3D gesture can be estimated from the current frame alone; compared with previous methods that require several consecutive image or video frames to estimate the 3D gesture, this is a great improvement.
Drawings
The invention is further illustrated by way of example with reference to the accompanying drawings in which:
fig. 1 is a flow chart of an end-to-end human body gesture recognition method based on the combination of RGB and point cloud of the present invention.
The specific embodiment is as follows:
according to the invention, the gesture recognition of the 3D point cloud is effectively combined with the gesture recognition based on RGB image data, and a front-back network combined deep learning method is provided, namely, an end-to-end human body gesture recognition method based on the combination of RGB and point cloud is provided, and the advantages of the point cloud image and the RGB image are fused.
The end-to-end human body gesture recognition method based on the combination of RGB and point cloud comprises the following main steps:
1. Preprocess the RGB information and the point cloud information. Before preprocessing, an RGB-D device (RGB-D camera) is used as the signal acquisition input, and the acquired signal is separated into RGB information and point cloud information. The RGB and point cloud information are then each filtered and denoised, and aligned. Specifically, because the RGB information and the point cloud information are collected by different sensors, the images acquired by the two devices cannot coincide completely, and a positional offset p exists between them. The contour feature mapping method first takes the point cloud image as the coordinate reference, extracts the salient features of each edge, and maps the feature points one by one; the offsets {p1, p2, p3, ...} of the feature points are calculated, the average offset p of all the feature points is then calculated, and the RGB image is projected into an affine space for transformation and alignment. The advantage of this method is that the aligned point cloud information and RGB information match in planar spatial information, which facilitates cropping the point cloud and fusing it with the RGB information.
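The alignment step just described can be sketched as follows. This is an illustrative simplification that reduces the affine projection to a pure translation by the average offset p; the function name and array shapes are assumptions, not part of the patent:

```python
import numpy as np

def align_rgb_to_cloud(rgb_pts, cloud_pts):
    """Shift RGB feature coordinates onto the point-cloud frame.

    rgb_pts, cloud_pts: (N, 2) arrays of matched edge-feature
    coordinates (the one-by-one mapped contour features).
    """
    offsets = cloud_pts - rgb_pts   # per-feature offsets {p1, p2, p3, ...}
    p = offsets.mean(axis=0)        # average offset p over all features
    return rgb_pts + p              # RGB coordinates translated by p
```

A fuller version would solve for a 2x3 affine matrix (e.g. by least squares over the matched features) rather than a single translation, which is closer to "projecting RGB into an affine space".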
2. Extract the 2D skeleton information of the human body with the front-end network. In this step, the preprocessed RGB information is input into a pre-trained front network to extract the 2D human skeleton information, and the extracted 2D skeleton information is used to crop the point cloud image and remove useless background information. The invention provides a front-end network for 2D gesture extraction based on a convolutional neural network structure; the activation function used in all network layers is the ReLU function, and the output gesture information comprises 25 skeleton key points, such as the nose, head, and shoulders. The front-end network adopts a bottom-up human body detection model: a network pre-trained on a large amount of data first detects the 2D key nodes, that is, it first detects the joint coordinates of all human bodies in an image and then clusters the coordinates to form the key-point coordinates of each body. The feature extraction network uses a VGG-19 convolutional neural network to extract features, then three 3x3 convolutional layers predict the confidence regions of the 25 nodes, and the 2D human skeleton information, namely the 2D skeleton diagram, is obtained from these confidence regions. RGB information contains rich color and contextual information, the extracted skeleton has high precision, and RGB data is relatively easy to collect, so more training data can be gathered and the trained model becomes more accurate.
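The output stage of the front network, reading one key point per confidence map, can be sketched as follows. This is a hedged illustration: the peak-picking rule, threshold, and map shapes are assumptions, and the real network predicts and clusters joints for all bodies in the image rather than one peak per map:

```python
import numpy as np

NUM_KEYPOINTS = 25  # nose, head, shoulders, ... as listed for the front network

def keypoints_from_confidence_maps(maps, threshold=0.1):
    """Extract one (x, y, score) per skeleton key point.

    maps: (25, H, W) array of per-keypoint confidence maps, as would be
    produced by the 3x3 convolutional head after the VGG-19 backbone.
    Returns (None, None, score) for maps whose peak is below threshold.
    """
    kps = []
    for m in maps:
        idx = np.unravel_index(np.argmax(m), m.shape)  # (row, col) of peak
        score = m[idx]
        kps.append((idx[1], idx[0], score) if score >= threshold
                   else (None, None, score))
    return kps
```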
3. Extract the 3D skeleton information of the human body with the 3D network. Specifically, the point cloud information and the 2D human skeleton information are fused and then input simultaneously into the 3D network; the trained 3D network extracts accurate 3D human skeleton information from the point cloud information. The invention provides a 3D gesture estimation network built as a convolutional neural network with 3D convolution kernels; the network has three layers in total, each of the first two layers is followed by a pooling layer, and the output is finally produced through a fully connected layer. During input, as shown in fig. 1, one stream of data is the 2D human skeleton information output by the front network, namely the 2D gesture, and the other is the cropped human point cloud. The point cloud and the 2D gesture information are normalized so that their values fall in the (-1, 1) interval; the 2D gesture information (X, Y) is then merged layer by layer into the 3D point cloud information (X, Y, Z), where the weight of the skeleton's (X, Y) relative to the (X, Y) in the point cloud is set to 10:1. A 3x3 kernel extracts the confidence region of the 3D skeleton, and the 3D skeleton information is finally output through the fully connected layer.
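The normalization and 10:1 weighting described above can be sketched as follows. The patent does not specify how the weighted (X, Y) channels are merged layer by layer, so this sketch (the function names and the per-channel scaling scheme are assumptions) only prepares the two normalized, weighted inputs that would be fed jointly to the 3D network:

```python
import numpy as np

def normalize_to_unit(a):
    """Rescale each column of `a` into the (-1, 1) interval."""
    lo, hi = a.min(axis=0), a.max(axis=0)
    return 2.0 * (a - lo) / np.maximum(hi - lo, 1e-8) - 1.0

def fuse_inputs(pose2d, cloud, pose_weight=10.0, cloud_weight=1.0):
    """Normalize the 2D gesture (J, 2) and the cropped cloud (N, 3),
    then weight the skeleton (x, y) 10:1 against the cloud's (x, y),
    matching the ratio stated in the description.
    """
    pose_n = normalize_to_unit(pose2d) * pose_weight
    cloud_n = normalize_to_unit(cloud)
    cloud_n[:, :2] *= cloud_weight  # cloud (x, y) kept at weight 1; z untouched
    return pose_n, cloud_n
```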
The advantage of this design is that the point cloud information and the RGB information complement each other effectively: the point cloud has good spatial position information but is sparse, so skeleton information cannot be extracted from it accurately on its own, while the RGB information is rich but lacks spatial position information. A trained network can fuse the two effectively.
The invention provides a network that can accurately extract the 3D gesture information of the human skeleton end to end. In the early stage of the work, the network is first trained on the Human3.6M dataset and then fine-tuned on a real-world human body dataset collected for it. After training, only forward inference is needed when the invention is applied.
The flow chart of the end-to-end human body gesture recognition method based on the combination of RGB and point cloud of the present invention is shown in fig. 1; the specific implementation flow is as follows:
1. firstly, an RGB-D camera is used as acquisition input of signals to acquire RGB-D data;
2. dividing the acquired signals into RGB information and point cloud information;
3. inputting the RGB information and the point cloud information respectively into a preprocessing module for filtering and denoising preprocessing, and performing alignment processing;
4. inputting the preprocessed RGB information into a pre-trained pre-network (Pose-net) to extract 2D skeleton information of the human body, namely 2D gesture;
5. inputting the extracted 2D skeleton information (2D gesture) of the human body and the point cloud information into a point cloud cutting module, cutting the point cloud by using the extracted 2D skeleton information of the human body, and eliminating the useless background information;
6. then, the point cloud information and the human body 2D skeleton information (the 2D gesture) are fused and input simultaneously into the 3D network, and the trained 3D network extracts accurate human body 3D skeleton information from the point cloud: on one hand, this effectively exploits the high accuracy and precise key-point localization of 2D skeleton extraction; on the other hand, the point cloud information imposes an effective geometric constraint on the finally generated 3D human gesture;
7. the 3D network outputs the accurate model of the human body 3D skeleton.
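Step 5 above, cropping the point cloud with the 2D skeleton, can be sketched as follows. This is an illustrative assumption: background removal is approximated by keeping cloud points inside the skeleton's padded 2D bounding box, and the margin value and names are not from the patent:

```python
import numpy as np

def crop_cloud_by_skeleton(cloud_uvz, keypoints, margin=0.15):
    """Discard background points outside the 2D skeleton's region.

    cloud_uvz: (N, 3) points whose first two columns are image-plane
    coordinates aligned to the RGB frame (third column is depth).
    keypoints: (J, 2) detected 2D skeleton key points.
    """
    lo = keypoints.min(axis=0)
    hi = keypoints.max(axis=0)
    pad = (hi - lo) * margin            # expand the box to keep body edges
    lo, hi = lo - pad, hi + pad
    mask = ((cloud_uvz[:, 0] >= lo[0]) & (cloud_uvz[:, 0] <= hi[0]) &
            (cloud_uvz[:, 1] >= lo[1]) & (cloud_uvz[:, 1] <= hi[1]))
    return cloud_uvz[mask]
```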
An RGB-D camera is an acquisition device that can capture both point cloud images and RGB color images. The method adopts an end-to-end deep neural network together with a scheme in which RGB images and point cloud images are fused with each other, overcoming the limitation of previous gesture recognition that relied solely on either RGB images or point cloud images. This human gesture extraction scheme takes both ordinary 2D image features and 3D depth features into account, improves recognition precision, and eliminates the angular ambiguity of single-image gesture recognition.
In summary, the invention provides an effective fully supervised deep learning network model with two levels of extraction: a front network (Pose-net) that extracts the skeletal gesture of the human body, and a 3D network that combines the skeleton information with the 3D gesture information extracted from the point cloud. The proposed deep learning network model can effectively extract an accurate 3D model of the human body from data acquired by an RGB-D device. Unlike conventional 3D conversion models, the 3D information here is real 3D data of the human body; a conventional 3D body converted from 2D is usually obtained by model matching, so the resulting 3D data is not real and becomes ambiguous with the camera angle and the distance to the camera. In view of this, the invention combines 2D-to-3D model conversion with depth point cloud information to obtain an accurate 3D human skeleton.
The above examples are only specific embodiments of the present invention for illustrating the technical solution of the present invention, but not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the foregoing examples, it will be understood by those skilled in the art that the present invention is not limited thereto: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (6)

1. An end-to-end human body gesture recognition method based on combination of RGB and point cloud is characterized by comprising the following steps:
step 1): preprocessing RGB information and point cloud information;
step 2): extracting two-dimensional (2D) skeleton information of the human body using a front network: inputting the preprocessed RGB information into the pre-trained front network to extract the human body 2D skeleton information, inputting the extracted 2D skeleton information (the 2D gesture) and the point cloud information into a point cloud cropping module, cropping the point cloud image with the extracted 2D skeleton information, and eliminating useless background information; and
step 3): extracting human body three-dimensional (3D) skeleton information by using a three-dimensional (3D) network, merging the point cloud information and the human body two-dimensional (2D) skeleton information, and then inputting the merged point cloud information and the human body two-dimensional (2D) skeleton information into the three-dimensional (3D) network, wherein the trained three-dimensional (3D) network extracts accurate human body three-dimensional (3D) skeleton information from the point cloud information.
2. The end-to-end human body gesture recognition method based on the combination of RGB and point cloud according to claim 1, in step 1), before the preprocessing, the collected signals are first separated into RGB information and point cloud information by using an RGB-D camera as a collection input of the signals.
3. The end-to-end human body gesture recognition method based on the combination of RGB and point cloud according to claim 1, in step 1), the RGB information and the point cloud information are subjected to filtering and denoising preprocessing, and alignment processing is performed.
4. The end-to-end human body gesture recognition method based on the combination of RGB and point cloud as claimed in claim 1, wherein in step 1), a contour feature mapping method is used: with the point cloud image as the coordinate reference, the salient features of each edge are extracted, the feature points are mapped one by one, the offsets {p1, p2, p3, ...} of the feature points are calculated, the average offset p of all the feature points is then calculated, and RGB is projected into an affine space for transformation and alignment.
5. The end-to-end human body gesture recognition method based on the combination of RGB and point cloud as claimed in claim 1, in the step 2), the front network adopts a single human body detection model from bottom to top, and a network pre-trained by a large amount of data firstly detects two-dimensional (2D) key nodes of human bodies, namely, firstly detects joint point coordinates of all human bodies in an image, and then performs coordinate clustering to form key point coordinates corresponding to the human bodies.
6. The end-to-end human body gesture recognition method based on the combination of RGB and point cloud according to claim 1, wherein in step 3), the three-dimensional (3D) network adopts a convolutional neural network, the convolutional neural network is divided into three layers in total, wherein the first two layers of networks are both connected into a layer pooling layer, and finally output through a fully connected layer.
CN201910836867.3A 2019-09-05 2019-09-05 End-to-end human body gesture recognition method based on combination of RGB and point cloud Active CN110555412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910836867.3A CN110555412B (en) 2019-09-05 2019-09-05 End-to-end human body gesture recognition method based on combination of RGB and point cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910836867.3A CN110555412B (en) 2019-09-05 2019-09-05 End-to-end human body gesture recognition method based on combination of RGB and point cloud

Publications (2)

Publication Number Publication Date
CN110555412A CN110555412A (en) 2019-12-10
CN110555412B true CN110555412B (en) 2023-05-16

Family

ID=68739207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910836867.3A Active CN110555412B (en) 2019-09-05 2019-09-05 End-to-end human body gesture recognition method based on combination of RGB and point cloud

Country Status (1)

Country Link
CN (1) CN110555412B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597974B (en) * 2020-05-14 2023-05-12 哈工大机器人(合肥)国际创新研究院 Monitoring method and system for personnel activities in carriage based on TOF camera
CN111723688B (en) * 2020-06-02 2024-03-12 合肥的卢深视科技有限公司 Human body action recognition result evaluation method and device and electronic equipment
CN111723687A (en) * 2020-06-02 2020-09-29 北京的卢深视科技有限公司 Human body action recognition method and device based on neural network
CN112070835B (en) * 2020-08-21 2024-06-25 达闼机器人股份有限公司 Mechanical arm pose prediction method and device, storage medium and electronic equipment
CN113238650B (en) 2021-04-15 2023-04-07 青岛小鸟看看科技有限公司 Gesture recognition and control method and device and virtual reality equipment
CN112907672B (en) * 2021-05-07 2021-10-08 上海擎朗智能科技有限公司 Robot avoidance method and device, electronic equipment and storage medium
CN114091601B (en) * 2021-11-18 2023-05-05 业成科技(成都)有限公司 Sensor fusion method for detecting personnel condition
TWI789267B (en) * 2022-03-10 2023-01-01 國立臺中科技大學 Method of using two-dimensional image to automatically create ground truth data required for training three-dimensional pointnet
CN114694263B (en) * 2022-05-30 2022-09-02 深圳智华科技发展有限公司 Action recognition method, device, equipment and storage medium
CN115471561A (en) * 2022-11-14 2022-12-13 科大讯飞股份有限公司 Object key point positioning method, cleaning robot control method and related equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086683A (en) * 2018-07-11 2018-12-25 清华大学 A kind of manpower posture homing method and system based on cloud semantically enhancement

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2013106357A (en) * 2013-02-13 2014-08-20 ЭлЭсАй Корпорейшн THREE-DIMENSIONAL TRACKING OF AREA OF INTEREST, BASED ON COMPARISON OF KEY FRAMES
CN104715493B (en) * 2015-03-23 2018-01-19 北京工业大学 A kind of method of movement human Attitude estimation
CN107180226A (en) * 2017-04-28 2017-09-19 华南理工大学 A kind of dynamic gesture identification method based on combination neural net
CN108830150B (en) * 2018-05-07 2019-05-28 山东师范大学 One kind being based on 3 D human body Attitude estimation method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086683A (en) * 2018-07-11 2018-12-25 清华大学 A kind of manpower posture homing method and system based on cloud semantically enhancement

Also Published As

Publication number Publication date
CN110555412A (en) 2019-12-10

Similar Documents

Publication Publication Date Title
CN110555412B (en) End-to-end human body gesture recognition method based on combination of RGB and point cloud
US10109055B2 (en) Multiple hypotheses segmentation-guided 3D object detection and pose estimation
Cui et al. SOF-SLAM: A semantic visual SLAM for dynamic environments
CN107808131B (en) Dynamic gesture recognition method based on dual-channel deep convolutional neural network
Ye et al. Accurate 3d pose estimation from a single depth image
CN109934848B (en) Method for accurately positioning moving object based on deep learning
Martin et al. Real time head model creation and head pose estimation on consumer depth cameras
CN108734194B (en) Virtual reality-oriented single-depth-map-based human body joint point identification method
CN106251399A (en) A kind of outdoor scene three-dimensional rebuilding method based on lsd slam
Kogler et al. Event-based stereo matching approaches for frameless address event stereo data
Medioni et al. Identifying noncooperative subjects at a distance using face images and inferred three-dimensional face models
CN105843386A (en) Virtual fitting system in shopping mall
CN109359514B (en) DeskVR-oriented gesture tracking and recognition combined strategy method
CN110008913A (en) The pedestrian's recognition methods again merged based on Attitude estimation with viewpoint mechanism
CN110852182A (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
WO2010135617A1 (en) Gesture recognition systems and related methods
CN111160291A (en) Human eye detection method based on depth information and CNN
CN112379773B (en) Multi-person three-dimensional motion capturing method, storage medium and electronic equipment
CN106815855A (en) Based on the human body motion tracking method that production and discriminate combine
CN110135277B (en) Human behavior recognition method based on convolutional neural network
CN111476089B (en) Pedestrian detection method, system and terminal for multi-mode information fusion in image
CN110334607B (en) Video human interaction behavior identification method and system
CN112668550B (en) Double interaction behavior recognition method based on joint point-depth joint attention RGB modal data
CN110751097A (en) Semi-supervised three-dimensional point cloud gesture key point detection method
Li et al. Deep learning based monocular depth prediction: Datasets, methods and applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant